LLM Reference

FLAN-T5 Models by Google DeepMind

5 models | 2022 | Up to 512 ctx | From $0.6/1M input

About

The FLAN-T5 family of large language models is a set of instruction-finetuned versions of the original T5 (Text-to-Text Transfer Transformer) models, introduced in the paper "Scaling Instruction-Finetuned Language Models" [4][8][9]. The models build on the improvements in T5 version 1.1 and were instruction-finetuned on a diverse mixture of over 1,000 tasks spanning multiple languages [2][3]. This finetuning improves their zero-shot and few-shot performance, which makes them usable across a broad range of natural language processing tasks [4][8][9]. Google released five FLAN-T5 variants (Small, Base, Large, XL, and XXL) that differ in parameter count and compute requirements [4][8][9]. All of them are available through the Hugging Face Transformers library [4][8][9]. The training data was not filtered for explicit content or assessed for bias, so the models may generate inappropriate content or reproduce existing biases [1].
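As a rough illustration of the Hugging Face access mentioned above, the following sketch loads a variant and runs one instruction-style prompt. It assumes the `transformers`, `torch`, and `sentencepiece` packages are installed and that the checkpoints are published under the `google/flan-t5-<size>` ids on the Hugging Face Hub. The helper names `build_prompt` and `generate` and the prompt layout are illustrative choices, not part of any library.

```python
# Assumed Hub ids, following the "google/flan-t5-<size>" naming pattern.
MODEL_IDS = {
    "small": "google/flan-t5-small",
    "base": "google/flan-t5-base",
    "large": "google/flan-t5-large",
    "xl": "google/flan-t5-xl",
    "xxl": "google/flan-t5-xxl",
}

# Context length listed for every variant on this page.
MAX_INPUT_TOKENS = 512


def build_prompt(instruction: str, text: str) -> str:
    """Join an instruction and its input into one text-to-text prompt.

    The two-part layout is an illustrative convention, not a required format.
    """
    return f"{instruction.strip()}\n\n{text.strip()}"


def generate(prompt: str, size: str = "small", max_new_tokens: int = 64) -> str:
    """Run one prompt through the chosen variant and return the decoded text."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = MODEL_IDS[size]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate the input so it stays within the listed 512-token context.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(build_prompt("Translate English to German:", "How old are you?"))` downloads the Small checkpoint on first use. The larger variants take the same code path but need considerably more memory.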

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.


Flan-T5 XXL (Current)
2022-10 | 512 context | 11B parameters
Use when the workload needs 512 context and 11B parameters.

Flan-T5 XL (Current)
2022-10 | 512 context | 3B parameters
Use when the workload needs 512 context and 3B parameters.

Flan-T5 Large (Current)
2022-10 | 512 context | 780M parameters
Use when the workload needs 512 context and 780M parameters.

Flan-T5 Small (Current)
2022-10 | 512 context | 80M parameters
Use when the workload needs 512 context and 80M parameters.

Flan-T5 Base (Current)
2022-10 | 512 context | 250M parameters
Use when the workload needs 512 context and 250M parameters.

Release Timeline

2022-10 (1 release group, 5 current)

Flan-T5 Base | 512 context | 250M parameters | Current
Flan-T5 Large | 512 context | 780M parameters | Current
Flan-T5 Small | 512 context | 80M parameters | Current
Flan-T5 XL | 512 context | 3B parameters | Current
Flan-T5 XXL | 512 context | 11B parameters | Current

Specifications (5 models)

FLAN-T5 model specifications comparison

Model          Released  Context  Parameters
Flan-T5 XXL    2022-10   512      11B
Flan-T5 XL     2022-10   512      3B
Flan-T5 Large  2022-10   512      780M
Flan-T5 Small  2022-10   512      80M
Flan-T5 Base   2022-10   512      250M
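The parameter counts above translate into a rough lower bound on the memory needed just to hold the weights. The sketch below assumes 4, 2, and 1 bytes per parameter for fp32, fp16, and int8 respectively, and it ignores activations, optimizer state, and framework overhead, so real usage will be higher. The helper names `weight_gb` and `largest_fitting` are hypothetical.

```python
# Parameter counts as listed in the specifications table (rounded figures).
PARAMETERS = {
    "Flan-T5 Small": 80_000_000,
    "Flan-T5 Base": 250_000_000,
    "Flan-T5 Large": 780_000_000,
    "Flan-T5 XL": 3_000_000_000,
    "Flan-T5 XXL": 11_000_000_000,
}

# Assumed storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(model: str, dtype: str = "fp16") -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return PARAMETERS[model] * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting(budget_gb: float, dtype: str = "fp16"):
    """Return the largest variant whose weights fit the budget, or None."""
    fitting = [m for m in PARAMETERS if weight_gb(m, dtype) <= budget_gb]
    return max(fitting, key=PARAMETERS.get) if fitting else None


if __name__ == "__main__":
    for name in PARAMETERS:
        print(f"{name}: ~{weight_gb(name):.2f} GB in fp16")
```

Under these assumptions, Flan-T5 XXL needs about 22 GB for fp16 weights, while Flan-T5 XL needs about 6 GB.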

Available From (2 providers)

Pricing

FLAN-T5 model pricing by provider
Model        Provider     Input / 1M  Output / 1M  Type
Flan-T5 XL   IBM watsonx  $0.6        $0.6         Serverless
Flan-T5 XXL  IBM watsonx  $1.8        $1.8         Serverless
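To make the per-million-token prices concrete, here is a minimal cost estimate using the listed IBM watsonx rates. It assumes the prices are in US dollars and that billing is linear in token count. The function name `estimate_cost` is hypothetical, and actual provider billing may round or meter differently.

```python
# (input price, output price) per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "Flan-T5 XL": (0.6, 0.6),
    "Flan-T5 XXL": (1.8, 1.8),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge for one workload, assuming linear per-token billing."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 0.5M output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

For the same workload, Flan-T5 XXL costs three times as much as Flan-T5 XL at these rates.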

Frequently Asked Questions

What is FLAN-T5 used for?
FLAN-T5 is a family of instruction-finetuned versions of the original T5 (Text-to-Text Transfer Transformer) models, introduced in the paper "Scaling Instruction-Finetuned Language Models" [4][8][9]. Finetuning on a mixture of over 1,000 tasks improves zero-shot and few-shot performance, so the models are used for a broad range of natural language processing tasks.
How does FLAN-T5 compare to Gemma 4?
FLAN-T5 by Google DeepMind is strongest for its listed use cases, while Gemma 4, also by Google DeepMind, is the closest related family to check if you need multimodal support. FLAN-T5 has 5 listed variants and reaches up to 512 context, while Gemma 4 reaches up to 256k context, so compare the specs and pricing tables before choosing a production model.
Which FLAN-T5 model should I use?
For the lowest listed input price, start with Flan-T5 XL through IBM watsonx at $0.6/1M input tokens. For the most capable variant, or for local deployment, evaluate Flan-T5 XXL (11B parameters, 512 context).
