FLAN-T5 Models by Google DeepMind
About
The FLAN-T5 family is a set of instruction-finetuned versions of the original T5 (Text-to-Text Transfer Transformer) models, introduced in the paper "Scaling Instruction-Finetuned Language Models" [4][8][9]. The models build on the improvements in T5 version 1.1 and were instruction-finetuned on a diverse mixture of over 1,000 tasks spanning multiple languages [2][3]. This finetuning improves zero-shot and few-shot performance, which makes the models usable across a range of natural language processing tasks [4][8][9]. Google offers five FLAN-T5 variants (Small, Base, Large, XL, and XXL) that differ in parameter count and compute requirements [4][8][9]. All are available through the Hugging Face Transformers library [4][8][9]. However, the training data was not filtered for explicit content or assessed for bias, so the models may generate inappropriate content or perpetuate existing biases [1].
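Because the variants are available through the Hugging Face Transformers library, a minimal sketch of loading one and running a prompt might look like the following. It assumes the `transformers` and `torch` packages are installed and that the checkpoints are published on the Hugging Face Hub under the `google/flan-t5-*` names; the helper names `checkpoint_for` and `run_prompt` are hypothetical and exist only for illustration. Inputs are truncated to the 512-token context listed for these models.

```python
# Assumed Hugging Face Hub checkpoint names for the five listed variants.
CHECKPOINTS = {
    "small": "google/flan-t5-small",
    "base": "google/flan-t5-base",
    "large": "google/flan-t5-large",
    "xl": "google/flan-t5-xl",
    "xxl": "google/flan-t5-xxl",
}


def checkpoint_for(variant: str) -> str:
    """Map a variant label such as 'XL' to its (assumed) Hub checkpoint name."""
    key = variant.strip().lower()
    if key not in CHECKPOINTS:
        raise ValueError(f"unknown FLAN-T5 variant: {variant!r}")
    return CHECKPOINTS[key]


def run_prompt(prompt: str, variant: str = "small", max_new_tokens: int = 32) -> str:
    """Download the chosen variant and generate a completion for one prompt."""
    # Imported lazily so the mapping above is usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = checkpoint_for(variant)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    # Truncate to the 512-token context listed in the specifications.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `run_prompt("Translate English to German: How old are you?")` would download the Small checkpoint on first use. Starting with the smallest variant keeps the first download and memory footprint low.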
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Flan-T5 XXL | Use when inputs fit a 512-token context and the workload calls for an 11B-parameter model. | 2022-10 | 512 context, 11B parameters | Current |
| Flan-T5 XL | Use when inputs fit a 512-token context and the workload calls for a 3B-parameter model. | 2022-10 | 512 context, 3B parameters | Current |
| Flan-T5 Large | Use when inputs fit a 512-token context and the workload calls for a 780M-parameter model. | 2022-10 | 512 context, 780M parameters | Current |
| Flan-T5 Small | Use when inputs fit a 512-token context and the workload calls for an 80M-parameter model. | 2022-10 | 512 context, 80M parameters | Current |
| Flan-T5 Base | Use when inputs fit a 512-token context and the workload calls for a 250M-parameter model. | 2022-10 | 512 context, 250M parameters | Current |
Release Timeline
1 release group
Specifications (5 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| Flan-T5 XXL | 2022-10 | 512 | 11B |
| Flan-T5 XL | 2022-10 | 512 | 3B |
| Flan-T5 Large | 2022-10 | 512 | 780M |
| Flan-T5 Small | 2022-10 | 512 | 80M |
| Flan-T5 Base | 2022-10 | 512 | 250M |
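The parameter counts in the table can be turned into a rough weight-memory estimate when choosing a variant for local use. The sketch below assumes 2 bytes per parameter (16-bit weights) and counts weights only, ignoring activations and framework overhead, so real requirements will be higher. The function names `weight_memory_gb` and `largest_fitting` are hypothetical.

```python
# Parameter counts taken from the specifications table.
PARAMETERS = {
    "Flan-T5 Small": 80_000_000,
    "Flan-T5 Base": 250_000_000,
    "Flan-T5 Large": 780_000_000,
    "Flan-T5 XL": 3_000_000_000,
    "Flan-T5 XXL": 11_000_000_000,
}


def weight_memory_gb(model: str, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB (10**9 bytes); assumes 16-bit weights by default."""
    return PARAMETERS[model] * bytes_per_param / 1e9


def largest_fitting(budget_gb: float, bytes_per_param: int = 2):
    """Return the largest variant whose weights fit the budget, or None if none fit."""
    fitting = [m for m in PARAMETERS if weight_memory_gb(m, bytes_per_param) <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda m: PARAMETERS[m])
```

Under these assumptions, the XXL weights alone come to about 22 GB and the XL weights to about 6 GB, so `largest_fitting(8.0)` selects Flan-T5 XL.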
Available From (2 providers)
Pricing
| Model | Provider | Input (USD / 1M tokens) | Output (USD / 1M tokens) | Type |
|---|---|---|---|---|
| Flan-T5 XL | IBM watsonx | $0.6 | $0.6 | Serverless |
| Flan-T5 XXL | IBM watsonx | $1.8 | $1.8 | Serverless |
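As a worked example of the per-million-token pricing above, the following sketch estimates the cost of a workload from its input and output token counts. The prices are copied from the table; the function name `estimate_cost_usd` is hypothetical, and actual billing may differ from this simple linear model.

```python
# (input, output) prices in USD per 1M tokens, from the pricing table (IBM watsonx).
PRICES_PER_MILLION = {
    "Flan-T5 XL": (0.6, 0.6),
    "Flan-T5 XXL": (1.8, 1.8),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear cost estimate: tokens / 1M times the listed per-million price."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price
```

For instance, 2M input tokens and 500k output tokens on Flan-T5 XL come to 2 × 0.6 + 0.5 × 0.6 = 1.50 USD, while the same workload on Flan-T5 XXL comes to 4.50 USD.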
Frequently Asked Questions
- What is FLAN-T5 used for?
- FLAN-T5 is used for a range of natural language processing tasks, particularly in zero-shot and few-shot settings. The family is a set of instruction-finetuned versions of the original T5 (Text-to-Text Transfer Transformer) models, introduced in the paper "Scaling Instruction-Finetuned Language Models" [4][8][9].
- How does FLAN-T5 compare to Gemma 4?
- FLAN-T5 has 5 listed variants and a context of up to 512 tokens, while Gemma 4, also listed under Google DeepMind, reaches up to 256k context and is the closest related family to check for multimodal workloads. Compare the specs and pricing tables before choosing a production model.
- Which FLAN-T5 model should I use?
- For the lowest listed input price, start with Flan-T5 XL through IBM watsonx at $0.6 per 1M input tokens. For the largest variant, evaluate Flan-T5 XXL (11B parameters, 512 context).






