MT0 Models by BigScience
5 models · 2024 · Up to 1k context · From $1.8/1M input
About
MT0 is a family of 5 AI models by BigScience, released in 2024.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| MT0 XXL | Use when the workload needs 1k context and 13B parameters. | 2024-01 | 1k context, 13B parameters | Current |
| MT0 XL | Use when the workload needs 1k context and 3.7B parameters. | 2024-01 | 1k context, 3.7B parameters | Current |
| MT0 Large | Use when the workload needs 1k context and 1.2B parameters. | 2024-01 | 1k context, 1.2B parameters | Current |
| MT0 Base | Use when the workload needs 1k context and 580M parameters. | 2024-01 | 1k context, 580M parameters | Current |
| MT0 Small | Use when the workload needs 1k context and 300M parameters. | 2024-01 | 1k context, 300M parameters | Current |
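The variant table can also be read as a small lookup: given a parameter budget, pick the largest variant that fits. The sketch below is illustrative only. The `Variant` class and `largest_within` helper are hypothetical names, the parameter counts are transcribed from the table, and "1k context" is assumed to mean 1,000 tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    params_billion: float
    context_tokens: int  # assumption: "1k" in the table means 1,000 tokens


# Values transcribed from the variants table above.
VARIANTS = [
    Variant("MT0 XXL", 13.0, 1000),
    Variant("MT0 XL", 3.7, 1000),
    Variant("MT0 Large", 1.2, 1000),
    Variant("MT0 Base", 0.58, 1000),
    Variant("MT0 Small", 0.3, 1000),
]


def largest_within(max_params_billion: float) -> Optional[Variant]:
    """Return the largest variant at or under the parameter budget, or None."""
    fits = [v for v in VARIANTS if v.params_billion <= max_params_billion]
    return max(fits, key=lambda v: v.params_billion, default=None)


choice = largest_within(4.0)
print(choice.name if choice else "no variant fits")
```

Because every variant lists the same context length, parameter count is the only field that separates them here; a real selection would also weigh quality, latency, and provider availability.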
Release Timeline
1 release group
Specifications (5 models)
Available From (1 provider)
Pricing
| Model | Provider | Input / 1M tokens | Output / 1M tokens | Type |
|---|---|---|---|---|
| MT0 XXL | IBM watsonx | $1.8 | $1.8 | Serverless |
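As a rough illustration of how the listed rates turn into a bill, the sketch below multiplies token counts by the per-million prices from the table. It assumes the "1M" unit refers to tokens; the `estimate_cost` helper and the workload figures are hypothetical, not provider data.

```python
# Prices from the pricing table above (USD per 1M, assumed to mean tokens),
# for MT0 XXL on IBM watsonx.
INPUT_PRICE_PER_M = 1.8
OUTPUT_PRICE_PER_M = 1.8


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost for the given input and output token counts."""
    cost_units = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return cost_units / 1_000_000


# Hypothetical workload: 10,000 requests, each with 600 input and 200 output tokens.
total = estimate_cost(10_000 * 600, 10_000 * 200)
print(f"${total:.2f}")  # prints $14.40
```

Since input and output are priced identically in the table, only the total token count matters for this model; the two rates are kept separate so the sketch still works if they diverge.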
Frequently Asked Questions
- What is MT0 used for?
  - MT0 is used for coding and structured outputs. The family description and listed model capabilities point to those workloads as the best fit.
- How does MT0 compare to Claude 3?
  - MT0 by BigScience is strongest where you need coding, while Claude 3 by Anthropic is the closest related family to check for vision and multimodal work. MT0 has 5 listed variants and reaches up to 1k context; Claude 3 reaches up to 200k context. Compare the specs and pricing tables before choosing a production model.
