LLM Reference

MT0 Models by BigScience

5 models · 2024 · Up to 1k ctx · From $1.8/1M input

About

MT0 is a family of 5 AI models by BigScience, released in 2024.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

MT0 XXL (Current)

Use when the workload needs 1k context and 13B parameters.

2024-01 · 1k context · 13B parameters
MT0 XL (Current)

Use when the workload needs 1k context and 3.7B parameters.

2024-01 · 1k context · 3.7B parameters
MT0 Large (Current)

Use when the workload needs 1k context and 1.2B parameters.

2024-01 · 1k context · 1.2B parameters
MT0 Base (Current)

Use when the workload needs 1k context and 580M parameters.

2024-01 · 1k context · 580M parameters
MT0 Small (Current)

Use when the workload needs 1k context and 300M parameters.

2024-01 · 1k context · 300M parameters

Release Timeline

1 release group
2024-01 (5 current)
MT0 Base · 1k context · 580M parameters · Current
MT0 Large · 1k context · 1.2B parameters · Current
MT0 Small · 1k context · 300M parameters · Current
MT0 XL · 1k context · 3.7B parameters · Current
MT0 XXL · 1k context · 13B parameters · Current

Specifications (5 models)

MT0 model specifications comparison
Model      Released  Context  Parameters
MT0 XXL    2024-01   1k       13B
MT0 XL     2024-01   1k       3.7B
MT0 Large  2024-01   1k       1.2B
MT0 Base   2024-01   1k       580M
MT0 Small  2024-01   1k       300M
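
Since all five variants share the same 1k context, parameter count is the only listed spec that separates them. A minimal sketch of picking the largest variant that fits a parameter budget follows. The parameter counts are copied from the table above; the function name `largest_within_budget` and the idea of a fixed parameter budget are illustrative assumptions, not part of the listing.

```python
from typing import Optional

# (name, parameter count) pairs copied from the specifications table.
# Every variant is listed with the same 1k context, so it is not modeled here.
MT0_VARIANTS = [
    ("MT0 XXL", 13_000_000_000),
    ("MT0 XL", 3_700_000_000),
    ("MT0 Large", 1_200_000_000),
    ("MT0 Base", 580_000_000),
    ("MT0 Small", 300_000_000),
]


def largest_within_budget(max_params: int) -> Optional[str]:
    """Return the largest listed variant with at most max_params parameters.

    Hypothetical helper: returns None when even MT0 Small exceeds the budget.
    """
    candidates = [(params, name) for name, params in MT0_VARIANTS if params <= max_params]
    if not candidates:
        return None
    # max() compares the parameter count first, so the largest variant wins.
    return max(candidates)[1]


if __name__ == "__main__":
    print(largest_within_budget(4_000_000_000))
```

With a budget of 4 billion parameters, the helper selects MT0 XL, because MT0 XXL (13B) is over budget and MT0 XL (3.7B) is the largest remaining entry.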

Available From (1 provider)

Pricing

MT0 model pricing by provider
Model    Provider     Input / 1M  Output / 1M  Type
MT0 XXL  IBM watsonx  $1.8        $1.8         Serverless
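
The per-million-token rates above translate into a workload cost with simple arithmetic. The sketch below uses the listed IBM watsonx rate for MT0 XXL; the function name `estimate_cost_usd` and the example request volume are hypothetical, and actual billing may include terms this listing does not show.

```python
# Listed IBM watsonx rates for MT0 XXL, in USD per 1M tokens (from the pricing table).
INPUT_USD_PER_M = 1.8
OUTPUT_USD_PER_M = 1.8


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge in USD for a given number of input and output tokens."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests of 500 input and 100 output tokens each.
    requests = 10_000
    cost = estimate_cost_usd(requests * 500, requests * 100)
    print(f"${cost:.2f}")
```

For this assumed workload the input side is 5M tokens ($9.00) and the output side is 1M tokens ($1.80), so the estimate comes to $10.80.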

Frequently Asked Questions

What is MT0 used for?
According to the family description and listed model capabilities, MT0 is best suited to coding and structured-output workloads.
How does MT0 compare to Claude 3?
MT0 by BigScience is listed for coding workloads, while Claude 3 by Anthropic is the closest related family to consider for vision and multimodal work. MT0 has 5 listed variants with up to 1k context; Claude 3 reaches up to 200k context. Compare the specs and pricing tables before choosing a production model.
Which MT0 model should I use?
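MT0 XXL is the only variant with a listed price: $1.8/1M input tokens through IBM watsonx. It is also the largest variant (13B parameters, 1k context), so it is the one to evaluate when capability matters most. The smaller variants (300M to 3.7B parameters) have no provider listed here.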

Models (5)