What is MT0 used for?

MT0 is used for coding and structured outputs. The family description and listed model capabilities point to those workloads as the best fit.

How does MT0 compare to Claude 3?

MT0 by BigScience is strongest where you need coding, while Claude 3 by Anthropic is the closest related family to check for vision and multimodal work. MT0 has 5 listed variants and reaches up to 1k context, while Claude 3 reaches up to 200k context, so compare the specs and pricing tables before choosing a production model.

Which MT0 model should I use?

MT0 XXL is the only variant with listed pricing: $1.8/1M input tokens through IBM watsonx, which makes it the lowest listed input-price option by default. With 13B parameters and 1k context, it is also the strongest local starting point. Use the provider table if latency, deployment type, or output-token pricing matters more than input price.

MT0 Models by BigScience

BigScience · Apache 2.0 · Open source

5 models · 2024 · Up to 1k ctx · From $1.8/1M input

Details

Researcher: BigScience

License: Apache 2.0 (OSI-approved)

Commercial use: permitted

Models: 5

Released: 2024

Max context: 1k

Links

Website HuggingFace

About

MT0 is a family of 5 AI models by BigScience, released in 2024.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

Current MT0 variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
MT0 XXL	Use when the workload needs 1k context and 13B parameters.	2024-01	1k context, 13B parameters	Current
MT0 XL	Use when the workload needs 1k context and 3.7B parameters.	2024-01	1k context, 3.7B parameters	Current
MT0 Large	Use when the workload needs 1k context and 1.2B parameters.	2024-01	1k context, 1.2B parameters	Current
MT0 Base	Use when the workload needs 1k context and 580M parameters.	2024-01	1k context, 580M parameters	Current
MT0 Small	Use when the workload needs 1k context and 300M parameters.	2024-01	1k context, 300M parameters	Current

Release Timeline

1 release group

2024-01

5 current

MT0 Base: 1k context, 580M parameters (Current)
MT0 Large: 1k context, 1.2B parameters (Current)
MT0 Small: 1k context, 300M parameters (Current)
MT0 XL: 1k context, 3.7B parameters (Current)
MT0 XXL: 1k context, 13B parameters (Current)

Specifications (5 models)

MT0 model specifications comparison
Model	Released	Context	Parameters
MT0 XXL	2024-01	1k	13B
MT0 XL	2024-01	1k	3.7B
MT0 Large	2024-01	1k	1.2B
MT0 Base	2024-01	1k	580M
MT0 Small	2024-01	1k	300M
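
Since every variant shares the same 1k context, parameter count is the only spec that separates them. A minimal sketch of picking the largest variant that fits a parameter budget follows. The parameter counts are copied from the table above; the function name `largest_variant_within` and the idea of a single parameter budget are illustrative assumptions, not part of this listing.

```python
from typing import Optional

# Parameter counts copied from the specifications table above.
MT0_PARAMS = {
    "MT0 XXL": 13e9,
    "MT0 XL": 3.7e9,
    "MT0 Large": 1.2e9,
    "MT0 Base": 580e6,
    "MT0 Small": 300e6,
}


def largest_variant_within(budget_params: float) -> Optional[str]:
    """Return the largest listed variant whose parameter count fits the budget.

    Hypothetical helper: returns None when even the smallest variant is too large.
    """
    fitting = {name: n for name, n in MT0_PARAMS.items() if n <= budget_params}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Example: a deployment that can host at most about 4B parameters.
print(largest_variant_within(4e9))
```

In practice the budget would come from available memory and the numeric precision used to load the weights, which this sketch leaves out.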

Available From (1 provider)

IBM watsonx

Pricing

MT0 model pricing by provider
Model	Provider	Input / 1M	Output / 1M	Type
MT0 XXL	IBM watsonx	$1.8	$1.8	Serverless
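
To turn the per-token rates above into a budget figure, the arithmetic can be sketched as below. The two rates are copied from the pricing table; the function name `estimate_cost_usd` and the workload numbers are hypothetical, and actual billing may differ from this simple linear estimate.

```python
# Rates copied from the pricing table above
# (USD per 1M tokens, MT0 XXL on IBM watsonx).
INPUT_USD_PER_1M = 1.8
OUTPUT_USD_PER_1M = 1.8


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate workload cost in USD from total input and output token counts."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


# Hypothetical workload: 10,000 requests, each with 800 input and 200 output tokens.
requests = 10_000
cost = estimate_cost_usd(requests * 800, requests * 200)
print(f"${cost:.2f}")  # prints "$18.00"
```

Because input and output are priced identically here, only the total token count matters; the two rates are kept separate so the sketch still works if a provider prices them differently.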

Models (5)

MT0 XXL

2024-01 · 1k · 13B · 1 provider

MT0 XL

MT0 Large

MT0 Base

MT0 Small