What is MPT used for?

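MPT is a general-purpose family of open language models. The listing tags coding as its best-fit workload, and the family description also covers instruction following, chat, and storytelling through fine-tuned variants.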

How does MPT compare to Dolly 2.0?

MPT by Databricks Mosaic is listed as strongest for coding, while Dolly 2.0 by Databricks is the closest related family to check for math-heavy prompts. MPT has 2 listed variants with up to 8k context, so compare the specifications and pricing tables before choosing a production model.

Which MPT model should I use?

For the lowest listed input price, start with MPT 7B through Databricks Foundation Model Serving at $0.5/1M input tokens. For the larger, more capable variant, evaluate MPT 30B, which has 8k context and is listed at $1/1M input tokens.

MPT Models by Databricks Mosaic

Databricks Mosaic, CC-BY-NC-SA-4.0

2 models, 2023, up to 8k ctx, from $0.5/1M input

About

The MosaicML Pretrained Transformer (MPT) family is a collection of open-source large language models for general-purpose use, described as available for commercial use. The models use a decoder-only architecture similar to GPT models, with performance-optimized layer implementations and changes aimed at greater training stability. MPT replaces traditional positional embeddings with ALiBi (Attention with Linear Biases), so the architecture does not impose a fixed positional limit and can handle inputs longer than the sequences it was trained on. The family includes the base model MPT-7B and fine-tuned variants such as MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, which target tasks ranging from instruction following to storytelling. The models were trained on 1 trillion tokens of text and code.
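To make the ALiBi idea concrete, here is a minimal pure-Python sketch of the bias that is added to attention scores in place of positional embeddings. It is illustrative only and is not MPT's actual implementation. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two head count, which is the simple case in the ALiBi formulation.

```python
# Minimal ALiBi sketch (illustrative; not taken from the MPT codebase).


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: a geometric sequence starting at 2^(-8 / num_heads).

    Assumption: num_heads is a power of two.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two head count")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Bias added to causal attention scores before the softmax.

    bias[h][i][j] = -slope_h * (i - j) for a key j at or before query i.
    Keys after the query (j > i) get -inf, the usual causal mask.
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        head = [
            [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        bias.append(head)
    return bias
```

Because the penalty depends only on the distance between query and key, the same rule can be evaluated for any `seq_len`, which is why no learned position table caps the input length.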

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.


MPT 30B (Current)

Use when the workload needs 8k context and 30B parameters.

2023-03, 8k context, 30B parameters

MPT 7B (Current)

Use when the workload needs 2k context and 7B parameters.

2023-03, 2k context, 7B parameters

Current MPT variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
MPT 30B	Use when the workload needs 8k context and 30B parameters.	2023-03	8k context, 30B parameters	Current
MPT 7B	Use when the workload needs 2k context and 7B parameters.	2023-03	2k context, 7B parameters	Current

Release Timeline

1 release group

2023-03

2 current

MPT 30B

8k context, 30B parameters

Current

MPT 7B

2k context, 7B parameters

Current

Specifications (2 models)

MPT model specifications comparison
Model	Released	Context	Parameters
MPT 30B	2023-03	8k	30B
MPT 7B	2023-03	2k	7B
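The specification table can be turned into a small selection helper. The sketch below picks the smallest listed model whose context window fits a request. It assumes that "2k" and "8k" mean 2048 and 8192 tokens and that prompt and generated tokens share the window. The names `ModelSpec` and `smallest_model_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_tokens: int    # assumption: "2k" = 2048, "8k" = 8192
    params_billions: int


# Values copied from the specification table above.
SPECS = [
    ModelSpec("MPT 7B", 2048, 7),
    ModelSpec("MPT 30B", 8192, 30),
]


def smallest_model_for(prompt_tokens: int, max_new_tokens: int) -> Optional[str]:
    """Return the smallest listed model that fits prompt plus output, or None."""
    needed = prompt_tokens + max_new_tokens
    fitting = [spec for spec in SPECS if spec.context_tokens >= needed]
    if not fitting:
        return None
    return min(fitting, key=lambda spec: spec.params_billions).name
```

For example, a 1,500-token prompt with 256 generated tokens fits MPT 7B, while a 3,000-token prompt needs MPT 30B. Context fit is only one criterion, so treat the result as a starting point rather than a recommendation.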

Available From (2 providers)

Databricks Foundation Model Serving

Scale AI GenAI Platform

Pricing

MPT model pricing by provider
Model	Provider	Input / 1M	Output / 1M	Type
MPT 7B	Databricks Foundation Model Serving	$0.5	$0.5	Serverless
MPT 30B	Databricks Foundation Model Serving	$1	$1	Serverless
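As a rough way to compare the two rows, the listed per-million-token prices can be applied to an expected token volume. This is a minimal sketch: the `PRICES` dictionary copies the table above, and the helper name `estimate_cost` is hypothetical. Real bills may differ from this arithmetic.

```python
# Listed serverless prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "MPT 7B": {"input": 0.5, "output": 0.5},
    "MPT 30B": {"input": 1.0, "output": 1.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token counts at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: 2M input tokens and 1M output tokens on each model.
for name in PRICES:
    print(name, estimate_cost(name, 2_000_000, 1_000_000))
```

At these list prices MPT 30B costs twice as much as MPT 7B for the same token volume.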
