LLM Reference

Voxtral Models by MistralAI

4 models | 2025–2026 | Up to 33k context | From $0.04/1M input tokens

About

Voxtral is a family of 4 AI models by MistralAI, released between 2025 and 2026.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

Voxtral TTS
Use when the workload needs text to speech, multimodal inputs, and audio.
2026-03 | text to speech | multimodal inputs | audio

Voxtral Mini Transcribe 2
Use when the workload needs speech to text, 33k context, and multimodal inputs.
2026-02 | speech to text | 33k context | multimodal inputs

Mistral Voxtral Mini 3B 2507
Use when the workload calls for a 3B-parameter model.
2025-07 | 3B parameters

Mistral Voxtral Small 24B 2507
Use when the workload calls for a 24B-parameter model.
2025-07 | 24B parameters

Release Timeline

3 release groups

2026-03 (1 current)
Voxtral TTS: text to speech, multimodal inputs, audio

2026-02 (1 current)
Voxtral Mini Transcribe 2: speech to text, 33k context, multimodal inputs

2025-07 (2 current)
Mistral Voxtral Mini 3B 2507: 3B parameters
Mistral Voxtral Small 24B 2507: 24B parameters

Specifications (4 models)

Voxtral model specifications comparison
Model | Released | Context | Parameters | Multimodal
Voxtral TTS | 2026-03 | - | - | Yes
Voxtral Mini Transcribe 2 | 2026-02 | 33k | - | Yes
Mistral Voxtral Mini 3B 2507 | 2025-07 | - | 3B | No
Mistral Voxtral Small 24B 2507 | 2025-07 | - | 24B | No

Available From (2 providers)

Pricing

Voxtral model pricing by provider
Model | Provider | Input / 1M tokens | Output / 1M tokens | Type
Mistral Voxtral Mini 3B 2507 | AWS Bedrock | $0.04 | $0.04 | Serverless
Mistral Voxtral Small 24B 2507 | AWS Bedrock | $0.10 | $0.30 | Serverless
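
The listed per-token prices convert to a request or batch cost by simple proportion. The following is a minimal sketch, assuming the prices are USD per 1M tokens and that input and output tokens are billed separately. The function name `estimate_cost` and the token counts in the usage line are hypothetical, chosen only for illustration; they are not part of any provider SDK.

```python
# Prices copied from the pricing listing (assumed USD per 1M tokens,
# AWS Bedrock serverless): (input price, output price).
PRICES_PER_MILLION = {
    "Mistral Voxtral Mini 3B 2507": (0.04, 0.04),
    "Mistral Voxtral Small 24B 2507": (0.10, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts.

    Hypothetical helper: it only applies the listed prices and ignores
    any provider-side minimums, rounding, or discounts.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 2M input tokens and 500k output tokens.
    cost = estimate_cost("Mistral Voxtral Small 24B 2507", 2_000_000, 500_000)
    print(f"{cost:.2f}")  # prints 0.35
```

Under these assumptions the same workload on Mistral Voxtral Mini 3B 2507 costs about $0.10, so the gap between the two models is driven mostly by the output price.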

Frequently Asked Questions

What is Voxtral used for?
Voxtral is used for text to speech, speech to text, and multimodal audio work. The family description and listed model capabilities point to those workloads as the best fit.
How does Voxtral compare to Ministral?
Voxtral by MistralAI is strongest where you need text to speech, while Ministral by MistralAI is the closest related family to check for structured outputs. Voxtral has 4 listed variants and reaches up to 33k context; Ministral reaches up to 32k context. Compare the specs and pricing tables before choosing a production model.
Which Voxtral model should I use?
For the lowest listed input price, start with Mistral Voxtral Mini 3B 2507 through AWS Bedrock at $0.04/1M input tokens. For the longest listed context, evaluate Voxtral Mini Transcribe 2 with 33k context and multimodal inputs. The most recent release is Voxtral TTS (2026-03), which targets text to speech.
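
The selection advice above can be expressed as a small lookup over the listed fields. This is a rough sketch, assuming the release dates, capability tags, and input prices shown on this page. The `MODELS` structure and the helper names are hypothetical and exist only for illustration; models without a listed price carry `None`.

```python
# Data transcribed from the specifications and pricing listings on this page.
# input_price is USD per 1M input tokens; None means no price is listed.
MODELS = [
    {"name": "Voxtral TTS", "released": "2026-03", "input_price": None,
     "tags": {"text to speech", "multimodal inputs", "audio"}},
    {"name": "Voxtral Mini Transcribe 2", "released": "2026-02", "input_price": None,
     "tags": {"speech to text", "33k context", "multimodal inputs"}},
    {"name": "Mistral Voxtral Mini 3B 2507", "released": "2025-07", "input_price": 0.04,
     "tags": {"3B parameters"}},
    {"name": "Mistral Voxtral Small 24B 2507", "released": "2025-07", "input_price": 0.10,
     "tags": {"24B parameters"}},
]


def with_capability(tag: str) -> list[str]:
    """Names of models whose listed tags include the given capability."""
    return [m["name"] for m in MODELS if tag in m["tags"]]


def cheapest_listed() -> str:
    """Name of the model with the lowest listed input price."""
    priced = [m for m in MODELS if m["input_price"] is not None]
    return min(priced, key=lambda m: m["input_price"])["name"]


def latest_release() -> str:
    """Name of the most recently released model (YYYY-MM sorts as text)."""
    return max(MODELS, key=lambda m: m["released"])["name"]


if __name__ == "__main__":
    print(with_capability("speech to text"))  # ['Voxtral Mini Transcribe 2']
    print(cheapest_listed())                  # Mistral Voxtral Mini 3B 2507
    print(latest_release())                   # Voxtral TTS
```

Keeping unpriced models in the list with `None` rather than dropping them avoids implying they are unavailable; they simply have no listed price here.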

Models (4)