Voxtral Models by MistralAI
4 models | 2025–2026 | Up to 33k context | From $0.04/1M input tokens
About
Voxtral is a family of 4 AI models by MistralAI, released between 2025 and 2026.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Voxtral TTS | Use when the workload needs text to speech, multimodal inputs, and audio. | 2026-03 | text to speech, multimodal inputs, audio | Current |
| Voxtral Mini Transcribe 2 | Use when the workload needs speech to text, 33k context, and multimodal inputs. | 2026-02 | speech to text, 33k context, multimodal inputs | Current |
| Mistral Voxtral Mini 3B 2507 | Use when the workload needs a 3B-parameter model. | 2025-07 | 3B parameters | Current |
| Mistral Voxtral Small 24B 2507 | Use when the workload needs a 24B-parameter model. | 2025-07 | 24B parameters | Current |
Release Timeline
3 release groups
- 2026-03 (1 current)
  - Voxtral TTS: Current; text to speech, multimodal inputs, audio
- 2026-02 (1 current)
  - Voxtral Mini Transcribe 2: Current; speech to text, 33k context, multimodal inputs
- 2025-07 (2 current)
  - Mistral Voxtral Mini 3B 2507: Current; 3B parameters
  - Mistral Voxtral Small 24B 2507: Current; 24B parameters
Specifications (4 models)
| Model | Released | Context | Parameters | Multimodal |
|---|---|---|---|---|
| Voxtral TTS | 2026-03 | — | — | Yes |
| Voxtral Mini Transcribe 2 | 2026-02 | 33k | — | Yes |
| Mistral Voxtral Mini 3B 2507 | 2025-07 | — | 3B | No |
| Mistral Voxtral Small 24B 2507 | 2025-07 | — | 24B | No |
Available From (2 providers)
Pricing
| Model | Provider | Input / 1M tokens | Output / 1M tokens | Type |
|---|---|---|---|---|
| Mistral Voxtral Mini 3B 2507 | AWS Bedrock | $0.04 | $0.04 | Serverless |
| Mistral Voxtral Small 24B 2507 | AWS Bedrock | $0.10 | $0.30 | Serverless |
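To make the per-million-token prices concrete, here is a minimal sketch that turns the two listed AWS Bedrock rates into a cost estimate. The helper name `estimate_cost` and the example token counts are illustrative assumptions, not part of any provider API, and the sketch assumes cost scales linearly with token counts at the listed rates.

```python
# Listed AWS Bedrock serverless prices in USD per 1M tokens (from the Pricing table).
PRICES_PER_MILLION = {
    "Mistral Voxtral Mini 3B 2507": {"input": 0.04, "output": 0.04},
    "Mistral Voxtral Small 24B 2507": {"input": 0.10, "output": 0.30},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for a workload (hypothetical helper, linear pricing assumed)."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 5M input tokens and 1M output tokens.
    for name in PRICES_PER_MILLION:
        cost = estimate_cost(name, 5_000_000, 1_000_000)
        print(f"{name}: ${cost:.2f}")
```

Because the output rate for the 24B model is three times its input rate, output-heavy workloads widen the gap between the two models more than input-heavy ones.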
Frequently Asked Questions
- What is Voxtral used for?
  - Voxtral is used for text to speech, speech to text, and other audio-centered multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
- How does Voxtral compare to Ministral?
  - Voxtral by MistralAI is strongest where you need text to speech or speech to text, while Ministral by MistralAI is the closest related family to check for structured outputs. Voxtral has 4 listed variants and reaches up to 33k context; Ministral reaches up to 32k context. Compare the specs and pricing tables before choosing a production model.
- Which Voxtral model should I use?
  - For the lowest listed input price, start with Mistral Voxtral Mini 3B 2507 through AWS Bedrock at $0.04/1M input tokens. For the most recent speech-to-text option, evaluate Voxtral Mini Transcribe 2 (2026-02) with 33k context and multimodal inputs.
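The selection rules in the last answer (the lowest listed input price, or a recent variant matching a needed capability) can be sketched as simple filters over the data on this page. The `Variant` class and both helper functions are hypothetical names introduced for illustration; prices are `None` where this page lists none.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    released: str                        # YYYY-MM, as listed on this page
    signals: Tuple[str, ...]             # capability tags from the variants table
    input_price: Optional[float] = None  # USD per 1M input tokens; None if not listed


VARIANTS = [
    Variant("Voxtral TTS", "2026-03", ("text to speech", "multimodal inputs", "audio")),
    Variant("Voxtral Mini Transcribe 2", "2026-02",
            ("speech to text", "33k context", "multimodal inputs")),
    Variant("Mistral Voxtral Mini 3B 2507", "2025-07", ("3B parameters",), 0.04),
    Variant("Mistral Voxtral Small 24B 2507", "2025-07", ("24B parameters",), 0.10),
]


def cheapest_listed(variants: Sequence[Variant]) -> Variant:
    """Return the variant with the lowest listed input price (unpriced ones are skipped)."""
    priced = [v for v in variants if v.input_price is not None]
    return min(priced, key=lambda v: v.input_price)


def newest_with_signal(variants: Sequence[Variant], signal: str) -> Optional[Variant]:
    """Return the most recently released variant carrying the given tag, if any."""
    matches = [v for v in variants if signal in v.signals]
    # YYYY-MM strings sort chronologically, so max() picks the latest release.
    return max(matches, key=lambda v: v.released) if matches else None


if __name__ == "__main__":
    print(cheapest_listed(VARIANTS).name)
    print(newest_with_signal(VARIANTS, "speech to text").name)
```

Treat this as a starting filter only: variants without a listed price are excluded from the price comparison, not assumed to be free or unavailable.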






