What is Voxtral used for?

Voxtral is used for text to speech, speech recognition, and other audio workloads. Those are the workloads that the family description and the listed model capabilities point to as the best fit.

How does Voxtral compare to Ministral?

Voxtral by MistralAI is the better fit for speech and audio work such as text to speech and speech recognition, while Ministral by MistralAI is the closest related family to check for vision and multimodal work. Voxtral lists 4 variants with up to 33k context, against up to 32k context for Ministral, so compare the specs and pricing tables before choosing a production model.

Which Voxtral model should I use?

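For the lowest listed input price, start with Mistral Voxtral Mini 3B 2507 through AWS Bedrock at $0.04/1M input tokens. For speech recognition with the largest listed context window, evaluate Voxtral Mini Transcribe 2 (2026-02), which lists 33k context and multimodal inputs. The most recent release is Voxtral TTS (2026-03), which targets text to speech. Neither of the 2026 models has a price listed in the pricing table.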

Voxtral Models by MistralAI

MistralAI

4 models | 2025–2026 | Up to 33k context | From $0.04/1M input tokens

Details

Researcher: MistralAI

Models: 4

Released: 2025–2026

Max context: 33k

Capabilities

Multimodal: 2 of 4 models

About

Voxtral is a family of 4 AI models by MistralAI, released between 2025 and 2026.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

Current Voxtral variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
Voxtral TTS	Use when the workload needs text to speech, multimodal inputs, and audio.	2026-03	text to speech, multimodal inputs, audio	Current
Voxtral Mini Transcribe 2	Use when the workload needs speech recognition, 33k context, and multimodal inputs.	2026-02	speech recognition, 33k context, multimodal inputs	Current
Mistral Voxtral Mini 3B 2507	Use when the workload needs audio and 3B parameters.	2025-07	audio, 3B parameters	Current
Mistral Voxtral Small 24B 2507	Use when the workload needs audio and 24B parameters.	2025-07	audio, 24B parameters	Current

Release Timeline

3 release groups

2026-03 (1 current)
Voxtral TTS (Current): text to speech, multimodal inputs, audio

2026-02 (1 current)
Voxtral Mini Transcribe 2 (Current): speech recognition, 33k context, multimodal inputs

2025-07 (2 current)
Mistral Voxtral Mini 3B 2507 (Current): audio, 3B parameters
Mistral Voxtral Small 24B 2507 (Current): audio, 24B parameters

Specifications (4 models)

Voxtral model specifications comparison
Model	Released	Context	Parameters	Multimodal
Voxtral TTS	2026-03	—	—	Yes
Voxtral Mini Transcribe 2	2026-02	33k	—	Yes
Mistral Voxtral Mini 3B 2507	2025-07	—	3B	No
Mistral Voxtral Small 24B 2507	2025-07	—	24B	No
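
One way to shortlist a variant from the specifications table is to filter on the listed columns. The sketch below is illustrative only: the `Spec` type and the `shortlist` function are hypothetical names, the rows are copied from the table above, and a missing value ("—") is assumed to mean "not listed" rather than zero, so a model with no listed context never satisfies a minimum-context filter.

```python
from typing import NamedTuple, Optional


class Spec(NamedTuple):
    model: str
    released: str                # YYYY-MM, as listed
    context_k: Optional[int]     # None where the table shows a dash
    parameters_b: Optional[int]  # None where the table shows a dash
    multimodal: bool


# Rows copied from the specifications table.
SPECS = [
    Spec("Voxtral TTS", "2026-03", None, None, True),
    Spec("Voxtral Mini Transcribe 2", "2026-02", 33, None, True),
    Spec("Mistral Voxtral Mini 3B 2507", "2025-07", None, 3, False),
    Spec("Mistral Voxtral Small 24B 2507", "2025-07", None, 24, False),
]


def shortlist(specs, *, multimodal=None, min_context_k=None):
    """Return matching specs, newest release first."""
    matches = []
    for spec in specs:
        if multimodal is not None and spec.multimodal != multimodal:
            continue
        if min_context_k is not None:
            # Assumption: an unlisted context window cannot be verified,
            # so it is treated as not meeting the requirement.
            if spec.context_k is None or spec.context_k < min_context_k:
                continue
        matches.append(spec)
    # YYYY-MM strings sort chronologically, so a plain string sort works.
    return sorted(matches, key=lambda s: s.released, reverse=True)


if __name__ == "__main__":
    for spec in shortlist(SPECS, multimodal=True):
        print(spec.model, spec.released)
```

Treating unlisted values as unknown keeps the filter conservative; a different policy, such as checking the provider's own documentation for the missing figures, may suit a real evaluation better.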

Available From (2 providers)

AWS Bedrock

Mistral AI Studio

Pricing

Voxtral model pricing by provider
Model	Provider	Input / 1M	Output / 1M	Type
Mistral Voxtral Mini 3B 2507	AWS Bedrock	$0.04	$0.04	Serverless
Mistral Voxtral Small 24B 2507	AWS Bedrock	$0.10	$0.30	Serverless
Mistral Voxtral Small 24B 2507	Mistral AI Studio	$0.10	$0.40	Serverless
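
Per-token prices are easier to compare once they are applied to a concrete workload. The following is a minimal sketch that uses the three rows of the pricing table above; the `Price` type, the `estimate_cost` helper, and the monthly token volumes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    model: str
    provider: str
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rows copied from the pricing table.
PRICES = [
    Price("Mistral Voxtral Mini 3B 2507", "AWS Bedrock", 0.04, 0.04),
    Price("Mistral Voxtral Small 24B 2507", "AWS Bedrock", 0.10, 0.30),
    Price("Mistral Voxtral Small 24B 2507", "Mistral AI Studio", 0.10, 0.40),
]


def estimate_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token counts at the listed rates."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# Hypothetical monthly workload, not taken from the source.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 5_000_000

if __name__ == "__main__":
    ranked = sorted(
        PRICES, key=lambda p: estimate_cost(p, INPUT_TOKENS, OUTPUT_TOKENS)
    )
    for p in ranked:
        cost = estimate_cost(p, INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{p.model} via {p.provider}: ${cost:.2f}")
```

With these assumed volumes the three rows come to about $2.20, $6.50, and $7.00. The gap between the two Mistral Voxtral Small 24B 2507 listings comes entirely from the output rate, so it grows with the share of output tokens in the workload.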
