Cohere Transcribe Models by Cohere
1 model, 2026
Details
Researcher: Cohere
License: Apache 2.0 (OSI-approved)
Commercial use: permitted
Models: 1
Released: 2026
Capabilities
Multimodal: all models
About
Cohere's automatic speech recognition (ASR) foundation models for high-fidelity transcription.
Current Variants
Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Cohere Transcribe (03-2026) | Use for speech recognition workloads with multimodal inputs (2B parameters). | 2026-03 | speech recognition, 2B parameters, multimodal inputs | Current |
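The use-when rule described above can be sketched in a few lines. This is a minimal illustration, assuming each tracked variant is a record with a release month, a status, and a set of capability tags; the `Variant` class and `pick_variant` helper are hypothetical and not part of any Cohere SDK. The sketch keeps current variants that have every required capability and prefers the most recent release.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one tracked variant (not a real Cohere API type)."""
    name: str
    released: str  # "YYYY-MM", so string order matches chronological order
    status: str  # e.g. "Current"
    capabilities: frozenset[str] = field(default_factory=frozenset)


def pick_variant(variants, required):
    """Return the latest current variant that has every required capability, or None."""
    candidates = [
        v for v in variants
        if v.status == "Current" and set(required) <= v.capabilities
    ]
    # "YYYY-MM" strings sort lexicographically in date order.
    return max(candidates, key=lambda v: v.released, default=None)


# Catalog built from the single variant listed on this page.
CATALOG = [
    Variant(
        name="Cohere Transcribe (03-2026)",
        released="2026-03",
        status="Current",
        capabilities=frozenset({"speech recognition", "multimodal inputs"}),
    ),
]

choice = pick_variant(CATALOG, {"speech recognition"})
print(choice.name if choice else "no match")  # prints "Cohere Transcribe (03-2026)"
```

With only one listed variant the result is trivial, but the same filter-then-latest rule applies unchanged once a family tracks several releases or marks older ones as replaced.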
Release Timeline
1 release group, 1 current
2026-03: Cohere Transcribe (03-2026), Current (speech recognition, 2B parameters, multimodal inputs)
Specifications (1 model)
| Model | Released | Parameters | Multimodal |
|---|---|---|---|
| Cohere Transcribe (03-2026) | 2026-03 | 2B | Yes |
Frequently Asked Questions
- What is Cohere Transcribe used for?
- Cohere Transcribe is used for automatic speech recognition: transcribing spoken audio into text. The family description and the listed capabilities (speech recognition, multimodal inputs) point to transcription as the best fit.
- How does Cohere Transcribe compare to Command?
- Cohere Transcribe is the family to choose when you need speech recognition, while Command, also by Cohere, is the closest related family to check for multilingual work. Cohere Transcribe has one listed variant, whereas Command models reach up to a 256k context window, so compare specifications and pricing before choosing a production model.
- Which Cohere Transcribe model should I use?
- If price is the main constraint, check provider pricing first, because complete provider pricing for Cohere Transcribe is not listed here. For the latest and most capable option, evaluate Cohere Transcribe (03-2026), the only current variant, which accepts multimodal inputs.