OpenAI Transcribe Models by OpenAI
4 models, 2023–2025, up to 16k context
Details
Researcher: OpenAI
License: Proprietary
Commercial use: Commercial use with conditions
Models: 4
Released: 2023–2025
Max context: 16k
Capabilities
Multimodal: All models
Links
Website
About
OpenAI's speech-to-text model family. It includes the legacy whisper-1 (per-minute billing) and the newer gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-transcribe-diarize (per-token billing, with improved accuracy and language detection, plus speaker diarization in the diarize variant). All four are called through the /v1/audio/transcriptions endpoint.
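A minimal sketch of calling the endpoint named above, assuming the official `openai` Python package is installed and `OPENAI_API_KEY` is set in the environment. The helper names `build_request` and `transcribe` are illustrative and not part of any SDK, and the response is assumed to expose a `text` field, which is the default JSON shape.

```python
from pathlib import Path

# Model IDs as named in the family description.
TRANSCRIBE_MODELS = (
    "whisper-1",
    "gpt-4o-transcribe",
    "gpt-4o-mini-transcribe",
    "gpt-4o-transcribe-diarize",
)


def build_request(audio_path: str, model: str = "gpt-4o-transcribe") -> dict:
    """Validate the model ID and return the pieces needed for the API call."""
    if model not in TRANSCRIBE_MODELS:
        raise ValueError(f"unknown transcription model: {model!r}")
    return {"model": model, "path": Path(audio_path)}


def transcribe(audio_path: str, model: str = "gpt-4o-transcribe") -> str:
    """Send one audio file to /v1/audio/transcriptions and return its text."""
    # Imported lazily so build_request() works without the SDK installed.
    from openai import OpenAI

    request = build_request(audio_path, model)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with request["path"].open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=request["model"],
            file=audio_file,
        )
    return result.text
```

For example, `transcribe("meeting.mp3", model="whisper-1")` would return the transcript as a string; `meeting.mp3` is a placeholder file name. The diarize variant may need additional request options that this sketch does not cover.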
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT-4o Transcribe Diarize | Use when the workload needs transcription, 16k context, and multimodal inputs. | 2025-10 | transcription, 16k context, multimodal inputs | Current |
| GPT-4o Transcribe | Use when the workload needs transcription, 16k context, and multimodal inputs. | 2025-03 | transcription, 16k context, multimodal inputs | Current |
| GPT-4o Mini Transcribe | Use when the workload needs transcription, 16k context, and multimodal inputs. | 2025-03 | transcription, 16k context, multimodal inputs | Current |
| Whisper | Use when the workload needs transcription, multimodal inputs, and audio. | 2023-03 | transcription, multimodal inputs, audio | Current |
Release Timeline
3 release groups
2025-10 (1 current)
- GPT-4o Transcribe Diarize: Current; transcription, 16k context, multimodal inputs
2025-03 (2 current)
- GPT-4o Mini Transcribe: Current; transcription, 16k context, multimodal inputs
- GPT-4o Transcribe: Current; transcription, 16k context, multimodal inputs
2023-03 (1 current)
- Whisper: Current; transcription, multimodal inputs, audio
Specifications (4 models)
| Model | Released | Context | Multimodal |
|---|---|---|---|
| GPT-4o Transcribe Diarize | 2025-10 | 16k | Yes |
| GPT-4o Transcribe | 2025-03 | 16k | Yes |
| GPT-4o Mini Transcribe | 2025-03 | 16k | Yes |
| Whisper | 2023-03 | — | Yes |
Available From (1 provider)
Frequently Asked Questions
- What is OpenAI Transcribe used for?
- OpenAI Transcribe is used for speech-to-text transcription of audio. The family description and listed model capabilities point to that workload as the best fit; the "multimodal" tag on every model reflects audio input with text output.
- How does OpenAI Transcribe compare to GPT Realtime 2?
- OpenAI Transcribe by OpenAI is strongest where you need transcription, while GPT Realtime 2 by OpenAI is the closest related family to check for realtime voice. OpenAI Transcribe has 4 listed variants and reaches up to 16k context, while GPT Realtime 2 reaches up to 131k context, so compare the specs and pricing tables before choosing a production model.
- Which OpenAI Transcribe model should I use?
- If price is the main constraint, check the provider's own pricing table first, because the local data does not include complete provider pricing for OpenAI Transcribe. Keep in mind that whisper-1 is billed per minute while the gpt-4o-* models are billed per token. For the newest listed model, evaluate GPT-4o Transcribe Diarize (2025-10), which has 16k context and multimodal inputs.
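Because the family mixes per-minute billing (whisper-1) with per-token billing (the gpt-4o-* models), a cost comparison needs two different formulas. The sketch below shows the arithmetic only. The function names are hypothetical, the split into input and output token rates is an assumption, and no real prices are included, so every rate must come from the provider's current pricing page.

```python
def per_minute_cost(audio_seconds: float, usd_per_minute: float) -> float:
    """Cost under per-minute billing: duration in minutes times the rate."""
    return audio_seconds / 60.0 * usd_per_minute


def per_token_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost under per-token billing, with rates quoted per million tokens.

    Separate input and output rates are an assumption of this sketch.
    """
    total = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    return total / 1_000_000
```

To compare models for a given workload, measure the token counts reported for a representative audio sample, then evaluate both functions with the rates that currently apply.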





