LLM Reference

OpenAI Transcribe Models by OpenAI

OpenAI · Proprietary
4 models · 2023–2025 · Up to 16k context

Details

Researcher: OpenAI
License: Proprietary
Commercial use: Allowed with conditions
Models: 4
Released: 2023–2025
Max context: 16k tokens

Capabilities

Multimodal: All models

About

OpenAI's speech-to-text model family. It includes the legacy whisper-1, billed per minute of audio, and the newer gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-transcribe-diarize, billed per token, which offer improved accuracy and language detection, with speaker diarization in gpt-4o-transcribe-diarize. All four are served through the /v1/audio/transcriptions endpoint.
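The /v1/audio/transcriptions endpoint named above accepts a multipart/form-data upload. The sketch below uses only the Python standard library to build such a request without sending it. It assumes the default `https://api.openai.com` host and bearer-token authentication, sets only the `model` and `file` form fields, and `build_transcription_request` is a hypothetical helper, not part of any SDK.

```python
import uuid
import urllib.request

# Endpoint path from the description above; the api.openai.com host is an assumption.
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str,
    model: str,
    filename: str,
    audio_bytes: bytes,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one transcription.

    Only the `model` and `file` form fields are set; optional fields are omitted.
    The filename is not escaped, so keep it to simple ASCII in this sketch.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Text field naming the model, e.g. "whisper-1" or "gpt-4o-transcribe".
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field carrying the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n",
        # Closing boundary marks the end of the form.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; in practice the official `openai` Python package wraps the same endpoint.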

Current Variants

The use-when guidance below is derived from the capability, context, release, and replacement fields in the seed data.

GPT-4o Transcribe Diarize (Current)
Use when the workload needs transcription, 16k context, and multimodal inputs.
2025-10 · transcription · 16k context · multimodal inputs

GPT-4o Transcribe (Current)
Use when the workload needs transcription, 16k context, and multimodal inputs.
2025-03 · transcription · 16k context · multimodal inputs

GPT-4o Mini Transcribe (Current)
Use when the workload needs transcription, 16k context, and multimodal inputs.
2025-03 · transcription · 16k context · multimodal inputs

Whisper (Current)
Use when the workload needs transcription, multimodal inputs, and audio.
2023-03 · transcription · multimodal inputs · audio

Release Timeline

3 release groups

2025-10 (1 current)
GPT-4o Transcribe Diarize: transcription · 16k context · multimodal inputs · Current

2025-03 (2 current)
GPT-4o Mini Transcribe: transcription · 16k context · multimodal inputs · Current
GPT-4o Transcribe: transcription · 16k context · multimodal inputs · Current

2023-03 (1 current)
Whisper: transcription · multimodal inputs · audio · Current

Specifications (4 models)

OpenAI Transcribe model specifications comparison

Model                     | Released | Context    | Multimodal
GPT-4o Transcribe Diarize | 2025-10  | 16k        | Yes
GPT-4o Transcribe         | 2025-03  | 16k        | Yes
GPT-4o Mini Transcribe    | 2025-03  | 16k        | Yes
Whisper                   | 2023-03  | not listed | Yes
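For scripted model selection, the specification table can be encoded as data. This is a minimal sketch: `TranscribeModel`, `with_min_context`, and `latest` are hypothetical names, and Whisper's context is stored as unknown (`None`) because the table does not list it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TranscribeModel:
    name: str
    released: str             # "YYYY-MM", as listed in the table
    context_k: Optional[int]  # context in thousands of tokens; None if not listed
    multimodal: bool


# Rows copied from the specification table.
MODELS = [
    TranscribeModel("GPT-4o Transcribe Diarize", "2025-10", 16, True),
    TranscribeModel("GPT-4o Transcribe", "2025-03", 16, True),
    TranscribeModel("GPT-4o Mini Transcribe", "2025-03", 16, True),
    TranscribeModel("Whisper", "2023-03", None, True),
]


def with_min_context(models: List[TranscribeModel], min_k: int) -> List[TranscribeModel]:
    """Models whose listed context is at least min_k thousand tokens.

    Models with an unlisted context are excluded rather than guessed.
    """
    return [m for m in models if m.context_k is not None and m.context_k >= min_k]


def latest(models: List[TranscribeModel]) -> TranscribeModel:
    """Most recently released model; "YYYY-MM" strings sort chronologically."""
    return max(models, key=lambda m: m.released)
```

Keeping Whisper's context as `None` makes filters skip it explicitly instead of treating a missing value as zero or as 16k.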

Available From (1 provider)

Frequently Asked Questions

What is OpenAI Transcribe used for?
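OpenAI Transcribe is used for speech-to-text: transcribing audio inputs, with language detection and, in GPT-4o Transcribe Diarize, speaker diarization. The family description and the listed capabilities (transcription, audio, multimodal inputs) point to these workloads as the best fit.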
How does OpenAI Transcribe compare to GPT Realtime 2?
OpenAI Transcribe is strongest for transcription, while GPT Realtime 2, also from OpenAI, is the closest related family and the one to check for realtime voice. OpenAI Transcribe has 4 listed variants with up to 16k context, whereas GPT Realtime 2 reaches up to 131k context, so compare the specifications and pricing of both before choosing a production model.
Which OpenAI Transcribe model should I use?
If price is the main constraint, check current provider pricing directly, because the local data does not include complete provider pricing for OpenAI Transcribe. For the latest and most capable option in the local data, evaluate GPT-4o Transcribe Diarize, which offers 16k context and multimodal inputs.
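Since complete pricing is missing here, comparing the two billing schemes (per minute for whisper-1, per token for the gpt-4o models) means plugging in the provider's current rates yourself. A minimal sketch: the function names are hypothetical, the rates are caller-supplied placeholders rather than real prices, and it assumes a per-token bill splits into input (audio) and output (text) token counts, each priced per million tokens.

```python
def per_minute_cost(duration_seconds: float, usd_per_minute: float) -> float:
    """Cost of one file under per-minute billing (the scheme described for whisper-1)."""
    return duration_seconds / 60.0 * usd_per_minute


def per_token_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one request under per-token billing (the scheme described for the gpt-4o models).

    Assumption: input and output tokens are priced separately, per million tokens.
    """
    return (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000


def cheaper_scheme(minute_cost: float, token_cost: float) -> str:
    """Name the cheaper billing scheme for one workload (ties go to per-minute)."""
    return "per-minute" if minute_cost <= token_cost else "per-token"
```

For example, with a placeholder rate of 1.0 per minute, `per_minute_cost(90, 1.0)` gives 1.5 for a 90-second file; substitute real rates from the provider's current price list before drawing conclusions.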