GPT Realtime Whisper
gpt-realtime-whisper
Proprietary
About
GPT Realtime Whisper is OpenAI's streaming speech-to-text model, released May 7, 2026. It transcribes audio incrementally while the speaker is still talking instead of waiting for the utterance to finish, which makes it suitable for live captions, meeting notes, classroom transcripts, and real-time agent pipelines. The model is exposed through /v1/realtime/transcription_sessions and is priced at $0.017 per minute rather than per token.
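Because billing is per minute rather than per token, a rough cost estimate needs only the audio duration. The sketch below works through that arithmetic using the $0.017 rate quoted above. The function name `estimate_cost_usd` is hypothetical, and the code assumes billing is linear in duration with no rounding up to whole minutes, which this page does not confirm.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rate quoted in the description above.
PRICE_PER_MINUTE_USD = Decimal("0.017")


def estimate_cost_usd(audio_seconds: float) -> Decimal:
    """Estimate transcription cost for a given audio duration.

    Assumption (not confirmed by the source): cost scales linearly with
    duration and partial minutes are not rounded up.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    cost = minutes * PRICE_PER_MINUTE_USD
    # Round to a hundredth of a cent for display.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A one-hour meeting is 60 minutes at $0.017 per minute.
    print(estimate_cost_usd(3600))
```

Under these assumptions a one-hour recording costs about $1.02. `Decimal` is used instead of floats so that small per-minute amounts do not pick up binary rounding noise when summed across many sessions.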
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning
Providers (1)
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Type |
|---|---|---|---|
| OpenAI API | N/A | N/A | Serverless |
API Versions
gpt-realtime-whisper
Specifications
Family: GPT Realtime 2
Released: 2026-05-07
Architecture: transformer
Specialization: transcription
License: Proprietary
Training: pretrained
Created by
OpenAI
Cutting-edge research and development.
San Francisco, California, United States
Founded 2015