GPT-4o Realtime Models by OpenAI
About
GPT-4o is an OpenAI model that handles text, audio, and vision within a single neural network [1, 3, 5]. Unlike its predecessors, it does not need a separate pipeline for each modality: text, audio, and image inputs and outputs are all processed by the same model, which shortens response times and improves contextual understanding [6]. This enables more natural interactions, including real-time translation and nuanced audio and image analysis. Optimized tokenization, especially for non-Roman scripts, means fewer tokens for the same text, which improves efficiency and reduces cost. The GPT-4o family also includes a smaller, lower-cost version, GPT-4o mini, which keeps the core capabilities while running faster [11]. OpenAI plans to roll out audio and video capabilities progressively [1].
Archived Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT-4o Realtime Preview (12-17) | Keep for legacy integrations; evaluate gpt-realtime-1.5 before new work. | 2024-12 | 128k context, code execution, multimodal inputs | Replaced |
| GPT-4o mini Realtime Preview (12-17) | Keep for legacy integrations; evaluate gpt-realtime-mini before new work. | 2024-12 | 128k context, code execution, multimodal inputs | Replaced |
| GPT-4o Realtime Preview (10-01) | Keep for legacy integrations; evaluate GPT-4o Realtime Preview (12-17) before new work. | 2024-10 | 128k context, code execution, multimodal inputs | Replaced |
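The replacement guidance in the table forms a small chain (the 10-01 preview points to the 12-17 preview, which in turn points to gpt-realtime-1.5). A minimal sketch of following that chain is shown here; the `REPLACEMENTS` mapping and `resolve_replacement` helper are hypothetical names for illustration, and the keys are the display names from the table, not confirmed API model identifiers.

```python
# Hypothetical lookup built from the "Use when" column of the table.
# Keys and values are the catalog's display names, not verified API model IDs.
REPLACEMENTS = {
    "GPT-4o Realtime Preview (12-17)": "gpt-realtime-1.5",
    "GPT-4o mini Realtime Preview (12-17)": "gpt-realtime-mini",
    "GPT-4o Realtime Preview (10-01)": "GPT-4o Realtime Preview (12-17)",
}


def resolve_replacement(model: str) -> str:
    """Follow replacement links until reaching a model with no listed successor."""
    seen = set()
    while model in REPLACEMENTS:
        if model in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"replacement cycle at {model!r}")
        seen.add(model)
        model = REPLACEMENTS[model]
    return model


if __name__ == "__main__":
    for name in REPLACEMENTS:
        print(f"{name} -> {resolve_replacement(name)}")
```

A model that is not in the mapping is returned unchanged, so the helper can be applied to any name without first checking whether it is archived.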
Release Timeline
2 release groups
Replaced By
- GPT-4o Realtime Preview (12-17): keep for legacy integrations; evaluate gpt-realtime-1.5 before new work.
- GPT-4o mini Realtime Preview (12-17): keep for legacy integrations; evaluate gpt-realtime-mini before new work.
- GPT-4o Realtime Preview (10-01): keep for legacy integrations; evaluate GPT-4o Realtime Preview (12-17) before new work.
Specifications (3 models)
| Model | Released | Context | Vision | Code Exec |
|---|---|---|---|---|
| GPT-4o Realtime Preview (12-17) | 2024-12 | 128k | Yes | Yes |
| GPT-4o mini Realtime Preview (12-17) | 2024-12 | 128k | Yes | Yes |
| GPT-4o Realtime Preview (10-01) | 2024-10 | 128k | Yes | Yes |
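The specifications table can also be treated as data when picking a variant. The sketch below assumes a hypothetical `ModelSpec` record and `latest` helper (neither comes from the catalog or from any OpenAI library) and simply selects the most recently released entry, optionally skipping the mini variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str        # display name as listed in the table
    released: str    # "YYYY-MM", as listed in the table
    context_k: int   # context window, in thousands of tokens
    vision: bool
    code_exec: bool


# Rows copied from the specifications table.
SPECS = [
    ModelSpec("GPT-4o Realtime Preview (12-17)", "2024-12", 128, True, True),
    ModelSpec("GPT-4o mini Realtime Preview (12-17)", "2024-12", 128, True, True),
    ModelSpec("GPT-4o Realtime Preview (10-01)", "2024-10", 128, True, True),
]


def latest(specs, exclude_mini=True):
    """Return the most recently released spec; ties keep the first row listed."""
    candidates = [s for s in specs if not (exclude_mini and "mini" in s.name)]
    # "YYYY-MM" strings sort chronologically, so a plain string max is enough.
    return max(candidates, key=lambda s: s.released)


if __name__ == "__main__":
    print(latest(SPECS).name)
```

Because all three rows share the same context size and capability flags, release date is the only field in this table that distinguishes them.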
Available From (1 provider)
Frequently Asked Questions
- What is GPT-4o Realtime used for?
- GPT-4o Realtime is best suited to vision and other multimodal work and to code execution, based on the family description and the capabilities listed for each model.
- How does GPT-4o Realtime compare to GPT Realtime 2?
- Both families are from OpenAI. GPT-4o Realtime is strongest for vision and multimodal work, while GPT Realtime 2 is the closest related family to check for translation. GPT-4o Realtime has 3 listed variants with up to 128k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
- Which GPT-4o Realtime model should I use?
- If price is the main constraint, check the pricing table first, keeping in mind that provider pricing for GPT-4o Realtime is incomplete in the local data. For the most capable and most recent listed variant, evaluate GPT-4o Realtime Preview (12-17), which offers 128k context and multimodal inputs.






