LLM Reference

GPT-4o Realtime Models by OpenAI

3 models, released 2024, up to 128k context

About

GPT-4o is a multimodal model from OpenAI that handles text, audio, and vision within a single neural network. Unlike its predecessors, it does not need separate pipelines for different modalities: text, audio, and image inputs and outputs are all processed by the same model, which shortens response times and improves contextual understanding. This enables more natural interactions, including real-time translation and nuanced audio and image analysis. Tokenization is optimized, especially for non-Roman alphabets, which increases efficiency and reduces costs. The GPT-4o family also includes a smaller, cost-effective version, GPT-4o mini, which keeps the core capabilities while improving speed and efficiency. OpenAI plans to extend audio and video functionality progressively.

Archived Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

3 in view, 3 retired

GPT-4o Realtime Preview (12-17)
Keep for legacy integrations; evaluate gpt-realtime-1.5 before new work.
2024-12, 128k context, code execution, multimodal inputs

GPT-4o mini Realtime Preview (12-17)
Keep for legacy integrations; evaluate gpt-realtime-mini before new work.
2024-12, 128k context, code execution, multimodal inputs

GPT-4o Realtime Preview (10-01)
Keep for legacy integrations; evaluate GPT-4o Realtime Preview (12-17) before new work.
2024-10, 128k context, code execution, multimodal inputs
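
The guidance lines in this section all follow one template built around the replacement field. As a rough illustration only, the sketch below shows how such a line might be derived from a variant record. The `Variant` class, the `use_when` function, and the wording for a variant with no replacement are hypothetical and not taken from the site. The pairing of each model with its replacement is inferred from the release dates and names on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical record holding the fields the guidance is derived from."""
    name: str
    released: str               # "YYYY-MM"
    context: str                # e.g. "128k"
    replaced_by: Optional[str]  # None when nothing supersedes this variant


def use_when(variant: Variant) -> str:
    """Build a guidance line from the replacement field (assumed rule)."""
    if variant.replaced_by is None:
        # Hypothetical wording; no such case appears on this page.
        return "Suitable for new work."
    return (
        f"Keep for legacy integrations; "
        f"evaluate {variant.replaced_by} before new work."
    )


# Model-to-replacement pairing is an inference from this page, not a quoted fact.
VARIANTS = [
    Variant("GPT-4o Realtime Preview (12-17)", "2024-12", "128k",
            "gpt-realtime-1.5"),
    Variant("GPT-4o mini Realtime Preview (12-17)", "2024-12", "128k",
            "gpt-realtime-mini"),
    Variant("GPT-4o Realtime Preview (10-01)", "2024-10", "128k",
            "GPT-4o Realtime Preview (12-17)"),
]

guidance = {v.name: use_when(v) for v in VARIANTS}

for name, line in guidance.items():
    print(f"{name}: {line}")
```

Keeping the rule in one function means every guidance line changes together if the replacement data is updated.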

Release Timeline

2 release groups

2024-12 (2 retired)
GPT-4o mini Realtime Preview (12-17): 128k context, code execution, multimodal inputs. Replaced.
GPT-4o Realtime Preview (12-17): 128k context, code execution, multimodal inputs. Replaced.

2024-10 (1 retired)
GPT-4o Realtime Preview (10-01): 128k context, code execution, multimodal inputs. Replaced.

Replaced By

GPT-4o Realtime Preview (12-17) is replaced by gpt-realtime-1.5.
GPT-4o mini Realtime Preview (12-17) is replaced by gpt-realtime-mini.
GPT-4o Realtime Preview (10-01) is replaced by GPT-4o Realtime Preview (12-17).

Specifications (3 models)

GPT-4o Realtime model specifications comparison

Model                                 Released  Context  Vision  Code Exec
GPT-4o Realtime Preview (12-17)       2024-12   128k     Yes     Yes
GPT-4o mini Realtime Preview (12-17)  2024-12   128k     Yes     Yes
GPT-4o Realtime Preview (10-01)       2024-10   128k     Yes     Yes

Available From (1 provider)

Frequently Asked Questions

What is GPT-4o Realtime used for?
GPT-4o Realtime is used for vision, other multimodal work, and code execution. The family description and the listed model capabilities point to those workloads as the best fit.

How does GPT-4o Realtime compare to GPT Realtime 2?
GPT-4o Realtime by OpenAI is strongest for vision and multimodal work, while GPT Realtime 2, also by OpenAI, is the closest related family to consider for translation. GPT-4o Realtime has 3 listed variants with up to 128k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.

Which GPT-4o Realtime model should I use?
If price is the main constraint, start with the pricing table, keeping in mind that provider pricing for GPT-4o Realtime is incomplete in the local data. For the most capable and most recent choice in the local data, evaluate GPT-4o Realtime Preview (12-17), which has 128k context and multimodal inputs.

Models (3)