GPT-4o Audio Models by OpenAI
2 models, released 2024, up to 128k context
About
GPT-4o Audio is a family of 2 AI models by OpenAI, released in 2024.
Current Variants
Use-when guidance is derived from each model's seed data: capabilities, context window, release date, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT-4o Audio Preview (12-17) | Use when the workload needs 128k context, code execution, and multimodal inputs. | 2024-12 | 128k context, code execution, multimodal inputs | Current |
| GPT-4o Audio Preview (10-01) | Use when the workload needs 128k context, code execution, and multimodal inputs. | 2024-10 | 128k context, code execution, multimodal inputs | Current |
Release Timeline
2 release groups
- 2024-12 (1 current): GPT-4o Audio Preview (12-17). Current; 128k context, code execution, multimodal inputs.
- 2024-10 (1 current): GPT-4o Audio Preview (10-01). Current; 128k context, code execution, multimodal inputs.
Specifications (2 models)
| Model | Released | Context | Vision | Code Exec |
|---|---|---|---|---|
| GPT-4o Audio Preview (12-17) | 2024-12 | 128k | Yes | Yes |
| GPT-4o Audio Preview (10-01) | 2024-10 | 128k | Yes | Yes |
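As a rough illustration of how the specifications table can drive a model choice, the sketch below encodes the two rows as data and picks the most recently released variant that meets a set of requirements. The `ModelSpec` class, the `pick_model` helper, and the field names are hypothetical and exist only for this example; they are not part of any OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical record type mirroring the table columns.
    name: str
    released: str    # "YYYY-MM", as listed in the table
    context_k: int   # context window in thousands of tokens
    vision: bool
    code_exec: bool


# Rows copied from the specifications table above.
SPECS = [
    ModelSpec("GPT-4o Audio Preview (12-17)", "2024-12", 128, True, True),
    ModelSpec("GPT-4o Audio Preview (10-01)", "2024-10", 128, True, True),
]


def pick_model(specs, min_context_k=0, need_vision=False, need_code_exec=False):
    """Return the most recently released spec meeting the requirements, or None."""
    candidates = [
        s for s in specs
        if s.context_k >= min_context_k
        and (s.vision or not need_vision)
        and (s.code_exec or not need_code_exec)
    ]
    # "YYYY-MM" strings sort chronologically under plain string comparison.
    return max(candidates, key=lambda s: s.released, default=None)


if __name__ == "__main__":
    choice = pick_model(SPECS, min_context_k=100, need_vision=True)
    print(choice.name if choice else "no listed model meets the requirements")
```

Because both listed variants share the same context size and capability flags, the release date is the only tiebreaker here, so the helper falls back to the newer 12-17 preview whenever the requirements can be met at all.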
Frequently Asked Questions
- What is GPT-4o Audio used for?
- GPT-4o Audio is suited to multimodal work (including vision) and code execution, based on the family description and the capabilities listed for each model.
- How does GPT-4o Audio compare to GPT Realtime 2?
- GPT-4o Audio by OpenAI is strongest where you need vision and multimodal work, while GPT Realtime 2, also by OpenAI, is the closest related family to consider for translation. GPT-4o Audio has 2 listed variants and reaches up to 128k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
- Which GPT-4o Audio model should I use?
- If price is the main constraint, check the pricing table first, because provider pricing for GPT-4o Audio is incomplete in the local data. For the latest and most capable listed option, evaluate GPT-4o Audio Preview (12-17), which offers 128k context and multimodal inputs.






