GPT Audio Models by OpenAI
OpenAIProprietary
3 models2024–2026Up to 128K ctxFrom $0.6/1M input
About
OpenAI's audio models for Chat Completions API audio in/out. Includes gpt-audio-1.5 (flagship), gpt-audio, and gpt-audio-mini. Replaced the gpt-4o-audio-preview series.
Specifications(3 models)
| Model | Released | Context | Multimodal |
|---|---|---|---|
| gpt-audio-1.5 | 2026-05 | 128K | Yes |
| GPT Audio | 2024-10 | 128K | Yes |
| GPT Audio Mini | 2024-10 | 128K | Yes |
Available From(2 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| GPT Audio Mini | OpenRouter | $0.6 | $2.4 | Serverless |
| GPT Audio Mini | OpenAI API | $0.6 | $2.4 | Serverless |
| GPT Audio | OpenRouter | $2.5 | $10 | Serverless |
| gpt-audio-1.5 | OpenAI API | $2.5 | $10 | Serverless |
| GPT Audio | OpenAI API | $2.5 | $10 | Serverless |
Frequently Asked Questions
- What is GPT Audio used for?
- GPT Audio is used for vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
- How does GPT Audio compare to GPT Realtime 2?
- GPT Audio by OpenAI is strongest where you need vision and multimodal work, while GPT Realtime 2 by OpenAI is the closest related family to check for translation. GPT Audio has 3 listed variants and reaches up to 128K context, while GPT Realtime 2 reaches up to 131K context, so compare the specs and pricing tables before choosing a production model.
- Which GPT Audio model should I use?
- For the lowest listed input price, start with GPT Audio Mini through OpenRouter at $0.6/1M input tokens. For the most capable/latest local choice, evaluate gpt-audio-1.5 with 128K context and multimodal inputs.


