GPT Audio Models by OpenAI
OpenAIProprietary
2 models2024Up to 128k ctxFrom $0.6/1M input
Details
ResearcherOpenAI
LicenseProprietary
Commercial useCommercial use: conditional
Models2
Released2024
Max context128k
Capabilities
MultimodalAll models
Links
WebsiteAbout
OpenAI's audio models for Chat Completions API audio in/out. Includes gpt-audio-1.5 (flagship), gpt-audio, and gpt-audio-mini. Replaced the gpt-4o-audio-preview series.
Current Variants
Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.
2 in view
GPT AudioCurrent
Use when the workload needs audio, 128k context, and multimodal inputs.
2024-10audio128k contextmultimodal inputs
GPT Audio MiniCurrent
Use when the workload needs audio, 128k context, and multimodal inputs.
2024-10audio128k contextmultimodal inputs
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT Audio | Use when the workload needs audio, 128k context, and multimodal inputs. | 2024-10 | audio128k contextmultimodal inputs | Current |
| GPT Audio Mini | Use when the workload needs audio, 128k context, and multimodal inputs. | 2024-10 | audio128k contextmultimodal inputs | Current |
Release Timeline
1 release group2024-10
2 current
GPT Audio
Currentaudio128k contextmultimodal inputs
GPT Audio Mini
Currentaudio128k contextmultimodal inputs
Specifications(2 models)
| Model | Released | Context | Multimodal |
|---|---|---|---|
| GPT Audio | 2024-10 | 128k | Yes |
| GPT Audio Mini | 2024-10 | 128k | Yes |
Available From(2 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| GPT Audio Mini | OpenRouter | $0.6 | $2.4 | Serverless |
| GPT Audio Mini | OpenAI API | $0.6 | $2.4 | Serverless |
| GPT Audio | OpenRouter | $2.5 | $10 | Serverless |
| GPT Audio | OpenAI API | $2.5 | $10 | Serverless |
Frequently Asked Questions
- What is GPT Audio used for?
- GPT Audio is used for audio and vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
- How does GPT Audio compare to GPT Realtime 2?
- GPT Audio by OpenAI is strongest where you need audio, while GPT Realtime 2 by OpenAI is the closest related family to check for realtime voice. GPT Audio has 2 listed variants and reaches up to 128k context, while GPT Realtime 2 reaches up to 131k context, so compare the specs and pricing tables before choosing a production model.
- Which GPT Audio model should I use?
- For the lowest listed input price, start with GPT Audio Mini through OpenAI API at $0.6/1M input tokens. For the most capable/latest local choice, evaluate GPT Audio with 128k context and multimodal inputs.






