LLM ReferenceLLM Reference

GPT Audio Models by OpenAI

OpenAIProprietary
3 models2024–2026Up to 128K ctxFrom $0.6/1M input

About

OpenAI's audio models for Chat Completions API audio in/out. Includes gpt-audio-1.5 (flagship), gpt-audio, and gpt-audio-mini. Replaced the gpt-4o-audio-preview series.

Specifications(3 models)

GPT Audio model specifications comparison
ModelReleasedContextMultimodal
gpt-audio-1.52026-05128KYes
GPT Audio2024-10128KYes
GPT Audio Mini2024-10128KYes

Available From(2 providers)

Pricing

GPT Audio model pricing by provider
ModelProviderInput / 1MOutput / 1MType
GPT Audio MiniOpenRouter$0.6$2.4Serverless
GPT Audio MiniOpenAI API$0.6$2.4Serverless
GPT AudioOpenRouter$2.5$10Serverless
gpt-audio-1.5OpenAI API$2.5$10Serverless
GPT AudioOpenAI API$2.5$10Serverless

Frequently Asked Questions

What is GPT Audio used for?
GPT Audio is used for vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
How does GPT Audio compare to GPT Realtime 2?
GPT Audio by OpenAI is strongest where you need vision and multimodal work, while GPT Realtime 2 by OpenAI is the closest related family to check for translation. GPT Audio has 3 listed variants and reaches up to 128K context, while GPT Realtime 2 reaches up to 131K context, so compare the specs and pricing tables before choosing a production model.
Which GPT Audio model should I use?
For the lowest listed input price, start with GPT Audio Mini through OpenRouter at $0.6/1M input tokens. For the most capable/latest local choice, evaluate gpt-audio-1.5 with 128K context and multimodal inputs.

Models(3)