LLM Reference

GPT Audio Models by OpenAI

OpenAIProprietary
2 models2024Up to 128k ctxFrom $0.6/1M input

Details

ResearcherOpenAI
LicenseProprietary
Commercial useCommercial use: conditional
Models2
Released2024
Max context128k

Capabilities

MultimodalAll models

Links

Website

About

OpenAI's audio models for Chat Completions API audio in/out. Includes gpt-audio-1.5 (flagship), gpt-audio, and gpt-audio-mini. Replaced the gpt-4o-audio-preview series.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

2 in view
GPT AudioCurrent

Use when the workload needs audio, 128k context, and multimodal inputs.

2024-10audio128k contextmultimodal inputs

Use when the workload needs audio, 128k context, and multimodal inputs.

2024-10audio128k contextmultimodal inputs

Release Timeline

1 release group
2024-10
2 current
GPT Audio
audio128k contextmultimodal inputs
Current
GPT Audio Mini
audio128k contextmultimodal inputs
Current

Specifications(2 models)

GPT Audio model specifications comparison
ModelReleasedContextMultimodal
GPT Audio2024-10128kYes
GPT Audio Mini2024-10128kYes

Available From(2 providers)

Pricing

GPT Audio model pricing by provider
ModelProviderInput / 1MOutput / 1MType
GPT Audio MiniOpenRouter$0.6$2.4Serverless
GPT Audio MiniOpenAI API$0.6$2.4Serverless
GPT AudioOpenRouter$2.5$10Serverless
GPT AudioOpenAI API$2.5$10Serverless

Frequently Asked Questions

What is GPT Audio used for?
GPT Audio is used for audio and vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
How does GPT Audio compare to GPT Realtime 2?
GPT Audio by OpenAI is strongest where you need audio, while GPT Realtime 2 by OpenAI is the closest related family to check for realtime voice. GPT Audio has 2 listed variants and reaches up to 128k context, while GPT Realtime 2 reaches up to 131k context, so compare the specs and pricing tables before choosing a production model.
Which GPT Audio model should I use?
For the lowest listed input price, start with GPT Audio Mini through OpenAI API at $0.6/1M input tokens. For the most capable/latest local choice, evaluate GPT Audio with 128k context and multimodal inputs.