What is GPT Audio used for?

GPT Audio is used for audio and vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.

How does GPT Audio compare to GPT Realtime 2?

GPT Audio by OpenAI is strongest where you need audio, while GPT Realtime 2 by OpenAI is the closest related family to check for realtime voice. GPT Audio has 2 listed variants and reaches up to 128k context, while GPT Realtime 2 reaches up to 131k context, so compare the specs and pricing tables before choosing a production model.

Which GPT Audio model should I use?

For the lowest listed input price, start with GPT Audio Mini through OpenAI API at $0.6/1M input tokens. For the most capable/latest local choice, evaluate GPT Audio with 128k context and multimodal inputs.

GPT Audio Models by OpenAI

OpenAIProprietary

2 models2024Up to 128k ctxFrom $0.6/1M input

Details

ResearcherOpenAI

LicenseProprietary

Commercial useCommercial use: conditional

Models2

Released2024

Max context128k

Capabilities

MultimodalAll models

Links

Website

About

OpenAI's audio models for Chat Completions API audio in/out. Includes gpt-audio-1.5 (flagship), gpt-audio, and gpt-audio-mini. Replaced the gpt-4o-audio-preview series.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

2 in view

GPT AudioCurrent

Use when the workload needs audio, 128k context, and multimodal inputs.

2024-10audio128k contextmultimodal inputs

GPT Audio MiniCurrent

Use when the workload needs audio, 128k context, and multimodal inputs.

2024-10audio128k contextmultimodal inputs

Current GPT Audio variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
GPT Audio	Use when the workload needs audio, 128k context, and multimodal inputs.	2024-10	audio128k contextmultimodal inputs	Current
GPT Audio Mini	Use when the workload needs audio, 128k context, and multimodal inputs.	2024-10	audio128k contextmultimodal inputs	Current

Release Timeline

1 release group

2024-10

2 current

GPT Audio

audio128k contextmultimodal inputs

Current

GPT Audio Mini

audio128k contextmultimodal inputs

Current

Specifications(2 models)

GPT Audio model specifications comparison
Model	Released	Context	Multimodal
GPT Audio	2024-10	128k	Yes
GPT Audio Mini	2024-10	128k	Yes

Available From(2 providers)

OpenAI API

OpenRouter

Pricing

GPT Audio model pricing by provider
Model	Provider	Input / 1M	Output / 1M	Type
GPT Audio Mini	OpenRouter	$0.6	$2.4	Serverless
GPT Audio Mini	OpenAI API	$0.6	$2.4	Serverless
GPT Audio	OpenRouter	$2.5	$10	Serverless
GPT Audio	OpenAI API	$2.5	$10	Serverless

Frequently Asked Questions

What is GPT Audio used for?: GPT Audio is used for audio and vision and multimodal work. The family description and listed model capabilities point to those workloads as the best fit.
How does GPT Audio compare to GPT Realtime 2?: GPT Audio by OpenAI is strongest where you need audio, while GPT Realtime 2 by OpenAI is the closest related family to check for realtime voice. GPT Audio has 2 listed variants and reaches up to 128k context, while GPT Realtime 2 reaches up to 131k context, so compare the specs and pricing tables before choosing a production model.
Which GPT Audio model should I use?: For the lowest listed input price, start with GPT Audio Mini through OpenAI API at $0.6/1M input tokens. For the most capable/latest local choice, evaluate GPT Audio with 128k context and multimodal inputs.

Models(2)