LLM Reference

GPT-4o Audio Models by OpenAI

2 models · 2024 · Up to 128k context

About

GPT-4o Audio is a family of 2 AI models by OpenAI, released in 2024.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

GPT-4o Audio Preview (12-17)
Released 2024-12 · 128k context · code execution · multimodal inputs
Use when the workload needs 128k context, code execution, and multimodal inputs.

GPT-4o Audio Preview (10-01)
Released 2024-10 · 128k context · code execution · multimodal inputs
Use when the workload needs 128k context, code execution, and multimodal inputs.

Release Timeline

2 release groups

2024-12 (1 current)
GPT-4o Audio Preview (12-17)
128k context · code execution · multimodal inputs
Status: Current

2024-10 (1 current)
GPT-4o Audio Preview (10-01)
128k context · code execution · multimodal inputs
Status: Current

Specifications (2 models)

GPT-4o Audio model specifications comparison

Model                         Released  Context  Vision  Code Exec
GPT-4o Audio Preview (12-17)  2024-12   128k     Yes     Yes
GPT-4o Audio Preview (10-01)  2024-10   128k     Yes     Yes

Frequently Asked Questions

What is GPT-4o Audio used for?
GPT-4o Audio is listed for multimodal work, including vision inputs, and for code execution. The family description and the capabilities listed for each model point to those workloads as the best fit.
How does GPT-4o Audio compare to GPT Realtime 2?
GPT-4o Audio by OpenAI is strongest where you need vision and multimodal work, while GPT Realtime 2, also by OpenAI, is the closest related family to consider for translation. GPT-4o Audio has 2 listed variants with up to 128k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
Which GPT-4o Audio model should I use?
If price is the main constraint, check the pricing table first, and note that the local data does not include complete provider pricing for GPT-4o Audio. For the latest variant in the local data, evaluate GPT-4o Audio Preview (12-17), which offers 128k context and multimodal inputs.
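
The selection rule in the last answer (prefer the most recent listed variant that meets the context requirement) can be sketched in a few lines of Python. This is only an illustration: the `Variant` record, the `VARIANTS` list, and the `pick_latest` helper are hypothetical names, and the values are copied from the specifications table on this page rather than fetched from any API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """Hypothetical record mirroring one row of the specifications table."""
    name: str
    released: str    # "YYYY-MM", so plain string comparison orders by date
    context_k: int   # context window in thousands of tokens
    multimodal: bool


# Values copied from the table on this page (assumed accurate as listed).
VARIANTS = [
    Variant("GPT-4o Audio Preview (12-17)", "2024-12", 128, True),
    Variant("GPT-4o Audio Preview (10-01)", "2024-10", 128, True),
]


def pick_latest(variants: Sequence[Variant], min_context_k: int = 0) -> Optional[Variant]:
    """Return the most recently released variant meeting the context floor, or None."""
    eligible = [v for v in variants if v.context_k >= min_context_k]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v.released)


if __name__ == "__main__":
    choice = pick_latest(VARIANTS, min_context_k=128)
    print(choice.name if choice else "no variant meets the requirement")
```

Because both listed variants share the same context size and capabilities, release date is the only field that separates them here; pricing is left out because the page states it is incomplete.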

Models (2)