Question 1

What is OpenAI Text-to-Speech used for?

Accepted Answer

OpenAI Text-to-Speech is used for audio workloads, primarily text to speech, with multimodal inputs listed for GPT-4o Mini TTS. The family description and listed model capabilities point to those workloads as the best fit.

Question 2

How does OpenAI Text-to-Speech compare to GPT Realtime 2?

Accepted Answer

OpenAI Text-to-Speech is the stronger fit when you need audio generation, while GPT Realtime 2, also from OpenAI, is the closest related family to check for realtime voice. OpenAI Text-to-Speech has 4 listed variants and reaches up to 2k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.

Question 3

Which OpenAI Text-to-Speech model should I use?

Accepted Answer

GPT-4o Mini TTS is both the lowest-priced listed option, at $0.6 per 1M input tokens through the OpenAI API, and the strongest starting point within the family, with 2k context and multimodal inputs. Use the provider table if latency, deployment type, or output-token pricing matters more than input price.
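To make the model choice concrete, here is a minimal sketch of how the chosen model could be named in a speech request. It assumes OpenAI's public `/v1/audio/speech` endpoint and the API identifier `gpt-4o-mini-tts` for GPT-4o Mini TTS, neither of which appears on this page; `build_speech_request` is a hypothetical helper, and the request is only built, never sent.

```python
import json
import urllib.request

# Assumption: endpoint and field names follow OpenAI's public speech API.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="gpt-4o-mini-tts", voice="alloy"):
    """Build (but do not send) a text-to-speech HTTP request.

    `model` and `voice` are assumed API identifiers, not values from this page.
    """
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Switching to another variant would then be a matter of passing a different `model` value, for example `tts-1` or `tts-1-hd` (again assumed identifiers).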

Model	Use when	Released	Signals	Status
GPT-4o Mini TTS	Use when the workload needs audio, 2k context, and multimodal inputs.	2025-03	audio, 2k context, multimodal inputs	Current
TTS-1	Use when the workload needs audio.	2023-11	audio	Current
TTS-1 HD	Use when the workload needs audio.	2023-11	audio	Current
OpenAI TTS	Use when the workload needs text to speech and audio.	2023-11	text to speech, audio	Current

Model	Released	Context	Multimodal
GPT-4o Mini TTS	2025-03	2k	Yes
TTS-1	2023-11	—	No
TTS-1 HD	2023-11	—	No
OpenAI TTS	2023-11	—	No

Model	Provider	Input / 1M	Output / 1M	Type
GPT-4o Mini TTS	OpenAI API	$0.6	—	Serverless
TTS-1	OpenAI API	$15	—	Serverless
TTS-1 HD	OpenAI API	$30	—	Serverless
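The price comparison in the table above can be sketched as a small calculation. This is an illustrative sketch only: it assumes the "Input / 1M" column is in USD per 1M input tokens, as the answer to Question 3 states, and the helper names `estimate_input_cost` and `cheapest_model` are hypothetical.

```python
# Input prices in USD per 1M input tokens, copied from the pricing table.
# OpenAI TTS is omitted because the table lists no price for it.
INPUT_PRICE_PER_1M = {
    "GPT-4o Mini TTS": 0.6,
    "TTS-1": 15.0,
    "TTS-1 HD": 30.0,
}


def estimate_input_cost(model, input_tokens):
    """Return the estimated input cost in USD for the given token count."""
    return INPUT_PRICE_PER_1M[model] * input_tokens / 1_000_000


def cheapest_model():
    """Return the model with the lowest listed input price."""
    return min(INPUT_PRICE_PER_1M, key=INPUT_PRICE_PER_1M.get)
```

Calling `estimate_input_cost("TTS-1", 2_000_000)` multiplies the listed $15 rate by two million tokens' worth of input. Output pricing is not listed in the table, so the estimate covers input only.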
