LLM Reference

OpenAI Text-to-Speech Models by OpenAI

OpenAI · Proprietary · Audio
4 models · 2023–2025 · Up to 2k context · From $0.6/1M input

Details

Researcher: OpenAI
License: Proprietary
Commercial use: With conditions
Models: 4
Released: 2023–2025
Max context: 2k

Capabilities

Multimodal: 1 of 4 models

About

OpenAI's text-to-speech family includes the legacy TTS endpoints and the newer GPT-4o mini TTS model for controllable, low-latency speech generation.
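As a rough illustration of how a request to one of these models might be assembled, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The endpoint URL, the model id `gpt-4o-mini-tts`, and the voice name `alloy` are assumptions that do not appear on this page and should be checked against the provider's current API reference; `build_speech_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed endpoint; verify against the provider's API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="gpt-4o-mini-tts", voice="alloy"):
    """Build (but do not send) a POST request for a speech endpoint.

    The model id and voice name are illustrative defaults, not values
    taken from this page.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key: nothing is sent over the network here.
req = build_speech_request("Hello from the reference page.", api_key="sk-placeholder")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` with a real key would be expected to return encoded audio bytes that can be written to a file. Building the request separately from sending it keeps the payload easy to inspect and test.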

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

GPT-4o Mini TTS (Current)
Use when the workload needs audio, 2k context, and multimodal inputs.
2025-03 · audio · 2k context · multimodal inputs

TTS-1 (Current)
Use when the workload needs audio.
2023-11 · audio

TTS-1 HD (Current)
Use when the workload needs audio.
2023-11 · audio

OpenAI TTS (Current)
Use when the workload needs text to speech and audio.
2023-11 · text to speech · audio

Release Timeline

2 release groups

2025-03 (1 current)
GPT-4o Mini TTS (Current): audio, 2k context, multimodal inputs

2023-11 (3 current)
OpenAI TTS (Current): text to speech, audio
TTS-1 (Current): audio
TTS-1 HD (Current): audio

Specifications (4 models)

OpenAI Text-to-Speech model specifications comparison

| Model | Released | Context | Multimodal |
| --- | --- | --- | --- |
| GPT-4o Mini TTS | 2025-03 | 2k | Yes |
| TTS-1 | 2023-11 | | No |
| TTS-1 HD | 2023-11 | | No |
| OpenAI TTS | 2023-11 | | No |

Available From (1 provider)

Pricing

OpenAI Text-to-Speech model pricing by provider

| Model | Provider | Input / 1M | Output / 1M | Type |
| --- | --- | --- | --- | --- |
| GPT-4o Mini TTS | OpenAI API | $0.6 | | Serverless |
| TTS-1 | OpenAI API | $15 | | Serverless |
| TTS-1 HD | OpenAI API | $30 | | Serverless |
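To make the listed rates concrete, the following sketch estimates input cost for a given volume. It assumes each listed price is a flat rate per 1M input units, in whatever unit the provider bills (the page does not state the unit in the table), and ignores any output-side charges. `estimate_input_cost` is a hypothetical helper, not part of any provider SDK.

```python
# Listed input prices in USD per 1M input units, copied from the pricing table.
PRICE_PER_MILLION = {
    "GPT-4o Mini TTS": 0.6,
    "TTS-1": 15.0,
    "TTS-1 HD": 30.0,
}


def estimate_input_cost(model, input_units):
    """Return the estimated input cost in USD, assuming a flat per-1M rate."""
    if input_units < 0:
        raise ValueError("input_units must be non-negative")
    return PRICE_PER_MILLION[model] * input_units / 1_000_000


# Compare the three priced models for the same hypothetical volume.
for name in PRICE_PER_MILLION:
    print(f"{name}: ${estimate_input_cost(name, 250_000):.2f}")
```

Under these assumptions the same volume costs 50 times more on TTS-1 HD than on GPT-4o Mini TTS, since $30 / $0.6 = 50.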

Frequently Asked Questions

What is OpenAI Text-to-Speech used for?
OpenAI Text-to-Speech is used for text-to-speech and other audio workloads. One of the four listed models, GPT-4o Mini TTS, is also tagged as accepting multimodal inputs. The family description and listed model capabilities point to those workloads as the best fit.
How does OpenAI Text-to-Speech compare to GPT Realtime 2?
OpenAI Text-to-Speech is strongest where you need audio, while GPT Realtime 2, also by OpenAI, is the closest related family to check for realtime voice. OpenAI Text-to-Speech has 4 listed variants and reaches up to 2k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
Which OpenAI Text-to-Speech model should I use?
For the lowest listed input price, start with GPT-4o Mini TTS through the OpenAI API at $0.6/1M input tokens. It is also the latest listed model (2025-03), with 2k context and multimodal inputs, so it is the first variant to evaluate on both price and capability.
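The selection logic in the last answer can be sketched as a small lookup over the listed data. The release dates and prices below are copied from this page's tables; `Variant`, `cheapest`, and `latest` are hypothetical names introduced only for illustration, and OpenAI TTS has no listed price, so it is excluded from the price comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    released: str  # "YYYY-MM", so plain string comparison sorts by date
    input_price: Optional[float]  # USD per 1M input; None when not listed


VARIANTS = [
    Variant("GPT-4o Mini TTS", "2025-03", 0.6),
    Variant("TTS-1", "2023-11", 15.0),
    Variant("TTS-1 HD", "2023-11", 30.0),
    Variant("OpenAI TTS", "2023-11", None),
]


def cheapest(variants):
    """Return the variant with the lowest listed input price."""
    priced = [v for v in variants if v.input_price is not None]
    return min(priced, key=lambda v: v.input_price)


def latest(variants):
    """Return the most recently released variant."""
    return max(variants, key=lambda v: v.released)


print(cheapest(VARIANTS).name, latest(VARIANTS).name)
```

With the figures listed here, both lookups return the same model, which is why the answer recommends a single starting point.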