OpenAI Text-to-Speech Models by OpenAI
4 models · 2023–2025 · Up to 2k context · From $0.6/1M input
Details
Researcher: OpenAI
License: Proprietary
Commercial use: With conditions
Models: 4
Released: 2023–2025
Max context: 2k
Capabilities
Multimodal: 1 of 4 models
OpenAI's text-to-speech family includes the legacy TTS endpoints and the newer GPT-4o mini TTS model for controllable, low-latency speech generation.
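As a rough illustration of what a speech-generation call to this family might look like, the sketch below builds an HTTP request using only the Python standard library. The endpoint URL, the model ID string `gpt-4o-mini-tts`, the voice name `alloy`, and the `instructions` field are assumptions drawn from OpenAI's public API, not from this page, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed values, not taken from this page: the endpoint URL, the model ID
# string, and the voice name.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="gpt-4o-mini-tts",
                         voice="alloy", instructions=None):
    """Build (but do not send) an HTTP request for speech synthesis."""
    payload = {"model": model, "voice": voice, "input": text}
    if instructions:
        # Optional style steering; assumed to apply to GPT-4o Mini TTS only.
        payload["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Building the request separately from sending it keeps the payload easy to inspect before any billable call is made. With a real key, `synthesize_to_file(build_speech_request("Hello", key), "hello.mp3")` would be the typical usage, assuming the service returns MP3 audio by default.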
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| GPT-4o Mini TTS | Use when the workload needs audio, 2k context, and multimodal inputs. | 2025-03 | audio, 2k context, multimodal inputs | Current |
| TTS-1 | Use when the workload needs audio. | 2023-11 | audio | Current |
| TTS-1 HD | Use when the workload needs audio. | 2023-11 | audio | Current |
| OpenAI TTS | Use when the workload needs text to speech and audio. | 2023-11 | text to speech, audio | Current |
Release Timeline
2 release groups
2025-03 (1 current)
GPT-4o Mini TTS: Current; audio, 2k context, multimodal inputs
2023-11 (3 current)
Specifications (4 models)
| Model | Released | Context | Multimodal |
|---|---|---|---|
| GPT-4o Mini TTS | 2025-03 | 2k | Yes |
| TTS-1 | 2023-11 | — | No |
| TTS-1 HD | 2023-11 | — | No |
| OpenAI TTS | 2023-11 | — | No |
Available From (1 provider)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| GPT-4o Mini TTS | OpenAI API | $0.6 | — | Serverless |
| TTS-1 | OpenAI API | $15 | — | Serverless |
| TTS-1 HD | OpenAI API | $30 | — | Serverless |
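To make the listed prices concrete, here is a minimal cost-estimation sketch. It assumes each price applies per 1M billable input units; the FAQ names tokens for GPT-4o Mini TTS, while the table does not state the unit for TTS-1 and TTS-1 HD, so check the provider's pricing page before relying on the figures. The function name is hypothetical.

```python
from decimal import Decimal

# Listed input prices in USD per 1M units, copied from the pricing table.
PRICE_PER_MILLION = {
    "GPT-4o Mini TTS": Decimal("0.6"),
    "TTS-1": Decimal("15"),
    "TTS-1 HD": Decimal("30"),
}


def estimate_input_cost(model, units):
    """Return the estimated USD input cost for `units` billable input units."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return PRICE_PER_MILLION[model] * Decimal(units) / Decimal(1_000_000)
```

For example, `estimate_input_cost("TTS-1", 250_000)` evaluates to 3.75 under these assumptions. `Decimal` is used instead of floats so that small per-request costs add up without rounding drift.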
Frequently Asked Questions
- What is OpenAI Text-to-Speech used for?
- OpenAI Text-to-Speech is used for text-to-speech and other audio generation work. One of the four listed models, GPT-4o Mini TTS, is also tagged as multimodal. The family description and listed model capabilities point to those workloads as the best fit.
- How does OpenAI Text-to-Speech compare to GPT Realtime 2?
- OpenAI Text-to-Speech is strongest where you need audio, while GPT Realtime 2, also by OpenAI, is the closest related family to check for realtime voice. OpenAI Text-to-Speech lists 4 variants and reaches up to 2k context; GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
- Which OpenAI Text-to-Speech model should I use?
- For the lowest listed input price, start with GPT-4o Mini TTS through the OpenAI API at $0.6/1M input tokens. It is also the latest listed model (2025-03), with 2k context and multimodal inputs, so it is the first variant to evaluate on both counts.
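The two selection rules in the last answer (lowest listed input price, latest release) can be sketched over the catalog data on this page. The `Variant` class and helper names are hypothetical, and the entries are copied from the specification and pricing tables; OpenAI TTS has no listed price, so it is stored as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    released: str                 # "YYYY-MM", which sorts correctly as text
    input_price: Optional[float]  # USD per 1M input, None if not listed


CATALOG = [
    Variant("GPT-4o Mini TTS", "2025-03", 0.6),
    Variant("TTS-1", "2023-11", 15.0),
    Variant("TTS-1 HD", "2023-11", 30.0),
    Variant("OpenAI TTS", "2023-11", None),
]


def cheapest(catalog):
    """Return the variant with the lowest listed input price."""
    priced = [v for v in catalog if v.input_price is not None]
    return min(priced, key=lambda v: v.input_price)


def latest(catalog):
    """Return the most recently released variant."""
    return max(catalog, key=lambda v: v.released)
```

With the data listed here, both helpers return the same variant, which is why the answer above recommends one model for both criteria.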





