Last refreshed 2026-05-27. Next refresh: weekly.
Why use StepAudio 2.5 TTS on StepFun?
StepFun offers StepAudio 2.5 TTS through its API at $0.85 per 10,000 input characters. StepFun is a Chinese AI company that provides API access to its Step series of large language and multimodal models.
Setup recipe
Docs fallback: use the provider REST API or SDK.
- Create a provider API key.
- Set model: step-audio-2.5-tts.
Gotchas
- Use provider model ID "step-audio-2.5-tts", not the LLMReference slug "step-audio-2-5-tts".
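The setup steps above can be sketched as a small standard-library client. This is a minimal sketch, not StepFun's documented method. It assumes StepFun exposes an OpenAI-style speech endpoint at `https://api.stepfun.com/v1/audio/speech` that accepts JSON fields `model`, `input`, and `voice`. The environment variable name `STEPFUN_API_KEY` and any voice ID you pass are hypothetical placeholders. Check every one of these against StepFun's API reference before relying on it. The only detail taken from this page is the provider model ID `step-audio-2.5-tts`.

```python
import json
import os
import urllib.request

# Assumption: OpenAI-style speech endpoint; verify against StepFun's API reference.
STEPFUN_SPEECH_URL = "https://api.stepfun.com/v1/audio/speech"
# Provider model ID (not the LLMReference slug "step-audio-2-5-tts").
MODEL_ID = "step-audio-2.5-tts"


def build_speech_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the model to speak `text`.

    The payload field names `model`, `input`, and `voice` are assumptions.
    """
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        STEPFUN_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Send the request and write the raw audio bytes in the response to `out_path`."""
    # STEPFUN_API_KEY is a hypothetical environment variable name.
    req = build_speech_request(text, voice, os.environ["STEPFUN_API_KEY"])
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Splitting request construction from sending keeps the part that encodes the model ID checkable offline. The audio container returned by the endpoint depends on the API's defaults, so choose the output file extension after confirming the response format.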
About StepAudio 2.5 TTS
StepAudio 2.5 TTS is StepFun's contextual text-to-speech model with fine-grained expressive control. Unlike tag-based TTS systems, it accepts plain natural-language instructions to control emotion, pacing, pauses, and delivery. It also supports zero-shot voice cloning with full control over timbre and emotion. Pricing is $0.85 per 10,000 characters of input text. The model supports Chinese and English and is available through the StepFun API (model: step-audio-2.5-tts). It is part of the unified StepAudio 2.5 architecture described in arXiv:2605.23463.
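Because billing is per input character, estimating cost is simple arithmetic. The sketch below applies the listed rate of $0.85 per 10,000 characters. It assumes linear billing with no minimum charge or rounding, and it counts characters as Python string length (Unicode code points). How StepFun actually counts Chinese characters or whitespace is not stated here and is an assumption. `estimate_tts_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed price: $0.85 per 10,000 characters of input text.
PRICE_PER_10K_CHARS = Decimal("0.85")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text`.

    Assumes linear per-character billing with no minimum or rounding, and
    counts characters as Unicode code points (len of a Python str).
    """
    # Decimal avoids binary floating-point error on small per-character amounts.
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10_000)
```

For example, a 1,000,000-character script comes to 100 × $0.85 = $85 at the listed rate.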
FAQ
What API model ID do I use for StepAudio 2.5 TTS on StepFun?
Use the model ID step-audio-2.5-tts when calling StepFun's API.
Who created StepAudio 2.5 TTS?
StepAudio 2.5 TTS was created by StepFun as part of the StepAudio 2.5 model family.
Is StepAudio 2.5 TTS open source?
No. StepAudio 2.5 TTS is listed as a proprietary model.