Quick Start
- 1
- 2 Use the StepFun SDK or REST API to call step-audio-2.5-tts; see the documentation for request format.
Code Examples
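The listing does not give the request format, so the following is only a minimal sketch of what a REST call might look like, not StepFun's documented API. It assumes an OpenAI-style speech endpoint and bearer-token authentication. The URL, the `input` and `voice` field names, and especially the `instruction` field used for natural-language delivery control are all assumptions to check against the official documentation. Only the model ID `step-audio-2.5-tts` comes from this page.

```python
import json
import urllib.request

# Assumption: an OpenAI-style speech endpoint. Verify the URL in StepFun's docs.
API_URL = "https://api.stepfun.com/v1/audio/speech"
MODEL_ID = "step-audio-2.5-tts"  # model ID as given on this page


def build_tts_request(text, voice, instruction=None):
    """Build a JSON-serializable payload for a text-to-speech request.

    All field names here are hypothetical; `instruction` stands in for the
    plain-language control of emotion, pacing, and pauses described below.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    if instruction:
        payload["instruction"] = instruction
    return payload


def synthesize(payload, api_key, url=API_URL):
    """POST the payload and return the raw response body (assumed to be audio bytes)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

A call might then look like `synthesize(build_tts_request("Hello there.", "example-voice", "Speak slowly and warmly."), api_key)`, where `example-voice` is a placeholder rather than a real voice name. Keeping payload construction separate from the network call makes it easy to inspect or adjust the request once the real field names are known.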
About StepFun
StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.
Pricing on StepFun
Capabilities
Multimodal, Audio
About StepAudio 2.5 TTS
StepAudio 2.5 TTS is StepFun's contextual text-to-speech model with fine-grained expressive control. Unlike tag-based TTS systems, it accepts plain natural-language instructions to control emotion, pacing, pauses, and delivery. It supports zero-shot voice cloning with full timbre and emotion control, and it handles both Chinese and English. Pricing is $0.85 per 10,000 characters of input text. The model is available through the StepFun API under the model ID step-audio-2.5-tts and is part of the unified StepAudio 2.5 architecture described in arXiv:2605.23463.
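As a rough illustration of the per-character pricing, the sketch below estimates the cost of synthesizing a piece of text at $0.85 per 10,000 characters. The function name is hypothetical, and counting characters with Python's `len` (Unicode code points) is an assumption; the page does not say how StepFun counts characters for billing, or whether instruction text is billed.

```python
from decimal import Decimal

# Rate stated on this page: $0.85 per 10,000 characters of input text.
PRICE_PER_10K_CHARS_USD = Decimal("0.85")


def estimate_cost_usd(text):
    """Estimate synthesis cost, assuming one billed character per code point."""
    return PRICE_PER_10K_CHARS_USD * len(text) / Decimal(10000)


# 25,000 characters is 2.5 billing units of 10,000 characters each.
print(estimate_cost_usd("a" * 25000))
```

Using `Decimal` rather than floats keeps the arithmetic exact for currency amounts.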
Model Specs
Released: 2026-04-16