Using StepAudio 2.5 TTS on StepFun


Serverless

Quick Start

1. Create an account at StepFun and generate an API key.
2. Use the StepFun SDK or REST API to call step-audio-2.5-tts; see the documentation for the request format.


Code Examples

See StepFun documentation for integration details.
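As a rough starting point while consulting that documentation, the sketch below builds (but does not send) a REST request for step-audio-2.5-tts using only the Python standard library. Only the model ID comes from this guide. The endpoint URL is a placeholder, and the JSON field names ("input", "instruction"), the Bearer-token header, and the STEPFUN_API_KEY environment variable are all assumptions that should be checked against StepFun's actual request format.

```python
import json
import os
import urllib.request

# ASSUMPTION: placeholder endpoint; replace with the URL from StepFun's documentation.
ASSUMED_ENDPOINT = "https://example.invalid/v1/audio/speech"
MODEL_ID = "step-audio-2.5-tts"  # model identifier given in this guide


def build_tts_request(text, instruction, api_key, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) an HTTP POST request for speech synthesis.

    ASSUMPTION: the field names "input" and "instruction" and the
    Bearer-token authorization scheme are hypothetical.
    """
    payload = {
        "model": MODEL_ID,
        "input": text,               # text to be spoken
        "instruction": instruction,  # plain-language delivery guidance
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_tts_request(
        "Welcome back. Let's get started.",
        "Speak warmly, at a relaxed pace, with a short pause after the first sentence.",
        os.environ.get("STEPFUN_API_KEY", "YOUR_API_KEY"),
    )
    print(request.get_method(), request.full_url)
    # Once the real endpoint and schema are confirmed, the request could be sent with:
    # with urllib.request.urlopen(request) as response:
    #     audio_bytes = response.read()
```

Keeping request construction separate from sending makes it easy to inspect the payload and swap in the documented field names before any network call is made.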

About StepFun

StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.


Pricing on StepFun

Capabilities

Multimodal, Audio

About StepAudio 2.5 TTS

StepAudio 2.5 TTS is StepFun's contextual text-to-speech model with fine-grained expressive control. Unlike tag-based TTS systems, it accepts plain natural-language instructions to control emotion, pacing, pauses, and delivery. It supports zero-shot voice cloning with full timbre and emotion control, and it handles Chinese and English. It is priced at $0.85 per 10,000 characters of input text and is available via the StepFun API under the model ID step-audio-2.5-tts. The model is part of the unified StepAudio 2.5 architecture described in arXiv:2605.23463.
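To make the quoted rate concrete: at $0.85 per 10,000 characters, a 25,000-character script works out to 25,000 / 10,000 × 0.85 = $2.125. The helper below is a minimal sketch of that arithmetic. The function name is hypothetical, and it assumes that each Python code point counts as one billable character and that no rounding or minimum charge applies; StepFun's actual counting and billing rules may differ.

```python
from decimal import Decimal

# Rate quoted in this guide: $0.85 per 10,000 characters of input text.
PRICE_PER_10K_CHARS = Decimal("0.85")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text` (hypothetical helper).

    ASSUMPTION: one Python code point counts as one billable character and
    no rounding or minimum charge applies; StepFun's actual rules may differ.
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10000)


if __name__ == "__main__":
    script = "a" * 25000  # stand-in for a 25,000-character script
    print(f"Estimated cost: ${estimate_tts_cost(script)}")
```

Decimal is used instead of float so that the estimate does not pick up binary floating-point rounding error.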


Model Specs

Released: 2026-04-16

More Models on StepFun

StepAudio 2.5 Realtime
StepAudio 2.5 ASR


Provider

StepFun