LLM Reference

Using StepAudio 2.5 TTS on StepFun

Implementation guide · StepAudio 2.5 · StepFun

Serverless

Quick Start

  1. Create an account at StepFun and generate an API key.
  2. Use the StepFun SDK or REST API to call step-audio-2.5-tts; see the documentation for the request format.

Code Examples

See StepFun documentation for integration details.
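
As a rough illustration of the REST call from the Quick Start, the sketch below builds and sends a speech request using only the Python standard library. Only the model ID step-audio-2.5-tts comes from this page. The endpoint URL, the JSON field names (input, voice), the voice name, the raw-audio response, and the STEPFUN_API_KEY environment variable are all assumptions modeled on common OpenAI-style speech APIs. Check each of them against the StepFun documentation before relying on this.

```python
import json
import os
import urllib.request

# ASSUMPTION: an OpenAI-style speech endpoint. Confirm the real URL in the StepFun docs.
API_URL = "https://api.stepfun.com/v1/audio/speech"
MODEL = "step-audio-2.5-tts"  # model ID as given on this page


def build_request(text, api_key, voice="example-voice"):
    """Build (but do not send) a POST request for speech synthesis.

    ASSUMPTION: the field names "input" and "voice" and the voice name
    "example-voice" are placeholders, not confirmed StepFun parameters.
    """
    payload = {"model": MODEL, "input": text, "voice": voice}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path="speech.mp3"):
    """Send the request and write the response body to out_path.

    ASSUMPTION: the response body is raw audio bytes and MP3 is a valid
    output format. Both need to be verified against the documentation.
    """
    request = build_request(text, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


if __name__ == "__main__":
    # STEPFUN_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("STEPFUN_API_KEY")
    if key:
        print(synthesize("Hello from StepAudio.", key))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without spending credits. How natural-language style instructions are passed (a dedicated field or part of the input text) is not stated on this page, so the sketch leaves them out.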

About StepFun

StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.

Pricing on StepFun

Capabilities

Multimodal, Audio

About StepAudio 2.5 TTS

StepAudio 2.5 TTS is StepFun's contextual text-to-speech model with fine-grained expressive control. Unlike tag-based TTS systems, it accepts plain natural-language instructions to control emotion, pacing, pauses, and delivery. It supports zero-shot voice cloning with full timbre and emotion control, and it handles Chinese and English. The model is available through the StepFun API under the model ID step-audio-2.5-tts and is priced at $0.85 per 10,000 characters of input text. It is part of the unified StepAudio 2.5 architecture described in arXiv:2605.23463.
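
To make the per-character price concrete, here is a minimal cost estimator. The rate of $0.85 per 10,000 characters is taken from this page. The sketch assumes billing is linear in the number of characters and that a "character" is one Unicode code point as counted by Python's len(). The actual counting and rounding rules are not stated here and may differ.

```python
from decimal import Decimal

# Rate from this page: USD 0.85 per 10,000 characters of input text.
PRICE_PER_10K_CHARS = Decimal("0.85")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate the synthesis cost of `text` in US dollars.

    ASSUMPTION: cost scales linearly with len(text), with no minimum
    charge and no rounding to billing blocks.
    """
    return Decimal(len(text)) * PRICE_PER_10K_CHARS / Decimal(10_000)


if __name__ == "__main__":
    script = "a" * 2_500  # stand-in for a 2,500-character script
    print(f"Estimated cost: ${estimate_cost_usd(script)}")
```

Under these assumptions, a 2,500-character script comes to roughly $0.21, and a full 10,000 characters to $0.85.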

Model Specs

Released: 2026-04-16

Provider: StepFun