Using StepAudio 2.5 Realtime on StepFun

Implementation guide · StepAudio 2.5 · StepFun

Serverless

Quick Start

1. Create an account at StepFun and generate an API key.
2. Use the StepFun SDK or REST API to call step-2.5-realtime. See the documentation for the request format.


Code Examples

See StepFun documentation for integration details.
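Until the official request format is confirmed, the following is only a minimal sketch of how connection parameters for a realtime session might be assembled. The endpoint URL, the bearer-token header, the STEPFUN_API_KEY environment variable, and the session message shape (event name and fields) are all assumptions made for illustration and are not taken from StepFun's documentation. Only the model identifier step-2.5-realtime comes from this guide. The sketch does not open a network connection.

```python
import json
import os
from urllib.parse import urlencode

MODEL = "step-2.5-realtime"  # model identifier named in this guide

# ASSUMPTION: placeholder endpoint, not a real StepFun URL.
BASE_URL = "wss://example.invalid/v1/realtime"


def build_connection(api_key: str, model: str = MODEL) -> tuple[str, dict[str, str]]:
    """Return a WebSocket URL and headers for a hypothetical realtime session."""
    url = f"{BASE_URL}?{urlencode({'model': model})}"
    # ASSUMPTION: bearer-token authentication; check the provider docs.
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


def build_session_message(instructions: str) -> str:
    """Serialize a hypothetical session-setup message as JSON text."""
    # ASSUMPTION: the event name and field names are illustrative only.
    return json.dumps(
        {"type": "session.update", "session": {"instructions": instructions}}
    )


# ASSUMPTION: STEPFUN_API_KEY is a hypothetical environment variable name.
url, headers = build_connection(os.environ.get("STEPFUN_API_KEY", "sk-placeholder"))
message = build_session_message("You are a friendly bilingual assistant.")
```

The resulting url, headers, and message could then be handed to whichever WebSocket client you use, once the real endpoint and message schema have been substituted from the official documentation.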

About StepFun

StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.



Capabilities

Multimodal, Audio

About StepAudio 2.5 Realtime

StepAudio 2.5 Realtime is StepFun's end-to-end real-time conversational voice model. It takes speech as input and produces speech as output through a single unified architecture, with no intermediate ASR or TTS pipeline steps. Key capabilities include persona-consistent roleplay (via dedicated RLHF training on million-scale persona data), paralinguistic comprehension (detecting and responding to tone, emotion, and speaking rate), and low-latency dialogue. The model supports Chinese and English and is available through a WebSocket API under the model identifier step-2.5-realtime. It is functionally analogous to OpenAI's GPT Realtime models.
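Because the model consumes speech directly over a streaming connection, a client typically has to split captured audio into small frames before sending it. The sketch below shows one common way to do that: slicing raw PCM into fixed-duration chunks and wrapping each as a base64-encoded JSON message. The 16 kHz 16-bit mono format, the 100 ms chunk size, and the message shape (the "input_audio.append" event name and "audio" field) are assumptions for illustration, not StepFun's documented protocol.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # ASSUMPTION: input sample rate
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit mono PCM
CHUNK_MS = 100           # ASSUMPTION: frame duration per message


def iter_audio_messages(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[str]:
    """Yield JSON messages, each carrying one base64-encoded PCM chunk."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        # ASSUMPTION: hypothetical event name and field, for illustration only.
        yield json.dumps(
            {
                "type": "input_audio.append",
                "audio": base64.b64encode(chunk).decode("ascii"),
            }
        )


# One second of silence stands in for microphone input.
one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
messages = list(iter_audio_messages(one_second_of_silence))
```

With these assumed settings each chunk holds 3,200 bytes, so one second of audio produces ten messages. Smaller chunks generally lower perceived latency at the cost of more messages per second.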


Model Specs

Released: 2026-05-24

More Models on StepFun

StepAudio 2.5 ASR, StepAudio 2.5 TTS


Provider

StepFun