Last refreshed 2026-05-27. Next refresh: weekly.
Why use StepAudio 2.5 Realtime on StepFun?
StepFun offers StepAudio 2.5 Realtime with competitive pricing. StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.
Setup recipe
Docs fallback
- Use the provider REST API or SDK
- Create a provider API key
- model: step-2.5-realtime
Request example: step-2.5-realtime.
Gotchas
- Use provider model ID "step-2.5-realtime", not the LLMReference slug "step-audio-2-5-realtime".
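One way to guard against this mix-up is to normalize the name before it reaches the request. The sketch below is illustrative only: the lookup table and the `resolve_model_id` helper are hypothetical names, not part of any StepFun SDK, and the table contains only the two identifiers mentioned on this page.

```python
# Hypothetical lookup table: catalog slug -> provider model ID.
# Only the pair named on this page is included.
PROVIDER_MODEL_IDS = {
    "step-audio-2-5-realtime": "step-2.5-realtime",
}


def resolve_model_id(name: str) -> str:
    """Return the ID the provider expects, given either the slug or the ID."""
    # Already a provider model ID: pass it through unchanged.
    if name in PROVIDER_MODEL_IDS.values():
        return name
    # Known catalog slug: translate it to the provider model ID.
    if name in PROVIDER_MODEL_IDS:
        return PROVIDER_MODEL_IDS[name]
    # Anything else is rejected rather than guessed at.
    raise ValueError(f"unknown model name: {name!r}")
```

Calling `resolve_model_id("step-audio-2-5-realtime")` yields `step-2.5-realtime`, so a slug copied from the catalog never reaches the API unchanged.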
Capabilities
About StepAudio 2.5 Realtime
StepAudio 2.5 Realtime is StepFun's end-to-end real-time conversational voice model. It takes speech input and produces speech output through a single unified architecture, with no intermediate ASR/TTS pipeline steps. Key capabilities include persona-consistent roleplay (from dedicated RLHF training on million-scale persona data), paralinguistic comprehension (detecting and responding to tone, emotion, and speaking rate), and low-latency dialogue. The model supports Chinese and English and is available via a WebSocket API under the model ID step-2.5-realtime. It is analogous in function to OpenAI's GPT Realtime models.
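Since the model is reached over a WebSocket API, a client first needs a connection URL and authentication headers. The sketch below only assembles those values and opens no connection. Several details are assumptions rather than documented facts: the endpoint is a placeholder, and passing the model as a query parameter with a Bearer token follows the convention of OpenAI-style realtime APIs and may not match StepFun's actual interface. Check the provider documentation for the real values.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint, not a real StepFun URL.
REALTIME_URL = "wss://example.invalid/v1/realtime"

# Provider model ID as given on this page.
MODEL_ID = "step-2.5-realtime"


def build_connection(api_key: str, base_url: str = REALTIME_URL):
    """Return (url, headers) for a realtime WebSocket session.

    ASSUMPTION: the model is selected via a query parameter and the key is
    sent as a Bearer token; both are conventions borrowed from similar APIs.
    """
    if not api_key:
        raise ValueError("an API key is required")
    url = f"{base_url}?{urlencode({'model': MODEL_ID})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

The returned pair can be handed to whichever WebSocket client library is in use. Keeping this step separate from the connection itself makes it straightforward to swap in the real endpoint once it is confirmed.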
FAQ
What API model ID do I use for StepAudio 2.5 Realtime on StepFun?
Use the model ID step-2.5-realtime when calling StepFun's API.
Who created StepAudio 2.5 Realtime?
StepAudio 2.5 Realtime was created by StepFun as part of the StepAudio 2.5 model family.
Is StepAudio 2.5 Realtime open source?
No. StepAudio 2.5 Realtime is not open source; it is listed as a proprietary model.