Using StepAudio 2.5 ASR on StepFun

Implementation guide · StepAudio 2.5 · StepFun

Serverless

Quick Start

1. Create an account at StepFun and generate an API key.
2. Use the StepFun SDK or REST API to call stepaudio-2.5-asr. See the documentation for the request format.


Code Examples

See StepFun documentation for integration details.
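Until the official request format is checked, the following is only a minimal sketch of what a REST call might look like. It assumes an OpenAI-style multipart transcription endpoint with Bearer-token authentication; the URL in ASSUMED_URL, the form field names "model" and "file", and the helper name build_transcription_request are all assumptions for illustration and are not confirmed by this page. Only the model identifier stepaudio-2.5-asr comes from the text above. The function builds the request without sending it, so it can be inspected first.

```python
import urllib.request
import uuid
from pathlib import Path

# ASSUMPTION: endpoint path modeled on OpenAI-style transcription APIs.
# Verify against the StepFun documentation before use.
ASSUMED_URL = "https://api.stepfun.com/v1/audio/transcriptions"

# Model identifier as given on this page.
MODEL = "stepaudio-2.5-asr"


def build_transcription_request(audio_path, api_key, url=ASSUMED_URL):
    """Build (but do not send) a multipart/form-data POST request.

    The field names "model" and "file" and the Bearer auth header are
    assumptions, not a documented StepFun request format.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex

    # Text part carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL}\r\n"
    ).encode()

    # Header of the binary part carrying the audio file.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()

    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + path.read_bytes() + closing

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

If the assumptions hold, the request could be sent with urllib.request.urlopen(req) and the response body decoded as JSON. Reading the API key from an environment variable rather than hard-coding it is the usual practice.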

About StepFun

StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.



Capabilities

Multimodal, Audio

About StepAudio 2.5 ASR

StepAudio 2.5 ASR is StepFun's automatic speech recognition model. At 4B parameters, it uses Multi-Token Prediction (MTP) to predict multiple tokens in parallel at each decoding step, which lets it transcribe 5 minutes of audio in approximately 1 second. It achieves 400% higher throughput and 60% lower latency than prior StepFun ASR systems while maintaining state-of-the-art accuracy. The model supports Chinese and English and accepts audio in PCM, OGG, MP3, and WAV formats. It is available via the StepFun API (model: stepaudio-2.5-asr) and is part of the unified StepAudio 2.5 architecture described in arXiv:2605.23463.
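The figures above lend themselves to a small pre-flight check. The sketch below is illustrative only: it filters files by the four listed formats using the file extension (an assumption, since the page does not say how the API detects format), and it turns the "5 minutes in approximately 1 second" figure into a rough speedup of about 300x over real time. The helper names are hypothetical, and the estimate ignores upload time, queuing, and network latency.

```python
from pathlib import Path

# Input formats listed in the model description.
SUPPORTED_FORMATS = {"pcm", "ogg", "mp3", "wav"}

# "5 minutes of audio in approximately 1 second" implies roughly a 300x
# speedup over real time. Treat this as a rough figure, not a guarantee.
APPROX_SPEEDUP = (5 * 60) / 1.0


def is_supported_format(filename: str) -> bool:
    """Return True if the file extension matches a listed input format."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def estimated_transcription_seconds(audio_seconds: float) -> float:
    """Rough model-side processing time derived from the quoted figure."""
    return audio_seconds / APPROX_SPEEDUP


if __name__ == "__main__":
    print(is_supported_format("meeting.WAV"))      # True
    print(is_supported_format("clip.flac"))        # False
    print(estimated_transcription_seconds(3600))   # 12.0
```

By this estimate, a one-hour recording would need on the order of 12 seconds of model time, assuming the quoted rate scales linearly with audio length.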


Model Specs

Released: 2026-05-22

Parameters: 4B

More Models on StepFun

StepAudio 2.5 Realtime, StepAudio 2.5 TTS


Provider

StepFun