What is StepAudio 2.5 used for?

StepAudio 2.5 is used for real-time conversational voice, speech recognition (ASR), and text to speech (TTS). The family description and the listed model capabilities both point to those workloads as the best fit.

How does StepAudio 2.5 compare to Step?

StepAudio 2.5 by StepFun is strongest for voice workloads, while Step, also by StepFun, is the closest related family to check for vision and multimodal work. StepAudio 2.5 lists 3 variants, and Step reaches up to 256k context, so compare the specs and pricing tables before choosing a production model.

Which StepAudio 2.5 model should I use?

If price is the main constraint, check the pricing table first, because the local data does not have complete provider pricing for StepAudio 2.5. For the most capable and most recent option in the local data, evaluate StepAudio 2.5 Realtime, which accepts multimodal inputs.

StepAudio 2.5 Models by StepFun

Details

Researcher: StepFun
License: Proprietary
Commercial use: conditional
Models: 3
Released: 2026

Capabilities

Multimodal: All models

Links

Website

About

StepAudio 2.5 is StepFun's unified audio-language foundation model family, introduced in May 2026 (arXiv:2605.23463). It covers three API-accessible capabilities, all built on a shared decoder architecture: text-to-speech (TTS), automatic speech recognition (ASR), and real-time conversational voice (Realtime). The family claims top scores across five voice AI benchmarks, surpassing GPT Realtime and Gemini Live on the tested dimensions. It supports Chinese and English.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

Current StepAudio 2.5 variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
StepAudio 2.5 Realtime	Use when the workload needs real-time voice with multimodal inputs and audio.	2026-05	voice, multimodal inputs, audio	Current
StepAudio 2.5 ASR	Use when the workload needs speech recognition with multimodal inputs (4B-parameter model).	2026-05	speech recognition, 4B parameters, multimodal inputs	Current
StepAudio 2.5 TTS	Use when the workload needs text to speech with multimodal inputs and audio.	2026-04	text to speech, multimodal inputs, audio	Current
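
The use-when guidance above amounts to a lookup over each variant's signals. A minimal sketch of that lookup follows, assuming the table is transcribed by hand into a local list. The `Variant` class and `pick_variants` function are hypothetical names for illustration and are not part of any StepFun API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    released: str          # "YYYY-MM", as listed in the table
    signals: frozenset     # capability tags from the "Signals" column


# Hand-transcribed from the variants table; not fetched from any service.
VARIANTS = [
    Variant("StepAudio 2.5 Realtime", "2026-05",
            frozenset({"voice", "multimodal inputs", "audio"})),
    Variant("StepAudio 2.5 ASR", "2026-05",
            frozenset({"speech recognition", "4B parameters", "multimodal inputs"})),
    Variant("StepAudio 2.5 TTS", "2026-04",
            frozenset({"text to speech", "multimodal inputs", "audio"})),
]


def pick_variants(required):
    """Return variants whose signals include every required tag, newest first."""
    needed = frozenset(required)
    matches = [v for v in VARIANTS if needed <= v.signals]
    # "YYYY-MM" strings sort chronologically, so a plain string sort works here.
    return sorted(matches, key=lambda v: v.released, reverse=True)


if __name__ == "__main__":
    for v in pick_variants({"audio"}):
        print(v.name, v.released)
```

Filtering on tags only narrows the candidates; pricing and benchmark results still need to be checked separately before choosing a production model.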

Release Timeline

2 release groups

2026-05 (2 current)

StepAudio 2.5 ASR: speech recognition, 4B parameters, multimodal inputs (Current)
StepAudio 2.5 Realtime: voice, multimodal inputs, audio (Current)

2026-04 (1 current)

StepAudio 2.5 TTS: text to speech, multimodal inputs, audio (Current)

Specifications (3 models)

StepAudio 2.5 model specifications comparison
Model	Released	Parameters	Multimodal
StepAudio 2.5 Realtime	2026-05	—	Yes
StepAudio 2.5 ASR	2026-05	4B	Yes
StepAudio 2.5 TTS	2026-04	—	Yes

Available From (1 provider)

StepFun
