Step Models by StepFun
About
Step is StepFun's (阶跃星辰) family of large language and multimodal models. The series spans proprietary API models and open-weight Flash releases. The newest, Step 3.7 Flash, is a 198B-parameter sparse mixture-of-experts (MoE) vision-language model with a 256k context window, Apache 2.0 weights, and selectable reasoning levels for agentic coding, tool use, and image and video workflows.
Current Variants
The "Use when" guidance is derived from each model's listed capabilities, context window, release date, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Step 3.7 Flash | Use when the workload needs 256k context, reasoning, and tool use. | 2026-05 | 256k context, reasoning, tool use | Current |
| Step 3.5 Flash | Use when the workload needs 256k context and reasoning. | 2026-01 | 256k context, reasoning | Current |
| StepFun Step-2 | Use when the workload needs 128k context. | 2025-10 | 128k context | Current |
| StepFun Step-1 | Use when the workload needs 128k context. | 2025-08 | 128k context | Current |
| Step-2 | Use when the workload needs 256k context, function calling, and multimodal inputs. | 2024-09 | 256k context, function calling, multimodal inputs | Current |
| Step-1V Turbo | Use when the workload needs multimodal inputs. | 2024-07 | multimodal inputs | Current |
| Step-1.5V | Use when the workload needs 128k context and multimodal inputs. | 2024-06 | 128k context, multimodal inputs | Current |
| Step-1 | Use when the workload needs 128k context. | 2024-04 | 128k context | Current |
| Step-1V | Use when the workload needs multimodal inputs. | 2024-03 | multimodal inputs | Current |
| Step-Instruct | Use when provider availability and model metadata match the workload. | 2024-03 | — | Current |
| Step-Math | Use when provider availability and model metadata match the workload. | 2024-03 | — | Current |
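The "Use when" sentences in the table follow a regular pattern built from each model's signals, with a generic fallback for models that list none. The sketch below reproduces that pattern; the `use_when` and `join_signals` functions and the fallback rule are assumptions inferred from the rows, not the catalog's actual generator.

```python
# Hypothetical sketch: rebuild the "Use when" sentences from a model's signal list.
# Inferred from the table rows; not the catalog's real implementation.

FALLBACK = "Use when provider availability and model metadata match the workload."


def join_signals(signals: list[str]) -> str:
    """Join signals as an English list, using a serial comma for three or more."""
    if len(signals) == 1:
        return signals[0]
    if len(signals) == 2:
        return f"{signals[0]} and {signals[1]}"
    return ", ".join(signals[:-1]) + f", and {signals[-1]}"


def use_when(signals: list[str]) -> str:
    """Return the guidance sentence; models with no signals get the fallback text."""
    if not signals:
        return FALLBACK
    return f"Use when the workload needs {join_signals(signals)}."


print(use_when(["256k context", "reasoning", "tool use"]))
# Use when the workload needs 256k context, reasoning, and tool use.
print(use_when(["128k context", "multimodal inputs"]))
# Use when the workload needs 128k context and multimodal inputs.
```

Because the sentence is a pure function of the signals, two models with identical signals (for example Step-1 and StepFun Step-1) receive identical guidance.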
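Release Timeline
The 11 models fall into 9 release groups, from March 2024 (Step-1V, Step-Instruct, Step-Math) to May 2026 (Step 3.7 Flash).
Specifications (11 models)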
| Model | Released | Context | Parameters | Vision | Multimodal | Reasoning | Function Calling | Tool Use | Structured Outputs |
|---|---|---|---|---|---|---|---|---|---|
| Step 3.7 Flash | 2026-05 | 256k | 198B (11B active) | Yes | Yes | Yes | Yes | Yes | Yes |
| Step 3.5 Flash | 2026-01 | 256k | 196B (11B active) | No | No | Yes | No | No | No |
| StepFun Step-2 | 2025-10 | 128k | 1T (MoE)* | No | No | No | No | No | No |
| StepFun Step-1 | 2025-08 | 128k | — | No | No | No | No | No | No |
| Step-2 | 2024-09 | 256k | 1T (MoE)* | Yes | Yes | No | Yes | No | No |
| Step-1V Turbo | 2024-07 | — | — | No | Yes | No | No | No | No |
| Step-1.5V | 2024-06 | 128k | — | Yes | Yes | No | No | No | No |
| Step-1 | 2024-04 | 128k | — | No | No | No | No | No | No |
| Step-1V | 2024-03 | — | — | No | Yes | No | No | No | No |
| Step-Instruct | 2024-03 | — | — | No | No | No | No | No | No |
| Step-Math | 2024-03 | — | — | No | No | No | No | No | No |
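For the two Flash models, the Parameters column gives both total and active counts. In a sparse MoE model only the active parameters are used for each token, so the ratio roughly indicates per-token compute relative to total model size. A small parser, assuming the "198B (11B active)" cell format used in the table; the `active_fraction` helper is hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper: parse "Parameters" cells such as "198B (11B active)".
# Only cells that state both a total and an active count can be parsed.
PARAM_RE = re.compile(r"^(\d+(?:\.\d+)?)([BT])\s*\((\d+(?:\.\d+)?)([BT]) active\)$")
SCALE = {"B": 1e9, "T": 1e12}


def active_fraction(cell: str) -> Optional[float]:
    """Return active/total parameters for a sparse MoE cell, or None if unparseable."""
    m = PARAM_RE.match(cell.strip())
    if m is None:
        return None  # e.g. "1T (MoE)*" or a missing value has no active count
    total = float(m.group(1)) * SCALE[m.group(2)]
    active = float(m.group(3)) * SCALE[m.group(4)]
    return active / total


for name, cell in [("Step 3.7 Flash", "198B (11B active)"),
                   ("Step 3.5 Flash", "196B (11B active)"),
                   ("Step-2", "1T (MoE)*")]:
    frac = active_fraction(cell)
    print(name, "n/a" if frac is None else f"{frac:.1%}")
```

Both Flash models activate about 5.6% of their weights per token; the Step-2 rows state no active count, so no ratio can be computed for them.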
Available From (3 providers)
Pricing
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Type |
|---|---|---|---|---|
| Step 3.5 Flash | OpenRouter | $0.10 | $0.30 | Serverless |
| Step 3.7 Flash | StepFun | $0.20 | $1.15 | Serverless |
| Step 3.7 Flash | OpenRouter | $0.20 | $1.15 | Serverless |
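Per-request cost is linear in token counts: input tokens / 1M × input price, plus output tokens / 1M × output price. A minimal Python sketch using the listed prices, which may change; it assumes every token is billed at the listed rate, with no caching discounts or other provider-specific adjustments, and the workload size is an illustrative assumption.

```python
# USD prices per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    ("Step 3.5 Flash", "OpenRouter"): (0.10, 0.30),
    ("Step 3.7 Flash", "StepFun"): (0.20, 1.15),
    ("Step 3.7 Flash", "OpenRouter"): (0.20, 1.15),
}


def estimate_cost(model: str, provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost: tokens / 1M * per-million price, summed over input and output."""
    in_price, out_price = PRICES[(model, provider)]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price


# Illustrative workload: 2M input tokens and 500k output tokens.
for model, provider in PRICES:
    cost = estimate_cost(model, provider, 2_000_000, 500_000)
    print(f"{model} via {provider}: ${cost:.3f}")
```

For this workload Step 3.5 Flash comes to about $0.35 and Step 3.7 Flash to about $0.975; output-heavy workloads widen the gap, because the output price differs by nearly 4× while the input price differs by 2×.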
Comparisons
- Step 3.7 Flash vs Step 3.5 Flash
- Step 3.7 Flash vs Gemini 2.5 Flash
- Step 3.7 Flash vs Kimi K2.6
- Step 3.7 Flash vs Qwen3-235B-A22B
- Step 3.7 Flash vs DeepSeek V3
- Step 3.7 Flash vs GPT-4o-mini
- Step 3.7 Flash vs MiniMax M2.7
- Step 3.7 Flash vs Claude 3.5 Haiku
Frequently Asked Questions
- What is Step used for?
  Step models fit vision and multimodal tasks, reasoning, and agentic workflows that rely on tool use. The family description and the listed model capabilities both point to these workloads as the best fit.
- How does Step compare to StepAudio 2.5?
  Step is strongest for vision and multimodal work, while StepAudio 2.5, also from StepFun, is the closest related family to evaluate for voice. Step has 11 listed variants with up to 256k context; compare the specifications and pricing tables before choosing a production model.
- Which Step model should I use?
  For the lowest listed prices, start with Step 3.5 Flash through OpenRouter ($0.10 per 1M input tokens and $0.30 per 1M output tokens). For the newest and most capable model, which also ships open weights for local deployment, evaluate Step 3.7 Flash: 256k context, reasoning, tool use, function calling, structured outputs, and multimodal inputs.
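The recommendation above amounts to a filter-then-sort rule: keep the priced offers that meet the workload's requirements, then take the lowest input price. A Python sketch under that assumption; the `Offer` record, the capability labels, and the `cheapest` function are hypothetical, and the data is transcribed from the specifications and pricing tables (only models with listed prices are included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    model: str
    provider: str
    context_k: int            # context window in thousands of tokens
    capabilities: frozenset   # capability columns marked "Yes" in the specs table
    input_price: float        # USD per 1M input tokens
    output_price: float       # USD per 1M output tokens


FLASH_37_CAPS = frozenset({"vision", "multimodal", "reasoning", "function calling",
                           "tool use", "structured outputs"})
OFFERS = [
    Offer("Step 3.5 Flash", "OpenRouter", 256, frozenset({"reasoning"}), 0.10, 0.30),
    Offer("Step 3.7 Flash", "StepFun", 256, FLASH_37_CAPS, 0.20, 1.15),
    Offer("Step 3.7 Flash", "OpenRouter", 256, FLASH_37_CAPS, 0.20, 1.15),
]


def cheapest(required=frozenset(), min_context_k=0):
    """Return the lowest-input-price offer meeting the requirements, or None."""
    matches = [o for o in OFFERS
               if required <= o.capabilities and o.context_k >= min_context_k]
    return min(matches, key=lambda o: o.input_price, default=None)


for req in (frozenset(), frozenset({"tool use"})):
    best = cheapest(req)
    print(sorted(req), "->", "no match" if best is None else f"{best.model} via {best.provider}")
```

With no requirements the rule selects Step 3.5 Flash via OpenRouter; requiring tool use selects Step 3.7 Flash. The two Step 3.7 Flash offers tie on price, and `min` returns the first one listed, so a real choice between them would rest on other criteria such as latency or data residency.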
Models (11)
- Step 3.7 Flash
- Step 3.5 Flash
- StepFun Step-2
- StepFun Step-1
- Step-2
- Step-1V Turbo
- Step-1.5V
- Step-1
- Step-1V
- Step-Instruct
- Step-Math
