Quick Start
You'll be billed $0.20 per 1M input tokens and $1.15 per 1M output tokens.
About StepFun
StepFun is a Chinese AI company providing API access to its Step series of large language and multimodal models.
Pricing on StepFun
| Type | Price (per 1M) |
|---|---|
| Input tokens | $0.20 |
| Output tokens | $1.15 |
| Image input | $1.00 |
| Video input | $1.00 |
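
As a rough illustration of how the token rates above turn into a per-request cost, here is a minimal sketch. It covers text tokens only. Image and video input are left out because the table does not say what unit their "per 1M" price applies to, and any prompt-caching discount is ignored. The function name `estimate_cost_usd` and the sample token counts are hypothetical, not part of any StepFun API.

```python
from decimal import Decimal

# Rates copied from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("0.20")
OUTPUT_USD_PER_MILLION = Decimal("1.15")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the text-token cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 12,000 prompt tokens and 1,500 completion tokens.
    print(estimate_cost_usd(12_000, 1_500))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when many requests are summed.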
Capabilities
- Vision
- Multimodal
- Reasoning
- Function Calling
- Tool Use
- Structured Outputs
- Prompt Caching
About Step 3.7 Flash
Step 3.7 Flash is StepFun's open-weights multimodal Mixture-of-Experts model for agentic coding, tool use, long-context reasoning, image understanding, and video understanding. It combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder, activates about 11B parameters per token, supports a 256K-token context window, and exposes low, medium, and high reasoning levels for speed/depth tradeoffs. StepFun reports leading open-model results on ClawEval-1.1, SimpleVQA with Search, and SWE-bench Pro at launch. Weights are available on Hugging Face under Apache 2.0.
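
The parameter figures in the description can be checked with a few lines of arithmetic. The sketch below adds the language backbone and the vision encoder and computes the share of parameters active per token. It assumes the "about 11B" active figure is measured against the combined total, which the description does not state, so the percentage should be read as approximate.

```python
# Figures quoted in the description, in billions of parameters.
LANGUAGE_BACKBONE_B = 196.0
VISION_ENCODER_B = 1.8
ACTIVE_PER_TOKEN_B = 11.0  # "about 11B", so the result is approximate

# Combined size of backbone plus vision encoder.
total_b = LANGUAGE_BACKBONE_B + VISION_ENCODER_B

# Assumption: active parameters are compared against the combined total.
active_fraction = ACTIVE_PER_TOKEN_B / total_b

print(f"total: {total_b:.1f}B")                    # total: 197.8B
print(f"active per token: {active_fraction:.1%}")  # active per token: 5.6%
```

The 197.8B sum rounds to 198B, and under the stated assumption roughly one parameter in eighteen is active for any given token.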
Model Specs
| Spec | Value |
|---|---|
| Released | 2026-05-29 |
| Parameters | 198B (11B active) |
| Context | 256k |
| Architecture | Mixture of Experts |