Step 3.7 Flash

Name: Step 3.7 Flash
Author: StepFun

Released

2026-05-29

Last refreshed

2026-06-29

Status

Researched 45d ago

Proprietary · Commercial use: conditional · Multimodal · Coding · RAG · Agents · Long context · Vision · JSON / Tool use

Step 3.7 Flash is worth evaluating for coding, RAG, and agents when its provider route and context window match the workload.

Use it for

Teams evaluating coding, RAG, and agent workloads
Workloads that can use a 256k context window
Buyers comparing 3 tracked provider routes

Do not use it for

Workloads where another current model has stronger sourced task evidence

Specifications

Family: Step
Released: 2026-05-29
Context: 256k
Parameters: 198B (11B active)
Architecture: Mixture of Experts
Specialization: general
Openness: Proprietary
License: Proprietary · Commercial use: conditional
Weights: Not released
Code: Unknown
Training: Pretrained

Created by

StepFun

One of China's leading AI 'Six Tigers'.

Shanghai, China

Founded 2023

Website

Pricing

Output / 1M

$1.15

Input / 1M

$0.200

Cheapest of 3 routes · OpenRouter

Providers (3)

StepFun · OpenRouter · NVIDIA NIM

Links

Website · HuggingFace

About

Step 3.7 Flash is StepFun's open-weights multimodal Mixture-of-Experts model for agentic coding, tool use, long-context reasoning, image understanding, and video understanding. It combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder, activates about 11B parameters per token, supports a 256K-token context window, and exposes low, medium, and high reasoning levels for speed/depth tradeoffs. StepFun reports leading open-model results on ClawEval-1.1, SimpleVQA with Search, and SWE-bench Pro at launch. Weights are available on Hugging Face under Apache 2.0.

Step 3.7 Flash is a proprietary model in the Step family. The structured metadata tracks a 256k-token context window, multimodal input, reasoning, function calling, tool use, and structured outputs. This page tracks provider routes through StepFun, OpenRouter, and NVIDIA NIM, with the cheapest tracked route listed at $0.20 input and $1.15 output per 1M tokens. Headline tracked benchmarks include ClawEval-1.1 (67.1), SimpleVQA with Search Tool (79.2), and V* with Python (95.3).

Top use-case fit: coding, agents, and build tasks

Coding

Q/$ C

1 relevant benchmark in the decision map.

RAG

Included by capability and metadata signals in the decision map.

Agents

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare API pricing across 3 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Cache	Route
OpenRouter	$0.200	$1.15	-	Serverless
StepFun	$0.200	$1.15	read $0.040	Serverless
NVIDIA NIM	-	-	-	Provisioned, Partial
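
The per-1M prices in the table above can be turned into a rough per-request estimate. The following is a minimal sketch, not a billing reference: the names `RoutePrice`, `ROUTES`, and `estimate_cost` are hypothetical, and it assumes that cached input tokens are billed at the listed cache-read rate instead of the input rate, and that a route with no cache price bills every input token at the full input rate. NVIDIA NIM is omitted because no prices are listed for it.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class RoutePrice:
    """USD prices per 1M tokens, copied from the provider table."""
    input_per_m: float
    output_per_m: float
    cache_read_per_m: Optional[float] = None  # None when no cache price is listed


# Only the two routes with listed prices are included.
ROUTES = {
    "OpenRouter": RoutePrice(0.200, 1.15),
    "StepFun": RoutePrice(0.200, 1.15, cache_read_per_m=0.040),
}


def estimate_cost(route: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request on a priced route.

    Assumption: cached_tokens is the part of input_tokens served from the
    prompt cache and is billed at the cache-read rate when one is listed.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    price = ROUTES[route]
    # Fall back to the full input rate when the route lists no cache price.
    cached_rate = (price.cache_read_per_m
                   if price.cache_read_per_m is not None
                   else price.input_per_m)
    fresh_tokens = input_tokens - cached_tokens

    total = (fresh_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / TOKENS_PER_MILLION
```

Under these assumptions, a request with 100,000 input tokens (80,000 of them cached) and 2,000 output tokens on the StepFun route comes to about $0.0095, against about $0.0223 on OpenRouter, where no cache price is listed.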

Available via routers & gateways (1)

NVIDIA LLM Router Blueprint

Router

NVIDIA's open-source AI blueprint for LLM routing that selects the optimal model per prompt via intent classification or neural auto-routing; being deprecated 2026-06-20.

Free OSS · NVIDIA NIM

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · Structured Outputs · Prompt Caching

Benchmark peer bars for Coding

SWE-bench Pro · Rank 20 of 41

Peer scores shown: 80.3, 69.2, 64.7, 64.6

Step 3.7 Flash (current): 56.3

Benchmark scores (14)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.

Benchmark	Score	Notes	Source
ClawEval-1.1	67.1	1st among open models at release	https://static.stepfun.com/blog/step-3.7-flash/
SimpleVQA with Search Tool	79.2	1st at release (GPT-5.5: 79.1)	https://static.stepfun.com/blog/step-3.7-flash/
V* with Python	95.3	2nd at release (Kimi K2.6: 96.9)	https://huggingface.co/stepfun-ai/Step-3.7-Flash
SWE-bench Pro	56.3	2nd at release (Claude Opus 4.7: 64.3, GPT-5.5: 58.6)	https://static.stepfun.com/blog/step-3.7-flash/
Terminal-Bench 2.1	59.5	Comparison: Step 3.5 Flash 53.37%, DeepSeek V4 Flash 62.0%, Gemini 3.5 Flash 76.2%, GPT-5.5 82.7%, Claude Opus 4.7 69.4%	https://static.stepfun.com/blog/step-3.7-flash/
Toolathlon	49.5	—	https://huggingface.co/stepfun-ai/Step-3.7-Flash
Humanity's Last Exam	47.2	—	https://huggingface.co/stepfun-ai/Step-3.7-Flash
GDPval-AA	45.8	—	https://huggingface.co/stepfun-ai/Step-3.7-Flash
WorldVQA	58.1	Comparison: Kimi K2.6 55.98%	https://static.stepfun.com/blog/step-3.7-flash/
HR-Bench 4K	89.1	Comparison: Kimi K2.6 91.25%	https://static.stepfun.com/blog/step-3.7-flash/
Android Daily	61.9	Comparison: Gemini 3 Flash 63.21%	https://static.stepfun.com/blog/step-3.7-flash/
DeepSearchQA	92.8	—	https://static.stepfun.com/blog/step-3.7-flash/
BrowseComp	75.8	From official StepFun blog post (accuracy%)	https://static.stepfun.com/blog/step-3.7-flash/
ResearchRubrics	71.7	—	https://static.stepfun.com/blog/step-3.7-flash/
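
To make the point about margins concrete, the gaps between Step 3.7 Flash and the comparators named in the table can be computed directly. This is a minimal sketch: `COMPARISONS` and `margins` are hypothetical names, the rows are copied from a subset of the table above, and it assumes a higher score is better on each of these suites.

```python
# (benchmark, Step 3.7 Flash score, comparator, comparator score),
# copied from rows of the table that name a comparator.
COMPARISONS = [
    ("SimpleVQA with Search Tool", 79.2, "GPT-5.5", 79.1),
    ("V* with Python", 95.3, "Kimi K2.6", 96.9),
    ("SWE-bench Pro", 56.3, "Claude Opus 4.7", 64.3),
    ("SWE-bench Pro", 56.3, "GPT-5.5", 58.6),
    ("Terminal-Bench 2.1", 59.5, "DeepSeek V4 Flash", 62.0),
]


def margins(rows):
    """Return (benchmark, comparator, margin) with margin in score points.

    A positive margin means Step 3.7 Flash scored higher than the comparator.
    Rounding to one decimal matches the precision used in the table.
    """
    return [
        (benchmark, comparator, round(score - comparator_score, 1))
        for benchmark, score, comparator, comparator_score in rows
    ]


if __name__ == "__main__":
    for benchmark, comparator, margin in margins(COMPARISONS):
        print(f"{benchmark}: {margin:+.1f} vs {comparator}")
```

The SimpleVQA lead is 0.1 points while the SWE-bench Pro deficit against Claude Opus 4.7 is 8.0 points, so equal weight should not be given to a "1st" and a "2nd" label without looking at the margin behind it.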

Migration checks

No linked migration route is available for this model yet.

API versions

step-3.7-flash

Frequently asked questions

What is the context window of Step 3.7 Flash?

Step 3.7 Flash has a context window of 256k tokens.
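
A quick pre-flight check can tell whether a request fits that window. This is a rough sketch under two stated assumptions: "256k" is read as 256,000 tokens (a provider may define it as 262,144), and the prompt and the requested completion share the same window. `CONTEXT_WINDOW_TOKENS` and `fits_context` are hypothetical names, and token counts must come from whichever tokenizer the chosen provider uses.

```python
# Assumption: "256k" interpreted as 256,000 tokens; check the provider's docs.
CONTEXT_WINDOW_TOKENS = 256_000


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the prompt plus the reserved output fits in the window.

    Assumption: input and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window
```

For example, a 200,000-token prompt with 50,000 tokens reserved for output fits, while a 250,000-token prompt with 8,000 reserved does not.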

How much does Step 3.7 Flash cost?

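Step 3.7 Flash costs $0.20 per 1M input tokens and $1.15 per 1M output tokens on both priced routes (OpenRouter and StepFun). StepFun also lists cached reads at $0.04 per 1M tokens. No pricing is listed for NVIDIA NIM.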

When was Step 3.7 Flash released?

Step 3.7 Flash was released on 2026-05-29.

Which providers offer Step 3.7 Flash?

Step 3.7 Flash is available from 3 providers: StepFun, OpenRouter, NVIDIA NIM.

What benchmarks has Step 3.7 Flash been tested on?

Step 3.7 Flash has been evaluated on 14 benchmarks, including ClawEval-1.1, SimpleVQA with Search Tool, V* with Python, SWE-bench Pro, and Terminal-Bench 2.1.