DeepSeek V4 Models by DeepSeek
Details
Capabilities
Links
WebsiteAbout
DeepSeek V4 is the April 2026 DeepSeek release for long-context reasoning and coding. The family has two models: DeepSeek V4 Pro, a 1.6T-parameter MoE with 49B active parameters, a 1M-token context window, and the family's highest SWE-bench Verified score at 80.6, and DeepSeek V4 Flash, a 284B MoE with 13B active parameters for lower-cost inference. Direct DeepSeek pricing starts at $0.435 / $0.87 per million input / output tokens for Pro and $0.14 / $0.28 for Flash.
Compare Against Top Competitors
Long-context coding and reasoning set for comparing DeepSeek V4 Pro and Flash against Kimi, Claude, GLM, and GPT-5.Scores come from existing benchmark seed data; "-" means this site has no local score for that benchmark yet.
| Model | Context | Input / 1M | Chatbot Arena | SWE-bench Verified | LiveCodeBench | GPQA |
|---|---|---|---|---|---|---|
| DeepSeek V4 Profamily pick | 1m | $0.435/1M | 1,460 | 80.6 | 93.5 | 90.1 |
| DeepSeek V4 Flash | 1m | $0.0983/1M | - | 79 | 91.6 | 88.1 |
| Kimi K2.6 | 262k | - | 1,462 | 80.2 | 89.6 | 90.5 |
| Claude Sonnet 4.6 | 1m | - | 1,459 | 79.6 | 80 | 89.9 |
| GLM-5.1 | 200k | - | 1,472 | - | - | 86.2 |
| GPT-5.5 | 1.05m | - | 1,488 | 82.6 | - | 93.6 |
Current Variants
Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.
Use when the workload needs 1m context, 284B parameters, and reasoning.
Use when the workload needs 1m context, 1600B parameters, and reasoning.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| DeepSeek V4 Flash | Use when the workload needs 1m context, 284B parameters, and reasoning. | 2026-04 | 1m context284B parametersreasoning | Current |
| DeepSeek V4 Pro | Use when the workload needs 1m context, 1600B parameters, and reasoning. | 2026-04 | 1m context1600B parametersreasoning | Current |
Release Timeline
1 release groupSpecifications(2 models)
| Model | Released | Context | Parameters | Reasoning | Fn Calling | Tool Use | Structured Outputs |
|---|---|---|---|---|---|---|---|
| DeepSeek V4 Flash | 2026-04 | 1m | 284B | Yes | Yes | Yes | Yes |
| DeepSeek V4 Pro | 2026-04 | 1m | 1.6T | Yes | Yes | Yes | Yes |
Available From(6 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| DeepSeek V4 Flash | OpenRouter | $0.0983 | $0.1966 | Serverless |
| DeepSeek V4 Flash | DeepSeek Platform | $0.14 | $0.28 | Serverless |
| DeepSeek V4 Flash | Vercel AI Gateway | $0.14 | $0.28 | Serverless |
| DeepSeek V4 Flash | Novita AI | $0.14 | $0.28 | Serverless |
| DeepSeek V4 Pro | DeepSeek Platform | $0.435 | $0.87 | Serverless |
| DeepSeek V4 Pro | Vercel AI Gateway | $0.435 | $0.87 | Serverless |
| DeepSeek V4 Pro | OpenRouter | $0.44 | $0.87 | Serverless |
| DeepSeek V4 Pro | Novita AI | $1.64 | $3.38 | Serverless |
| DeepSeek V4 Pro | Fireworks AI | $1.74 | $3.48 | Serverless |
Comparisons
- GPT-4o (08-06) vs DeepSeek V4 Pro
- Claude Sonnet 4.6 vs DeepSeek V4 Pro
- Claude Sonnet 4.6 vs DeepSeek V4 Flash
- Claude Fable 5 vs DeepSeek V4 Pro
- DeepSeek V4 Pro vs Llama 4 Maverick 17B Instruct FP8
- DeepSeek V4 Pro vs Grok 4
- DeepSeek V4 Flash vs Grok 4
- DeepSeek V4 Pro vs Mistral Large 2.1 (2411)
Frequently Asked Questions
- What is DeepSeek V4 used for?
- DeepSeek V4 is used for reasoning, agent workflows and tool use, and structured outputs. The family description and listed model capabilities point to those workloads as the best fit.
- How does DeepSeek V4 compare to Janus?
- DeepSeek V4 by DeepSeek is strongest where you need reasoning, while Janus by DeepSeek is the closest related family to check for image generation. DeepSeek V4 has 2 listed variants and reaches up to 1m context, so compare the specs and pricing tables before choosing a production model.
- Which DeepSeek V4 model should I use?
- For the lowest listed input price, start with DeepSeek V4 Flash through OpenRouter at $0.0983/1M input tokens. For the most capable/latest local choice, evaluate DeepSeek V4 Flash with 1m context and reasoning, tool use, function calling, and structured outputs.





