DeepSeek V4 Flash
DeepSeek V4 Flash is worth evaluating for coding, rag, and agents when its provider route and context window match the workload.
Use it for
- Teams evaluating coding, rag, and agents
- Workloads that can use a 1m context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Vision or document-understanding workloads
- Family
- DeepSeek V4
- Released
- 2026-04-24
- Context
- 1m
- Max output
- 384,000
- Parameters
- 284B
- Architecture
- Mixture of Experts
- Specialization
- general
- Openness
- Open source
- License
- MIT(OSI)Commercial use allowed
- Training
- pretrained
Cheapest of 5 routes · OpenRouter
About
DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.
DeepSeek V4 Flash is an open-source model in the DeepSeek V4 family. The structured metadata tracks a 1m-token context window, reasoning, function calling, tool use, and structured outputs. This page tracks provider routes through DeepSeek Platform, OpenRouter, Microsoft Foundry, and 2 more, with the cheapest tracked route listed at $0.0983 input and $0.1966 output per 1M tokens. Headline tracked benchmarks include Google-Proof Q&A 88.1, MMLU PRO 86.2, and SWE-bench Verified 79.0.
Top use-case fit: coding, agents, and build tasks
Coding
Q/$ B4 relevant benchmarks in the decision map.
RAG
Included by capability and metadata signals in the decision map.
Agents
Q/$ A1 relevant benchmark in the decision map.
Provider price ladder
Compare all 5Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Cache | Route |
|---|---|---|---|---|
| OpenRouter | $0.0983 | $0.1966 | - | Serverless |
| DeepSeek Platform | $0.140 | $0.280 | read $0.0028 | Serverless |
| Novita AI | $0.140 | $0.280 | - | Serverless |
| Vercel AI Gateway | $0.140 | $0.280 | read $0.0028 | Serverless |
Available via routers & gateways(8)
Azure AI Foundry Model Router
RouterMicrosoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.
Helicone
GatewayObservability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.
Kong AI Gateway
GatewayMulti-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.
LiteLLM
GatewayOpen-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.
OpenRouter
HybridUnified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.
Portkey
GatewayProduction AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.
Capabilities
Benchmark peer barsfor Coding
Benchmark scores(8)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| Google-Proof Q&A | 88.1 | diamond | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| MMLU PRO | 86.2 | Think Max | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| SWE-bench Verified | 79.0 | Think Max | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| SWE-bench Pro | 52.6 | Think Max | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| LiveCodeBench | 91.6 | Think Max | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| HumanEval | 69.5 | Base model non-think mode (pass@1) | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| Massive Multitask Language Understanding | 88.7 | Base model (DeepSeek-V4-Flash-Base) (accuracy) | https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| Terminal-Bench 2.0 | 56.9 | Terminal-Bench 2.0 (accuracy%) | https://benchlm.ai/benchmarks/terminalBench2 |
Migration checks
No linked migration route is available for this model yet.
Rankings & picks(10)
Comparison and alternatives
Browse all comparisons →Show all 68 popular comparisonssorted by 7-day search impressions
Cheapest of 5 routes · OpenRouter