DeepSeek V2 Models by DeepSeek
About
The DeepSeek V2 family is a set of large language models (LLMs) noted for economical scaling and efficient inference. Its flagship, DeepSeek V2, is a Mixture-of-Experts (MoE) model with 236 billion total parameters, of which 21 billion are activated per token, and it supports a context length of 128,000 tokens [15]. Two architectural choices drive its efficiency: Multi-head Latent Attention (MLA), which compresses the key-value cache, and DeepSeekMoE, which relies on sparse computation [1]. The models are pretrained on 8.1 trillion tokens and then refined through supervised fine-tuning and reinforcement learning [1]. For smaller deployments, DeepSeek V2-Lite is a 16 billion parameter model that can run on a single 40GB GPU [8]. DeepSeek Coder V2 targets programming and supports 338 programming languages [2], while DeepSeek V2.5 combines general and coding abilities in one model to improve benchmark results [3]. Across the family, the emphasis is on balancing high performance with resource efficiency.
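The figures above lend themselves to some quick arithmetic. The sketch below computes the share of parameters active per token for the flagship and a rough weight-memory estimate for V2-Lite. The 2 bytes per parameter is an assumption (16-bit weights) that the description does not state, and the estimate ignores activations and the key-value cache, so treat it as a lower bound rather than a sizing guide.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
TOTAL_PARAMS_V2 = 236e9   # total parameters, DeepSeek V2
ACTIVE_PARAMS_V2 = 21e9   # parameters activated per token
LITE_PARAMS = 16e9        # total parameters, DeepSeek V2-Lite

# Assumption (not from the description): weights stored in 16-bit precision.
BYTES_PER_PARAM = 2


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used for each token in a sparse MoE model."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_V2, TOTAL_PARAMS_V2)
    print(f"Active share per token: {share:.1%}")
    print(f"V2-Lite weights: {weight_memory_gb(LITE_PARAMS):.0f} GB")
```

Under these assumptions, roughly 8.9% of the flagship's parameters are active per token, and V2-Lite's weights take about 32 GB, which is consistent with the single 40GB GPU claim.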
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| DeepSeek V2.5 | Use when the workload needs 128k context and function calling. | 2024-07 | 128k context, function calling | Current |
| DeepSeek V2 Chat (0628) | Use when the workload needs 128k context and 236B parameters. | 2024-06 | 128k context, 236B parameters | Current |
| DeepSeek V2 Lite | Use when the workload needs 32k context and 16B parameters. | 2024-05 | 32k context, 16B parameters | Current |
| DeepSeek V2 Lite Chat | Use when the workload needs 32k context and 16B parameters. | 2024-05 | 32k context, 16B parameters | Current |
| DeepSeek V2 | Use when the workload needs 128k context, 236B parameters, and structured outputs. | 2024-05 | 128k context, 236B parameters, structured outputs | Current |
| DeepSeek V2 Chat | Use when the workload needs 128k context and 236B parameters. | 2024-05 | 128k context, 236B parameters | Current |
Release Timeline
3 release groups
Specifications (6 models)
| Model | Released | Context | Parameters | Fn Calling | Structured Outputs |
|---|---|---|---|---|---|
| DeepSeek V2.5 | 2024-07 | 128k | 236B total, 21B active (MoE) | Yes | No |
| DeepSeek V2 Chat (0628) | 2024-06 | 128k | 236B | No | No |
| DeepSeek V2 Lite | 2024-05 | 32k | 16B | No | No |
| DeepSeek V2 Lite Chat | 2024-05 | 32k | 16B | No | No |
| DeepSeek V2 | 2024-05 | 128k | 236B | No | Yes |
| DeepSeek V2 Chat | 2024-05 | 128k | 236B | No | No |
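The table can also be read as a simple filter over requirements. The following is a minimal sketch of that idea, with rows transcribed from the table; the `ModelSpec` class and the `candidates` helper are hypothetical names introduced for illustration, not part of any DeepSeek tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    released: str             # YYYY-MM, as listed in the table
    context_k: int            # context window in thousands of tokens
    fn_calling: bool
    structured_outputs: bool


# Rows transcribed from the specifications table.
SPECS = [
    ModelSpec("DeepSeek V2.5", "2024-07", 128, True, False),
    ModelSpec("DeepSeek V2 Chat (0628)", "2024-06", 128, False, False),
    ModelSpec("DeepSeek V2 Lite", "2024-05", 32, False, False),
    ModelSpec("DeepSeek V2 Lite Chat", "2024-05", 32, False, False),
    ModelSpec("DeepSeek V2", "2024-05", 128, False, True),
    ModelSpec("DeepSeek V2 Chat", "2024-05", 128, False, False),
]


def candidates(min_context_k: int = 0,
               fn_calling: bool = False,
               structured_outputs: bool = False) -> list[ModelSpec]:
    """Return models meeting every stated requirement, newest first."""
    matches = [
        m for m in SPECS
        if m.context_k >= min_context_k
        and (m.fn_calling or not fn_calling)
        and (m.structured_outputs or not structured_outputs)
    ]
    # YYYY-MM strings sort chronologically, so a plain string sort works.
    return sorted(matches, key=lambda m: m.released, reverse=True)


if __name__ == "__main__":
    for model in candidates(min_context_k=128, fn_calling=True):
        print(model.name)
```

With the rows as listed, asking for function calling leaves only DeepSeek V2.5, and asking for structured outputs leaves only DeepSeek V2.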
Available From (2 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| DeepSeek V2 | DeepSeek Platform | $0.14 | $0.28 | Serverless |
| DeepSeek V2 Lite Chat | Fireworks AI | $0.20 | $0.20 | Serverless |
| DeepSeek V2.5 | Fireworks AI | $0.56 | $1.68 | Serverless |
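Per-million-token prices translate into a request or workload cost with one line of arithmetic. The sketch below applies the listed prices; the `request_cost` helper and the token counts in the example are hypothetical, and real bills may differ if a provider changes prices or applies discounts not shown here.

```python
# (model, provider) -> (input USD per 1M tokens, output USD per 1M tokens),
# transcribed from the pricing table.
PRICES = {
    ("DeepSeek V2", "DeepSeek Platform"): (0.14, 0.28),
    ("DeepSeek V2 Lite Chat", "Fireworks AI"): (0.20, 0.20),
    ("DeepSeek V2.5", "Fireworks AI"): (0.56, 1.68),
}


def request_cost(model: str, provider: str,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[(model, provider)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    for (model, provider) in PRICES:
        cost = request_cost(model, provider, 2_000_000, 500_000)
        print(f"{model} via {provider}: ${cost:.2f}")
```

For that hypothetical workload, DeepSeek V2 on DeepSeek Platform comes to about $0.42, compared with about $1.96 for DeepSeek V2.5 on Fireworks AI.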
Frequently Asked Questions
- What is DeepSeek V2 used for?
- DeepSeek V2 is used for agent workflows and tool use, structured outputs, and coding. The family description and listed model capabilities point to those workloads as the best fit.
- How does DeepSeek V2 compare to Janus?
- DeepSeek V2 by DeepSeek is strongest where you need agent workflows and tool use, while Janus by DeepSeek is the closest related family to check for image generation. DeepSeek V2 has 6 listed variants and reaches up to 128k context, so compare the specs and pricing tables before choosing a production model.
- Which DeepSeek V2 model should I use?
- For the lowest listed input price, start with DeepSeek V2 through DeepSeek Platform at $0.14 per 1M input tokens. For the newest and most capable listed variant, evaluate DeepSeek V2.5, which offers 128k context and function calling.




