DeepSeek V2
About
The DeepSeek V2 family is a set of large language models (LLMs) designed for economical training and efficient inference. Its flagship, DeepSeek V2, is a Mixture-of-Experts model with 236 billion total parameters, of which 21 billion are activated per token, and supports a context length of 128K tokens. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache, with the DeepSeekMoE architecture, which enables sparse computation. The models are pretrained on 8.1 trillion tokens and refined through supervised fine-tuning and reinforcement learning. For lighter workloads, DeepSeek V2-Lite is a 16 billion parameter model that fits on a single 40GB GPU. DeepSeek Coder V2 targets programming tasks and supports 338 programming languages, while DeepSeek V2.5 merges general chat and coding abilities into a single model. The family is recognized for balancing high performance with resource efficiency.
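The "21 billion activated per token" figure follows from the Mixture-of-Experts design: a router selects only a few experts per token, so most parameters sit idle on any given forward pass. A toy top-k routing sketch in Python (expert count, top-k, and layer sizes are illustrative, not the real DeepSeek V2 configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # illustrative; DeepSeek V2 uses far more routed experts
TOP_K = 2         # experts activated per token
DIM = 16          # toy hidden size

# Each expert is a tiny feed-forward layer (a single weight matrix here).
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(DIM))  # only 2 of the 8 experts are computed
```

Because only `TOP_K` of `NUM_EXPERTS` expert matrices are multiplied per token, compute scales with the activated parameters rather than the total parameter count, which is the efficiency the paragraph above describes.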
Specifications (6 models)
| Model | Released | Context | Parameters | Fn Calling | Structured Outputs |
|---|---|---|---|---|---|
| DeepSeek V2.5 | 2024-07 | 128K | — | Yes | No |
| DeepSeek V2 Chat (0628) | 2024-06 | 128K | 236B | No | No |
| DeepSeek V2 Lite | 2024-05 | 32K | 16B | No | No |
| DeepSeek V2 Lite Chat | 2024-05 | 32K | 16B | No | No |
| DeepSeek V2 | 2024-05 | 128K | 236B | No | Yes |
| DeepSeek V2 Chat | 2024-05 | 128K | 236B | No | No |
Available From (2 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| DeepSeek V2 | DeepSeek Platform | $0.14 | $0.28 | Serverless |
| DeepSeek V2 Lite Chat | Fireworks AI | $0.20 | $0.20 | Serverless |
| DeepSeek V2.5 | Fireworks AI | $0.56 | $1.68 | Serverless |
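To estimate what a given request would cost at these serverless rates, multiply each token count by its per-million price. A minimal sketch (prices hard-coded from the table above; the token counts are illustrative):

```python
# Per-1M-token (input, output) prices in USD, taken from the pricing table.
PRICES = {
    ("DeepSeek V2", "DeepSeek Platform"): (0.14, 0.28),
    ("DeepSeek V2 Lite Chat", "Fireworks AI"): (0.20, 0.20),
    ("DeepSeek V2.5", "Fireworks AI"): (0.56, 1.68),
}

def request_cost(model, provider, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed serverless rates."""
    in_price, out_price = PRICES[(model, provider)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on DeepSeek V2.
cost = request_cost("DeepSeek V2", "DeepSeek Platform", 2000, 500)  # ≈ $0.00042
```

Note that output tokens are billed at a higher rate than input tokens for DeepSeek V2 and V2.5, so completion length dominates the cost of long generations.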
Frequently Asked Questions
- What is DeepSeek V2?
- The DeepSeek V2 family is a set of large language models (LLMs) designed for economical training and efficient inference. Its flagship, DeepSeek V2, is a Mixture-of-Experts model with 236 billion total parameters, of which 21 billion are activated per token, and supports a context length of 128K tokens. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache, with the DeepSeekMoE architecture, which enables sparse computation. The models are pretrained on 8.1 trillion tokens and refined through supervised fine-tuning and reinforcement learning. For lighter workloads, DeepSeek V2-Lite is a 16 billion parameter model that fits on a single 40GB GPU. DeepSeek Coder V2 targets programming tasks and supports 338 programming languages, while DeepSeek V2.5 merges general chat and coding abilities into a single model. The family is recognized for balancing high performance with resource efficiency.
- How many models are in the DeepSeek V2 family?
- The DeepSeek V2 family contains 6 models.
- What is the latest DeepSeek V2 model?
- The latest model is DeepSeek V2.5, released in July 2024.
- How much does DeepSeek V2 cost?
- Input pricing for DeepSeek V2 models ranges from $0.14 to $0.56 per 1M tokens, depending on the model and provider.