LLM Reference

DeepSeek V2 Models by DeepSeek

6 models | 2024 | Up to 128k context | From $0.14/1M input

About

The DeepSeek V2 family is a set of large language models (LLMs) designed for economical scaling and efficient inference. Its flagship, DeepSeek V2, is a Mixture-of-Experts model with 236 billion total parameters, of which 21 billion are activated per token, and it supports a context length of 128,000 tokens 15. It relies on two architectural components, Multi-head Latent Attention (MLA) and DeepSeekMoE: MLA compresses the key-value cache, and DeepSeekMoE uses sparse computation so that only part of the network runs for each token 1. The models are pretrained on an 8.1 trillion token dataset and then refined through supervised fine-tuning and reinforcement learning 1. For smaller deployments, DeepSeek V2-Lite is a 16 billion parameter model that can run on a single 40GB GPU 8. DeepSeek Coder V2 targets programming and supports 338 languages 2, while DeepSeek V2.5 combines general and coding abilities in one model, with improved benchmark results 3. Across the family, the emphasis is on high performance at low resource cost.
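
To put the sparse-activation figures above in proportion, the following is a minimal sketch that computes the share of parameters used for a single token. The function and constant names are hypothetical, and the result is only a parameter ratio taken from the numbers quoted in this description, not a measured speedup or cost saving.

```python
# Figures quoted in the family description above (in billions of parameters).
TOTAL_PARAMS_B = 236   # total parameters of DeepSeek V2
ACTIVE_PARAMS_B = 21   # parameters activated per token


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that participate in one token's forward pass."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected total_b > 0 and 0 <= active_b <= total_b")
    return active_b / total_b


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    # Formats the ratio as a percentage with one decimal place.
    print(f"Active per token: {share:.1%} of all parameters")
```

With the quoted figures, a little under 9% of the parameters are active for any given token.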

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

DeepSeek V2.5
Use when the workload needs 128k context and function calling.
2024-07 | 128k context | function calling

DeepSeek V2 Chat (0628)
Use when the workload needs 128k context and 236B parameters.
2024-06 | 128k context | 236B parameters

DeepSeek V2 Lite
Use when the workload needs 32k context and 16B parameters.
2024-05 | 32k context | 16B parameters

DeepSeek V2 Lite Chat
Use when the workload needs 32k context and 16B parameters.
2024-05 | 32k context | 16B parameters

DeepSeek V2
Use when the workload needs 128k context, 236B parameters, and structured outputs.
2024-05 | 128k context | 236B parameters | structured outputs

DeepSeek V2 Chat
Use when the workload needs 128k context and 236B parameters.
2024-05 | 128k context | 236B parameters

Release Timeline

3 release groups

2024-07 (1 current)
DeepSeek V2.5: 128k context, function calling (Current)

2024-06 (1 current)
DeepSeek V2 Chat (0628): 128k context, 236B parameters (Current)

2024-05 (4 current)
DeepSeek V2: 128k context, 236B parameters, structured outputs (Current)
DeepSeek V2 Chat: 128k context, 236B parameters (Current)
DeepSeek V2 Lite: 32k context, 16B parameters (Current)
DeepSeek V2 Lite Chat: 32k context, 16B parameters (Current)

Specifications (6 models)

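DeepSeek V2 model specifications comparison

| Model | Released | Context | Parameters | Function Calling | Structured Outputs |
| --- | --- | --- | --- | --- | --- |
| DeepSeek V2.5 | 2024-07 | 128k | 236B total, 21B active (MoE) | Yes | No |
| DeepSeek V2 Chat (0628) | 2024-06 | 128k | 236B | No | No |
| DeepSeek V2 Lite | 2024-05 | 32k | 16B | No | No |
| DeepSeek V2 Lite Chat | 2024-05 | 32k | 16B | No | No |
| DeepSeek V2 | 2024-05 | 128k | 236B | No | Yes |
| DeepSeek V2 Chat | 2024-05 | 128k | 236B | No | No |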
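
The specifications table above can be turned into a small lookup for shortlisting variants by requirement. The sketch below is illustrative only: the `Variant` class and `pick_variants` function are hypothetical names, and the rows are transcribed from the release, context, function calling, and structured outputs columns of this page rather than fetched from any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    released: str             # YYYY-MM, as listed in the table
    context_k: int            # context window in thousands of tokens
    fn_calling: bool
    structured_outputs: bool


# Rows transcribed from the specifications table above.
VARIANTS = [
    Variant("DeepSeek V2.5", "2024-07", 128, True, False),
    Variant("DeepSeek V2 Chat (0628)", "2024-06", 128, False, False),
    Variant("DeepSeek V2 Lite", "2024-05", 32, False, False),
    Variant("DeepSeek V2 Lite Chat", "2024-05", 32, False, False),
    Variant("DeepSeek V2", "2024-05", 128, False, True),
    Variant("DeepSeek V2 Chat", "2024-05", 128, False, False),
]


def pick_variants(min_context_k=0, need_fn_calling=False, need_structured_outputs=False):
    """Return the names of variants that satisfy every stated requirement."""
    return [
        v.name
        for v in VARIANTS
        if v.context_k >= min_context_k
        and (v.fn_calling or not need_fn_calling)
        and (v.structured_outputs or not need_structured_outputs)
    ]


if __name__ == "__main__":
    print(pick_variants(min_context_k=128, need_fn_calling=True))
    print(pick_variants(need_structured_outputs=True))
```

Per the table, only DeepSeek V2.5 is listed with function calling and only DeepSeek V2 with structured outputs, so each of those filters returns a single model.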

Available From (2 providers)

Pricing

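DeepSeek V2 model pricing by provider

| Model | Provider | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Type |
| --- | --- | --- | --- | --- |
| DeepSeek V2 | DeepSeek Platform | $0.14 | $0.28 | Serverless |
| DeepSeek V2 Lite Chat | Fireworks AI | $0.20 | $0.20 | Serverless |
| DeepSeek V2.5 | Fireworks AI | $0.56 | $1.68 | Serverless |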
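
Per-million-token prices are easier to compare once applied to a concrete workload. The following is a rough sketch that estimates the cost of a job from the prices listed above. The `PRICES` mapping and `estimate_cost` function are hypothetical names, the figures are copied from this page and may be out of date, and real bills can differ (for example through caching or minimum charges, which are not modeled here).

```python
# (model, provider) -> (input USD per 1M tokens, output USD per 1M tokens),
# transcribed from the pricing table above.
PRICES = {
    ("DeepSeek V2", "DeepSeek Platform"): (0.14, 0.28),
    ("DeepSeek V2 Lite Chat", "Fireworks AI"): (0.20, 0.20),
    ("DeepSeek V2.5", "Fireworks AI"): (0.56, 1.68),
}


def estimate_cost(model: str, provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    in_price, out_price = PRICES[(model, provider)]  # KeyError if the pair is not listed
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500k output tokens.
    for (model, provider) in PRICES:
        cost = estimate_cost(model, provider, 2_000_000, 500_000)
        print(f"{model} via {provider}: ${cost:.2f}")
```

For the example workload of 2M input and 500k output tokens, DeepSeek V2 on DeepSeek Platform comes to about $0.42 and DeepSeek V2.5 on Fireworks AI to about $1.96.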

Frequently Asked Questions

What is DeepSeek V2 used for?
DeepSeek V2 is used for agent workflows and tool use, structured outputs, and coding. The family description and the listed model capabilities point to those workloads as the best fit.

How does DeepSeek V2 compare to Janus?
DeepSeek V2 by DeepSeek is strongest where you need agent workflows and tool use, while Janus by DeepSeek is the closest related family to consider for image generation. DeepSeek V2 has 6 listed variants and reaches up to 128k context, so compare the specifications and pricing tables before choosing a production model.

Which DeepSeek V2 model should I use?
For the lowest listed input price, start with DeepSeek V2 through DeepSeek Platform at $0.14/1M input tokens. For the latest and most capable variant listed here, evaluate DeepSeek V2.5, which offers 128k context and function calling.

Models (6)