LLM Reference

DeepSeek V2

DeepSeek
6 models · 2024 · Up to 128K context · From $0.14/1M input tokens

About

The DeepSeek V2 family is a set of large language models (LLMs) noted for economical training and efficient inference. Its flagship, DeepSeek V2, is a Mixture-of-Experts model with 236 billion total parameters, of which 21 billion are activated per token, and it supports a context length of 128K tokens. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache, with the DeepSeekMoE architecture, which uses sparse computation for efficiency. The models were pretrained on an 8.1 trillion token dataset and refined through supervised fine-tuning and reinforcement learning. For lighter workloads, DeepSeek V2 Lite is a 16 billion parameter model that can run on a single 40GB GPU. DeepSeek Coder V2 targets programming specifically, supporting 338 programming languages, while DeepSeek V2.5 merges general-chat and coding abilities into a single model. The family is recognized for balancing high performance with resource efficiency.
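As a minimal illustration, the DeepSeek Platform exposes an OpenAI-compatible chat completions endpoint for these models. The sketch below builds such a request with only the standard library; the base URL and the `deepseek-chat` model id are assumptions taken from DeepSeek's public documentation and should be verified before use.

```python
# Hedged sketch: calling a DeepSeek V2-family model via the
# OpenAI-compatible DeepSeek Platform API. Endpoint and model id
# ("deepseek-chat") are assumptions from DeepSeek's public docs.
import json
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, api_key: str,
                  model: str = "deepseek-chat") -> urllib.request.Request:
    """Build the HTTP request; actually sending it needs a valid key."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# To send the request (requires a real API key):
# req = build_request("Explain MLA briefly.", api_key="sk-...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the official `openai` client library can also be pointed at it by overriding the base URL.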

Specifications (6 models)

DeepSeek V2 model specifications comparison
Model | Released | Context | Parameters | Fn Calling | Structured Outputs
DeepSeek V2.5 | 2024-07 | 128K | — | Yes | No
DeepSeek V2 Chat (0628) | 2024-06 | 128K | 236B | No | No
DeepSeek V2 Lite | 2024-05 | 32K | 16B | No | No
DeepSeek V2 Lite Chat | 2024-05 | 32K | 16B | No | No
DeepSeek V2 | 2024-05 | 128K | 236B | No | Yes
DeepSeek V2 Chat | 2024-05 | 128K | 236B | No | No

Available From (2 providers)

Pricing

DeepSeek V2 model pricing by provider
Model | Provider | Input / 1M | Output / 1M | Type
DeepSeek V2 | DeepSeek Platform | $0.14 | $0.28 | Serverless
DeepSeek V2 Lite Chat | Fireworks AI | $0.20 | $0.20 | Serverless
DeepSeek V2.5 | Fireworks AI | $0.56 | $1.68 | Serverless
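Since billing is per million tokens for input and output separately, the per-request cost follows directly from the table above. A small sketch, using the table's rates (actual provider billing may differ):

```python
# Estimate serverless inference cost from the per-1M-token rates
# in the pricing table above; real billing may differ.
RATES = {  # model/provider -> (input $/1M tokens, output $/1M tokens)
    "DeepSeek V2 (DeepSeek Platform)": (0.14, 0.28),
    "DeepSeek V2 Lite Chat (Fireworks AI)": (0.20, 0.20),
    "DeepSeek V2.5 (Fireworks AI)": (0.56, 1.68),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. a 100K-token prompt with a 2K-token reply on DeepSeek V2:
cost = estimate_cost("DeepSeek V2 (DeepSeek Platform)", 100_000, 2_000)
print(f"${cost:.4f}")  # → $0.0146
```

Note how cheap long-context input is at these rates: even a full 100K-token prompt on the flagship model costs under two cents.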

Frequently Asked Questions

What is DeepSeek V2?
DeepSeek V2 is a family of Mixture-of-Experts LLMs from DeepSeek. The flagship model has 236 billion total parameters (21 billion activated per token) and a 128K context window. The family also includes the lighter DeepSeek V2 Lite (16B parameters), the coding-focused DeepSeek Coder V2, and DeepSeek V2.5, which merges general-chat and coding abilities.
How many models are in the DeepSeek V2 family?
The DeepSeek V2 family contains 6 models.
What is the latest DeepSeek V2 model?
The latest model is DeepSeek V2.5, released in July 2024.
How much does DeepSeek V2 cost?
DeepSeek V2 models range from $0.14/1M to $0.56/1M input tokens depending on the model and provider.

Models (6)