LLM Reference

DeepSeek V2

About

DeepSeek-V2 is an open-source Mixture-of-Experts (MoE) large language model known for its economical training and efficient inference. It has 236 billion total parameters, of which only 21 billion are activated per token, and it supports a context length of 128,000 tokens. The model introduces two key architectural innovations: Multi-head Latent Attention (MLA), which speeds up inference by compressing the key-value cache, and the DeepSeekMoE architecture, which keeps training cost-effective. It performs strongly across benchmarks such as MMLU, BBH, and C-Eval, surpassing many other open-source models. DeepSeek-V2 also has a chat variant, DeepSeek-V2-Chat, tailored for conversational AI, and is commercially usable under a permissive license. For a lighter-weight option, DeepSeek-V2-Lite with 15.7 billion parameters is offered. The subsequent release, DeepSeek-V2.5, further improves performance and adds function calling.

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode
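
Function calling and tool use are exposed through the DeepSeek Platform's OpenAI-compatible chat completions API. Below is a minimal sketch of a tool call, assuming the `deepseek-chat` model identifier and the `https://api.deepseek.com` endpoint; the `get_weather` tool, its schema, and the placeholder API key are illustrative assumptions, not part of this reference.

```python
# Hypothetical sketch: function calling via the DeepSeek Platform's
# OpenAI-compatible API. Tool name and schema are illustrative only.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder key
    base_url="https://api.deepseek.com",   # assumed DeepSeek Platform endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",             # illustrative tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",                 # assumed identifier for the V2 chat model
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the call is returned here.
print(response.choices[0].message.tool_calls)
```

JSON mode works through the same interface by passing `response_format={"type": "json_object"}` to the completion call.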

Providers (1)

Provider: DeepSeek Platform
Input (per 1M): —
Output (per 1M): —
Type: Serverless
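
The listed provider is the DeepSeek Platform's serverless API (per-token pricing is not shown above). A minimal sketch of a plain chat completion against that service, assuming its OpenAI-compatible interface and the `deepseek-chat` model identifier:

```python
# Minimal sketch: a plain chat completion on the DeepSeek Platform (serverless).
# The base URL and model identifier are assumptions about the hosted service.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder key
    base_url="https://api.deepseek.com",   # assumed DeepSeek Platform endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                 # assumed identifier for the V2 chat model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what Multi-head Latent Attention does."},
    ],
)

print(response.choices[0].message.content)
```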

Specifications

Released: 2024-05-06
Parameters: 236B
Context: 128K
Architecture: Mixture of Experts
Specialization: general