LLM Reference

DeepSeek V2 Lite

About

DeepSeek V2 Lite is an efficient, cost-effective Mixture-of-Experts (MoE) language model from DeepSeek AI. It has 16 billion total parameters, of which 2.4 billion are active per token, and supports a context length of 32,000 tokens. Despite its compact size, it outperforms comparable 7B dense and 16B MoE models on English and Chinese benchmarks. Built on the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, it is designed for efficient inference and economical training. The model can be deployed on a single 40GB GPU, fine-tuned on 8x80GB GPUs, and is available in both base and chat versions.
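
As a quick illustration of deploying the chat variant, the sketch below loads it with Hugging Face Transformers. The repository ID deepseek-ai/DeepSeek-V2-Lite-Chat is the public checkpoint name; the bfloat16 dtype, device placement, and generation settings are assumptions for a single-GPU setup rather than an official recipe.

```python
# Minimal sketch: run DeepSeek-V2-Lite-Chat with Hugging Face Transformers.
# Assumes the public checkpoint "deepseek-ai/DeepSeek-V2-Lite-Chat" and a
# single ~40GB GPU; adjust dtype/device settings for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~32GB of weights for 16B parameters
    device_map="auto",            # place layers on the available GPU(s)
    trust_remote_code=True,       # custom MLA / DeepSeekMoE implementation
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```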

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Released: 2024-05-16
Parameters: 16B
Context: 32K
Architecture: Mixture of Experts
Specialization: General