
DeepSeek
About
The DeepSeek LLM family comprises open-source large language models designed for strong language comprehension across diverse applications. These models excel in reasoning, coding, mathematics, and Chinese comprehension, often surpassing comparable models on benchmarks. The lineup includes base and chat variants at two parameter scales, 7 billion and 67 billion. The models are trained on a dataset of 2 trillion tokens spanning English and Chinese, and the architecture, based on the Llama model, improves inference efficiency in the 67B model through Grouped-Query Attention. Available for research and commercial use, additional models such as DeepSeek-Coder and DeepSeek-VL cater to code generation and vision-language tasks, respectively.
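To illustrate the Grouped-Query Attention idea mentioned above, here is a minimal sketch in NumPy: several query heads share a smaller number of key/value heads, shrinking the K/V cache at inference time. The head counts and dimensions below are illustrative assumptions, not DeepSeek's actual configuration, and the random matrices stand in for learned weights.

```python
import numpy as np

def grouped_query_attention(x, n_q_heads=8, n_kv_heads=2):
    """Sketch of GQA: n_q_heads query heads share n_kv_heads K/V heads."""
    seq_len, d_model = x.shape
    head_dim = d_model // n_q_heads
    rng = np.random.default_rng(0)
    # Random projections stand in for learned parameters.
    wq = rng.standard_normal((d_model, n_q_heads * head_dim)) / np.sqrt(d_model)
    wk = rng.standard_normal((d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)
    wv = rng.standard_normal((d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)

    q = (x @ wq).reshape(seq_len, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, n_kv_heads, head_dim)

    # Each group of query heads reuses one K/V head: repeat K and V
    # so every query head has a matching key/value head.
    group = n_q_heads // n_kv_heads
    k = np.repeat(k, group, axis=1)  # -> (seq_len, n_q_heads, head_dim)
    v = np.repeat(v, group, axis=1)

    # Scaled dot-product attention per head, then softmax over keys.
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum('hqk,khd->qhd', weights, v)
    return out.reshape(seq_len, d_model)

x = np.random.default_rng(1).standard_normal((5, 64))
y = grouped_query_attention(x)
print(y.shape)  # (5, 64)
```

With 8 query heads sharing 2 K/V heads, the K/V projections (and the inference-time K/V cache) are a quarter the size of standard multi-head attention, which is the efficiency gain GQA targets.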