DeepSeek 67B

About

DeepSeek LLM 67B is a large language model with 67 billion parameters, trained on 2 trillion tokens of English and Chinese text. Building on the LLaMA architecture, it uses Grouped-Query Attention (GQA) to improve inference efficiency. The model is strong at reasoning, coding, mathematics, and Chinese comprehension, outperforming similarly sized models such as Llama 2 70B on a range of benchmarks; its chat variant reaches a 73.78% pass@1 on the HumanEval coding benchmark and also performs well on mathematical datasets such as GSM8K. The weights are released openly for both research and commercial use, though the model shares common LLM limitations, including biases inherited from its training data and possible hallucinations.
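To make the GQA point concrete, here is a minimal sketch of grouped-query attention in PyTorch: several query heads share each key/value head, shrinking the KV cache relative to standard multi-head attention. The head counts and shapes below are illustrative, not DeepSeek 67B's actual configuration.

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (batch, num_q_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim)."""
    num_q_heads, num_kv_heads = q.shape[1], k.shape[1]
    head_dim = q.shape[-1]
    # Each KV head serves a group of query heads, so the KV cache is
    # num_q_heads / num_kv_heads times smaller than in multi-head attention.
    group_size = num_q_heads // num_kv_heads
    k = k.repeat_interleave(group_size, dim=1)  # share each KV head across its query group
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Toy shapes: 8 query heads sharing 2 KV heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])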

Capabilities

Multimodal · Function Calling · Tool Use · JSON Mode

Specifications

Family: DeepSeek
Released: 2023-11-29
Parameters: 67B
Architecture: Decoder-only
Specialization: General
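Since the weights are openly available, a minimal sketch of running the chat variant with Hugging Face transformers follows. The repo id deepseek-ai/deepseek-llm-67b-chat is the published chat checkpoint; the dtype, device placement, and prompt are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-67b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the 67B weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 2 = 11, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))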