DeepSeek 67B
About
DeepSeek LLM 67B is a large language model with 67 billion parameters, trained on 2 trillion tokens of English and Chinese text. Built on the LLaMA architecture, it adopts Grouped-Query Attention (GQA) to improve inference efficiency. The model performs strongly on reasoning, coding, mathematics, and Chinese-comprehension tasks, outperforming the similarly sized Llama2 70B on a range of benchmarks. Its chat variant reaches a 73.78% pass rate on the HumanEval coding benchmark and also scores well on mathematical datasets such as GSM8K. The weights are openly released for both research and commercial use, though the model shares common LLM limitations, including biases inherited from training data and possible hallucinations.
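To illustrate the Grouped-Query Attention idea mentioned above, here is a minimal NumPy sketch with toy dimensions. It is not DeepSeek's actual implementation; the function name, head counts, and sizes are illustrative assumptions. The key point is that several query heads share one key/value head, so the model stores fewer KV heads than query heads, shrinking the KV cache:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy GQA sketch (not DeepSeek's real code).

    q has shape (n_q_heads, seq, d); k and v have shape
    (n_kv_heads, seq, d) with n_kv_heads < n_q_heads.
    Each group of n_q_heads // n_kv_heads query heads
    attends to the same key/value head.
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads       # query heads per KV head
    # Repeat each KV head so every query head has a partner.
    k = np.repeat(k, group, axis=0)       # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    # Standard scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v                    # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 KV heads are stored
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
```

Because only 2 KV heads are cached instead of 8, the per-token KV cache in this toy setup is a quarter of the multi-head-attention size, which is the efficiency gain GQA targets at inference time.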