
DeepSeekMath
About
DeepSeekMath is a family of open-source large language models (LLMs) focused on mathematical reasoning. The models are initialized from DeepSeek-Coder-Base-v1.5 7B and pre-trained on 120 billion math-related tokens sourced from Common Crawl, supplemented with natural language and code data.

A standout feature is Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that improves mathematical problem-solving while reducing memory consumption by dropping the separate critic model used in PPO; a minimal sketch of the core idea appears below.

The suite comprises DeepSeekMath-Base 7B, DeepSeekMath-Instruct 7B, and DeepSeekMath-RL 7B, covering successive stages of the training pipeline. The RL variant achieves 51.7% accuracy on the MATH benchmark without using external tools, rivaling proprietary models such as Gemini-Ultra and GPT-4 and marking a pivotal development in open-source AI for mathematical problem solving. All three models are available on Hugging Face and GitHub to support collaborative research.
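The sketch below illustrates the group-relative advantage estimate at the heart of GRPO: instead of a learned value function, each completion's reward is normalized against the rewards of the other completions sampled for the same prompt. The function name and shapes are illustrative, not taken from the DeepSeekMath codebase.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Assumes one scalar reward per sampled completion; names are illustrative.
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within a group of G completions sampled for
    the same prompt: A_i = (r_i - mean(r)) / std(r).

    This group-relative baseline replaces the learned value function
    (critic) of PPO, which is where the memory savings come from.
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: rewards for 4 completions of one prompt, e.g. from a reward model.
rewards = torch.tensor([0.1, 0.9, 0.4, 0.6])
print(grpo_advantages(rewards))  # positive for above-average completions
```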
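Since the checkpoints are published on Hugging Face, they can be loaded with the standard transformers API. The snippet below is a minimal usage sketch; the repo id shown matches the published instruct checkpoint, but verify it against the Hugging Face hub before use.

```python
# Minimal sketch: loading DeepSeekMath-Instruct 7B via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-math-7b-instruct"  # assumed repo id; verify on the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the integral of x^2 from 0 to 1?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```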