LLM Reference

Benchmark Leaderboard

Top models by HumanEval score

#   Model                        Score (%)  Version / Metric  Source
1   o3                           96.7       2025-04           OpenAI
2   Gemini 2.5 Pro               93.1       2025-03           Google DeepMind
3   Claude 3.7 Sonnet            93         2025-02           Anthropic
4   GPT-4.1                      92.9       2025-04           OpenAI
5   Qwen2.5 72B                  92.7       pass@1            research
6   Qwen2.5 72B Instruct         92.7       pass@1            Open LLM Leaderboard
7   Qwen2.5 Coder 32B Instruct   92.7       2024-11           Qwen Team
8   Qwen3 235B A22B              92.7       2025-04           Qwen Team
9   Claude 3.5 Sonnet            92         pass@1            HELM, official documentation
10  GPT-4o (05-13)               90.2       pass@1            HELM, Open LLM Leaderboard
11  Gemini 2.5 Flash             90.1       2025-05           Google DeepMind
12  DeepSeek R1                  89.9       2025-01           DeepSeek
13  Llama 3.1 405B               89         pass@1            Open LLM Leaderboard, Meta official
14  Grok-2                       88.4       pass@1            Open LLM Leaderboard, xAI official
15  Qwen2.5 32B Instruct         88.4       pass@1            Open LLM Leaderboard
16  Mixtral 8x22B Instruct v0.1  86.2       pass@1            Open LLM Leaderboard
17  Mixtral 8x22B v0.1           86.2       pass@1            research
18  DeepSeek V3                  85.5       pass@1            Open LLM Leaderboard, DeepSeek official
19  DeepSeek V3 0324             85.5       2025-03           DeepSeek
20  Falcon 180B                  85.1       pass@1            Open LLM Leaderboard
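HumanEval measures functional correctness: a generated solution counts only if it passes the problem's unit tests, and rows marked pass@1 report the chance that a single sample passes. The sketch below implements the unbiased pass@k estimator published alongside HumanEval (Chen et al., 2021), assuming n samples were drawn per problem and c of them passed. The function names `pass_at_k` and `benchmark_score` are hypothetical, and treating the Score column as a mean percentage is an assumption; the sources listed above may have used other setups (for example, one greedy sample per problem), so this is not necessarily how any individual score was produced.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed every unit test,
    k: number of samples the metric is allowed to draw.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    # Probability that a random k-subset of the n samples contains no pass.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over problems, scaled to a percentage.

    Assumption: the leaderboard's Score column is this mean expressed in %.
    results holds one (n, c) pair per problem.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    total = sum(pass_at_k(n, c, k) for n, c in results)
    return 100.0 * total / len(results)


if __name__ == "__main__":
    # Hypothetical toy run: three problems, 10 samples each.
    toy = [(10, 10), (10, 3), (10, 0)]
    print(f"pass@1 = {benchmark_score(toy, k=1):.1f}%")  # prints: pass@1 = 43.3%
    print(f"pass@5 = {benchmark_score(toy, k=5):.1f}%")  # prints: pass@5 = 63.9%
```

With k = 1 the estimator reduces to c / n, so pass@1 is simply the fraction of passing samples, averaged over HumanEval's 164 problems. Computing the ratio of binomial coefficients, rather than drawing random subsets, avoids the high variance of estimating pass@k by direct resampling.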