Benchmark Leaderboard
Top 20 models by GPQA (Graduate-Level Google-Proof Q&A) Diamond score
| # | Model | Score (% accuracy) | Subset | Source |
|---|---|---|---|---|
| 1 | GPT-5.4 | 92 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 2 | Qwen3.5-397B-A17B | 89.3 | diamond | Artificial Analysis |
| 3 | Kimi K2.5 | 87.9 | diamond | Artificial Analysis |
| 4 | o3 | 87.7 | diamond | https://openai.com/index/openai-o3-and-o4-mini-system-card/ |
| 5 | MiniMax M2.7 | 87.4 | diamond | Artificial Analysis |
| 6 | GLM-5.1 | 86.8 | diamond | Artificial Analysis |
| 7 | Gemini 2.5 Pro | 86.4 | diamond | https://deepmind.google/technologies/gemini/pro/ |
| 8 | Qwen3 235B A22B | 86.1 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 9 | Qwen3.6-35B-A3B | 86 | diamond | Qwen3.6-35B-A3B model card (April 2026) |
| 10 | Qwen3.5 27B | 85.8 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 11 | Gemma 4 31B | 85.7 | diamond | Artificial Analysis |
| 12 | Qwen3.5 122B A10B | 85.7 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 13 | Qwen3.5 35B A3B | 84.5 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 14 | Gemma 4 31B IT | 84.3 | diamond | https://huggingface.co/google/gemma-4-31b-it |
| 15 | Claude Opus 4.6 | 84.2 | diamond | https://www.anthropic.com/claude/opus |
| 16 | o3-pro | 84 | diamond | https://openai.com/index/openai-o3-pro/ |
| 17 | DeepSeek V3.2 | 84 | diamond | Artificial Analysis |
| 18 | DeepSeek R1 0528 | 81 | diamond | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 |
| 19 | NVIDIA Nemotron 3 Super 120B | 80 | diamond | Artificial Analysis |
| 20 | o3 Mini | 79.7 | diamond | https://openai.com/index/openai-o3-and-o4-mini-system-card/ |