LLM Reference

Benchmark Leaderboard

Top models by GPQA (Graduate-Level Google-Proof Q&A) score

| # | Model | Score | Version | Source |
|---|-------|-------|---------|--------|
| 1 | GPT-5.4 | 92 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 2 | Qwen3.5-397B-A17B | 89.3 | diamond | Artificial Analysis |
| 3 | Kimi K2.5 | 87.9 | diamond | Artificial Analysis |
| 4 | o3 | 87.7 | diamond | https://openai.com/index/openai-o3-and-o4-mini-system-card/ |
| 5 | MiniMax M2.7 | 87.4 | diamond | Artificial Analysis |
| 6 | GLM-5.1 | 86.8 | diamond | Artificial Analysis |
| 7 | Gemini 2.5 Pro | 86.4 | diamond | https://deepmind.google/technologies/gemini/pro/ |
| 8 | Qwen3 235B A22B | 86.1 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 9 | Qwen3.6-35B-A3B | 86 | diamond | Qwen3.6-35B-A3B model card (April 2026) |
| 10 | Qwen3.5 27B | 85.8 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 11 | Gemma 4 31B | 85.7 | diamond | Artificial Analysis |
| 12 | Qwen3.5 122B A10B | 85.7 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 13 | Qwen3.5 35B A3B | 84.5 | diamond | https://pricepertoken.com/leaderboards/benchmark/gpqa |
| 14 | Gemma 4 31B IT | 84.3 | diamond | https://huggingface.co/google/gemma-4-31b-it |
| 15 | Claude Opus 4.6 | 84.2 | diamond | https://www.anthropic.com/claude/opus |
| 16 | o3-pro | 84 | diamond | https://openai.com/index/openai-o3-pro/ |
| 17 | DeepSeek V3.2 | 84 | diamond | Artificial Analysis |
| 18 | DeepSeek R1 0528 | 81 | diamond | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 |
| 19 | NVIDIA Nemotron 3 Super 120B | 80 | diamond | Artificial Analysis |
| 20 | o3 Mini | 79.7 | diamond | https://openai.com/index/openai-o3-and-o4-mini-system-card/ |