LLM Reference

The research leaderboard · for knowledge workers

Best for research

4 editor picks · 7 eligible models · Deep reading, citations, synthesis.

EDITOR'S CHOICE · Researched 8d ago

Claude Opus 4.7

Anthropic · 1M context
Excellent

Deep reading and footnoted synthesis the experts respect.

GPQA Diamond 94.2 (top GA) with the cleanest footnoted synthesis across many sources.

The numbers
$/1M out: $25.00 ($5.00 input)
Context: 1M (max window)
Pros
  • Top GA GPQA Diamond
  • Careful, citeable reasoning
  • 1M context
Cons
  • $25 / 1M out
  • Slower deep passes

Also worth picking

The runners-up

ranked by editorial pick order
Editorial tiers: Excellent · Strong · Solid

# · Model · Tier · $/1M out · Editor's note
#2 · OpenAI · 1M · $30.00 · GPQA 93.6 with a high reasoning ceiling; strongest on quantitative literature.
#3 · Google DeepMind · 1M · $5.00 · GPQA 91.9 plus live search and a 1M window; best for tracking the literature in real time.
#4 · xAI · 1M · $2.50 · GPQA 90.1, fresh, 1M context with real-time retrieval baked in.

Eligibility

7 models are eligible for this board

A model is eligible if it is tagged with useCases: [research]. Pins must come from this pool.
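
The eligibility rule above can be sketched roughly as follows. This is only an illustration, not the site's actual implementation: the `Model` shape, the function names, and the catalog entries are all hypothetical, and only the `useCases` tag comes from the page itself.

```typescript
// Hypothetical model record; only the `useCases` tag is taken from the page.
interface Model {
  id: string;
  useCases: string[];
}

// Models tagged with the given use case form the eligible pool.
function eligiblePool(models: Model[], useCase: string): Model[] {
  return models.filter((m) => m.useCases.includes(useCase));
}

// Returns the pinned ids that are NOT in the pool.
// An empty result means every pin comes from the eligible pool.
function invalidPins(pins: string[], pool: Model[]): string[] {
  const allowed = new Set(pool.map((m) => m.id));
  return pins.filter((id) => !allowed.has(id));
}

// Placeholder catalog, not real leaderboard data.
const catalog: Model[] = [
  { id: "model-a", useCases: ["research", "coding"] },
  { id: "model-b", useCases: ["coding"] },
  { id: "model-c", useCases: ["research"] },
];

const pool = eligiblePool(catalog, "research");
const rejected = invalidPins(["model-a", "model-b"], pool);

console.log(pool.map((m) => m.id)); // ["model-a", "model-c"]
console.log(rejected);              // ["model-b"]
```

Returning the offending ids, rather than a boolean, makes it easier to report which pin falls outside the pool.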

All picks