The research leaderboard · for knowledge workers

Best for research

5 editor picks · 9 eligible models · Deep reading, citations, synthesis.

Editorial pick plus benchmark and API pricing context.

EDITOR'S CHOICE · Researched 7d ago

Claude Fable 5

Anthropic · 1m context

Excellent

Deep reading and footnoted synthesis the experts respect.

GDPval-AA ELO 1932 and Anthropic-reported finance, trading, and analytics wins make it the strongest general knowledge-work pick; do not use Mythos-only HLE rows as Fable evidence.

The numbers

$/1M out: $50.00 ($10.00 input)

Context: 1M max window
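
As a rough illustration of what the listed rates mean per call, the sketch below estimates the cost of a single request from its token counts. It assumes flat per-token billing at the $10.00 input and $50.00 output rates shown above, with no caching discounts or long-context surcharges; the function name and the example workload are hypothetical.

```python
# Listed API prices for the editor's pick, in USD per 1M tokens.
INPUT_USD_PER_M = 10.00
OUTPUT_USD_PER_M = 50.00


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_USD_PER_M,
    output_rate: float = OUTPUT_USD_PER_M,
) -> float:
    """Estimate the cost of one call, assuming flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workload: a 200k-token document set and a 4k-token synthesis.
cost = request_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # prints $2.20
```

Passing another model's rates as `input_rate` and `output_rate` gives a like-for-like comparison on the same workload.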

Pros

+GDPval-AA ELO 1932
+Strong long-horizon document analysis
+1M context

Cons

−$50 / 1M out
−No Fable-specific GPQA or HLE score yet

Also worth picking

The runners-up

Ranked by editorial pick order · Editorial tiers: Excellent, Strong, Solid

# · Model · Tier · $/1M out · Editor's note

Claude Opus 4.7

Anthropic · 1m

$25.00 / 1M out

Still strong for writing-adjacent research and everyday synthesis at lower cost.

GPT-5.5

OpenAI · 1.05m

$30.00 / 1M out

GPQA 93.6 with a high reasoning ceiling; strongest on quantitative literature.

Gemini 3 Pro

Google DeepMind · 1m

$5.00 / 1M out

GPQA 91.9 plus live search and a 1M window — best for tracking the literature in real time.

Grok 4.3

xAI · 1m

$2.50 / 1M out

GPQA 90.1, fresh, 1M context with real-time retrieval baked in.

Eligibility

9 models are eligible for this board

A model is eligible when it is tagged with useCases: [research]. Pinned picks must come from this pool.
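
The eligibility rule can be sketched as a simple filter plus a pin check. This is a minimal illustration, not the site's actual implementation: the `Model` class, the `use_cases` field (standing in for `useCases`), both function names, and the sample catalog with its tags are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    use_cases: tuple[str, ...] = ()  # hypothetical stand-in for the useCases tag


def eligible(models: list[Model], use_case: str = "research") -> list[Model]:
    """Return the models tagged with the given use case."""
    return [m for m in models if use_case in m.use_cases]


def invalid_pins(pins: list[str], models: list[Model], use_case: str = "research") -> list[str]:
    """Return any pinned names that are not in the eligible pool."""
    pool = {m.name for m in eligible(models, use_case)}
    return [p for p in pins if p not in pool]


# Illustrative catalog; the tags here are invented for the example.
catalog = [
    Model("Claude Opus 4.7", ("research", "writing")),
    Model("GPT-5.5", ("research",)),
    Model("Example Coder 1", ("coding",)),  # hypothetical, not eligible
]

print([m.name for m in eligible(catalog)])
print(invalid_pins(["GPT-5.5", "Example Coder 1"], catalog))
```

An empty list from `invalid_pins` means every pin comes from the eligible pool.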
