The coding leaderboard · for developers

Best for coding

6 editor picks · 16 eligible models · The model we'd hand a stranger reviewing a PR.

Editorial pick plus benchmark and API pricing context.

EDITOR'S CHOICE · Researched 7d ago

Claude Fable 5

Anthropic · 1m context

Excellent

Top SWE-bench in production — patches like a senior engineer.

Anthropic's new flagship: 80.3% SWE-bench Pro, 96% SWE-bench Verified on Vals.ai, and 85.0% OSWorld-Verified make it the best production coding pick for non-trivial engineering tasks.

The numbers

$/1M out: $50.00 ($10.00 input)

Context: 1M max window
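
To make the listed prices concrete, here is a minimal sketch of per-request cost at Claude Fable 5's listed rates ($10.00 input and $50.00 output per 1M tokens). The function name and the token counts are hypothetical, chosen only for illustration; actual billing may include factors not shown on this page.

```python
# Listed prices for Claude Fable 5, in USD per 1M tokens (from the card above).
INPUT_PRICE_PER_M = 10.00
OUTPUT_PRICE_PER_M = 50.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical PR review: 40,000 tokens of diff and context in,
# 2,000 tokens of review comments out.
cost = request_cost(40_000, 2_000)
print(f"${cost:.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so long generations affect the bill more than large prompts do.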

Pros

+80.3% SWE-bench Pro
+96% SWE-bench Verified on Vals.ai
+1M context window

Cons

−$50 / 1M out — double Opus 4.8
−Overkill for boilerplate or short generation

Also worth picking

The runners-up

Ranked by editorial pick order · Editorial tiers: Excellent, Strong, Solid

# · Model · Tier · $/1M out · Editor's note

Claude Opus 4.8

Anthropic · 1m

$25.00 / 1M out

Prior holder that remains a strong value at $25 out, with 69.2% SWE-bench Pro and 88.6% SWE-bench Verified.

Claude Sonnet 5

Anthropic · 1m

$10.00 / 1M out

The new everyday Sonnet: 85.2% SWE-bench Verified, 63.2% SWE-bench Pro, 80.4% Terminal-Bench 2.1, and 128K max output at durable $3/$15 — ahead of GPT-5.5 for agentic coding loops we run daily.

GPT-5.5

OpenAI · 1.05m

$30.00 / 1M out

OpenAI's current flagship: SWE-bench Pro 58.6 and HumanEval 94.2 — the best non-Claude reviewer we run on a real PR.

DeepSeek V4 Pro

DeepSeek · 1m

$0.87 / 1M out

Tops LiveCodeBench (93.5) and 80.6 SWE-bench at $0.87 out, open weights — near-frontier coding for a fraction of the price.

Claude Sonnet 4.6

Anthropic · 1m

$15.00 / 1M out

Solid predecessor Sonnet — keep until Sonnet 5 accumulates more third-party harness rows.

Eligibility

16 models are eligible for this board

A model is eligible when it is tagged with useCases: [coding]. Pins must come from this pool.
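
The eligibility rule can be sketched as a tag filter plus a check that every pin is in the resulting pool. This is an illustrative sketch only: the `Model` class, the `eligible_for` and `invalid_pins` helpers, and the "Example Vision Model" entry with its tag are hypothetical, not the site's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    use_cases: list[str] = field(default_factory=list)  # mirrors the useCases tag


def eligible_for(board: str, models: list[Model]) -> list[Model]:
    """Return the models tagged with the board's use case, in input order."""
    return [m for m in models if board in m.use_cases]


def invalid_pins(pins: list[str], pool: list[Model]) -> list[str]:
    """Return the pinned names that are not in the eligible pool."""
    pool_names = {m.name for m in pool}
    return [p for p in pins if p not in pool_names]


# Two models from this board plus one hypothetical non-coding model.
models = [
    Model("Claude Fable 5", ["coding"]),
    Model("DeepSeek V4 Pro", ["coding"]),
    Model("Example Vision Model", ["vision"]),  # hypothetical entry
]

pool = eligible_for("coding", models)
print([m.name for m in pool])
print(invalid_pins(["Claude Fable 5", "Example Vision Model"], pool))
```

Returning the list of offending pins, rather than a single boolean, makes it easy to report which pick falls outside the pool.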
