Claude Opus 4.7
- GPQA Diamond: 94.2%
- Output (from): $25.00 / 1M
Last refreshed 2026-05-18. Next refresh: weekly.
Top AI models for complex reasoning, math, and step-by-step problem solving, ranked by sourced reasoning benchmarks and release freshness.
An opinionated short stack for this category. The full leaderboard, pricing, and compare links follow.
Reasoning boards rank by GPQA Diamond score and favor models that are explicitly tagged for reasoning or that post an unusually strong GPQA Diamond result.
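The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual ranking code: the `Model` class, the `reasoning_board` function, and the `STRONG_GPQA` cutoff are all hypothetical names, and the page does not say what counts as "unusually strong", so the threshold value is an assumption. Release freshness is left out because the page gives no release dates for most entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    gpqa: float            # GPQA Diamond score, in percent
    tags: frozenset[str]   # capability tags as shown on the board


# Hypothetical cutoff for "unusually strong GPQA"; the page states no number.
STRONG_GPQA = 85.0


def reasoning_board(models):
    """Keep models tagged Reasoning or above the cutoff, highest GPQA first."""
    eligible = [
        m for m in models
        if "Reasoning" in m.tags or m.gpqa >= STRONG_GPQA
    ]
    # sorted() is stable, so models with equal scores keep their input order.
    return sorted(eligible, key=lambda m: m.gpqa, reverse=True)


# Three rows taken from the leaderboard on this page.
models = [
    Model("GPT-5.5", 93.6, frozenset({"Reasoning", "Vision", "Tools"})),
    Model("Gemini 3.1 Pro Preview", 94.3, frozenset({"Preview", "Vision", "Tools"})),
    Model("Claude Opus 4.7", 94.2, frozenset({"Reasoning", "Vision", "Tools"})),
]
board = reasoning_board(models)
```

Gemini 3.1 Pro Preview carries no Reasoning tag on the board, so in this sketch it qualifies only through the assumed score cutoff.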
| # | Model | Tags | GPQA Diamond | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Preview, Reasoning, Vision, Tools | 94.6% | n/a | n/a |
| 2 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | 94.3% | $2.00 | $12.00 |
| 3 | Claude Opus 4.7 | Reasoning, Vision, Tools | 94.2% | $5.00 | $25.00 |
| 4 | GPT-5.5 | Reasoning, Vision, Tools | 93.6% | $5.00 | $30.00 |
| 5 | GPT-5.5 Pro | Reasoning, Vision, Tools | 93.6% | $30.00 | $180.00 |
| 6 | GPT-5.4 | Reasoning, Tools | 92% | $2.50 | $15.00 |
| 7 | Claude Opus 4.6 | Reasoning, Vision, Tools | 91.3% | $5.00 | $25.00 |
| 8 | Kimi K2.6 | Reasoning, Vision, Tools | 90.5% | $0.75 | $3.50 |
| 9 | DeepSeek V4 Pro | Reasoning, Tools | 90.1% | $0.43 | $0.87 |
| 10 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 89.9% | $3.00 | $15.00 |
| 11 | Muse Spark | Reasoning, Vision, Tools | 89.5% | n/a | n/a |
| 12 | Qwen3.5-397B-A17B | Reasoning, Tools | 89.3% | $0.39 | $2.34 |
| 13 | Trinity-Large-Thinking | Reasoning, Tools | 89.2% | $0.22 | $0.85 |
| 14 | ByteDance Doubao Seed 2.0 Pro | Tools | 88.9% | n/a | n/a |
| 15 | DeepSeek V4 Flash | Reasoning, Tools | 88.1% | $0.14 | $0.28 |
| 16 | Kimi K2.5 | Tools | 87.9% | $0.44 | $2.00 |
| 17 | Qwen3.6-27B | Reasoning, Vision, Tools | 87.8% | $0.32 | $3.20 |
| 18 | o3 | Reasoning | 87.7% | $2.00 | $8.00 |
| 19 | MiniMax M2.7 | Reasoning, Tools | 87.4% | $0.30 | $1.20 |
| 20 | Gemini 3.1 Flash-Lite | Vision, Tools | 86.9% | $0.25 | $1.50 |
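To compare rows by what a request would cost, the per-1M-token prices can be turned into a per-request estimate. The sketch below assumes billing is linear in token count at the listed rates, which may not hold for every provider (cached input, batch discounts, and tiered pricing are ignored). `PRICES` and `request_cost` are hypothetical names, and the three entries are copied from the table.

```python
# USD per 1M tokens as (input, output), copied from the leaderboard table.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Gemini 3.1 Pro Preview": (2.00, 12.00),
    "DeepSeek V4 Pro": (0.43, 0.87),
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example workload: 20,000 input tokens and 5,000 output tokens per request.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 20_000, 5_000):.4f}")
```

For that workload the estimate is about $0.225 for Claude Opus 4.7 and $0.10 for Gemini 3.1 Pro Preview. If a provider bills hidden reasoning tokens as output, they would need to be counted in `output_tokens`.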
Next seats in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.
GPQA Diamond: 92%
Claude Opus 4.6 is available on AWS Bedrock.
GPQA Diamond: 91.3%
Kimi K2.6 is Moonshot AI's latest agentic reasoning model, launched April 13, 2026 as a code preview for Kimi Code subscribers. Built on a 1-trillion-parameter MoE architecture (32B active, 384 experts), it inherits K2.5's 256K context window and adds enhanced reliability for long-horizon agentic workflows, supporting 200–300 sequential tool calls without drift. It is optimized for coding, multi-step agent planning, and vision-assisted tasks such as processing screenshots, PDFs, and spreadsheets.
GPQA Diamond: 90.5%