Claude Opus 4.7
- GPQA Diamond: 94.2%
- Output (from): $25.00 / 1M
Last refreshed 2026-06-03. Next refresh: weekly.
The best mainstream APIs, ranked by capability first: GPQA Diamond score, with MMLU as the fallback, then lowest tracked input price as the tie-break.
Verdict
GPT-5.5 is the runner-up, 0.6 points behind Claude Opus 4.7 on GPQA Diamond (93.6% vs. 94.2%).
Mainstream API picks now lead with capability: GPQA Diamond first, MMLU as fallback, then price only as the tie-break.
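The ordering rule described here can be sketched as a sort key. This is a minimal illustration, not the site's actual implementation: the `ModelRow` type, the `rank_key` and `rank` functions, and the choice to place MMLU-only models after every model with a GPQA Diamond score are all assumptions. The sample rows use GPQA Diamond scores and input prices from the table on this page; MMLU is left empty because no MMLU scores are listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRow:
    name: str
    gpqa_diamond: Optional[float]  # percent; None when no score is tracked
    mmlu: Optional[float]          # percent; consulted only if GPQA Diamond is missing
    input_price: float             # USD per 1M input tokens


def rank_key(row: ModelRow):
    # Assumption: models with a GPQA Diamond score sort ahead of MMLU-only models.
    if row.gpqa_diamond is not None:
        tier, score = 0, row.gpqa_diamond
    elif row.mmlu is not None:
        tier, score = 1, row.mmlu
    else:
        tier, score = 2, 0.0
    # Higher score first, then lower input price as the tie-break.
    return (tier, -score, row.input_price)


def rank(rows):
    return sorted(rows, key=rank_key)


rows = [
    ModelRow("GPT-5.5 Pro", 93.6, None, 30.00),
    ModelRow("Claude Opus 4.7", 94.2, None, 5.00),
    ModelRow("GPT-5.5", 93.6, None, 5.00),
    ModelRow("Gemini 3.1 Pro Preview", 94.3, None, 2.00),
]

for position, row in enumerate(rank(rows), start=1):
    print(position, row.name)
```

With these four rows, GPT-5.5 lands ahead of GPT-5.5 Pro because both score 93.6% and GPT-5.5 has the lower input price. Two models tied on both score and input price keep their input order, since Python's sort is stable.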
| # | Model | Tags | GPQA Diamond | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | 94.3% | $2.00 | $12.00 |
| 2 | Claude Opus 4.7 | Reasoning, Vision, Tools | 94.2% | $5.00 | $25.00 |
| 3 | GPT-5.5 | Reasoning, Vision, Tools | 93.6% | $5.00 | $30.00 |
| 4 | Claude Opus 4.8 | Reasoning, Vision, Tools | 93.6% | $5.00 | $25.00 |
| 5 | GPT-5.5 Pro | Reasoning, Vision, Tools | 93.6% | $30.00 | $180.00 |
| 6 | Qwen3.7-Max | Reasoning, Tools | 92.4% | $1.25 | $3.75 |
| 7 | GPT-5.4 | Reasoning, Tools | 92% | $2.50 | $15.00 |
| 8 | Gemini 3 Pro | Vision, Tools | 91.9% | $1.25 | $5.00 |
| 9 | Claude Opus 4.6 | Reasoning, Vision, Tools | 91.3% | $5.00 | $25.00 |
| 10 | Kimi K2.6 | Reasoning, Vision, Tools | 90.5% | $0.73 | $3.40 |
| 11 | Gemini 3 Flash | Preview, Vision, Tools | 90.4% | $0.50 | $3.00 |
| 12 | DeepSeek V4 Pro | Reasoning, Tools | 90.1% | $0.43 | $0.87 |
| 13 | Grok 4.3 | Reasoning, Vision, Tools | 90.1% | $1.25 | $2.50 |
| 14 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 89.9% | $3.00 | $15.00 |
| 15 | Qwen3.5-397B-A17B | Reasoning, Tools | 89.3% | $0.39 | $2.34 |
| 16 | Trinity-Large-Thinking | Reasoning, Tools | 89.2% | $0.22 | $0.85 |
| 17 | ByteDance Doubao Seed 2.0 Pro | Vision, Tools | 88.9% | $0.47 | $2.37 |
| 18 | GPT-5 | Reasoning, Vision, Tools | 88.4% | $1.25 | $10.00 |
| 19 | DeepSeek V4 Flash | Reasoning, Tools | 88.1% | $0.10 | $0.20 |
| 20 | Grok 4.20 | Reasoning, Vision, Tools | 88% | $1.25 | $2.50 |
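To read the Input $/1M and Output $/1M columns as a per-request cost, multiply each token count by its per-million price and divide by one million. The sketch below assumes flat per-token billing with no caching discounts or volume tiers, which a provider's real invoice may not match. The `request_cost` helper and the token counts (10,000 input, 2,000 output) are hypothetical and chosen only for illustration; the prices are the Claude Opus 4.7 and GPT-5.5 Pro rows from the table.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumes flat pricing: no caching discounts, batch rates, or tiers.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
opus_cost = request_cost(10_000, 2_000, 5.00, 25.00)       # Claude Opus 4.7 row
gpt_pro_cost = request_cost(10_000, 2_000, 30.00, 180.00)  # GPT-5.5 Pro row

print(f"Claude Opus 4.7: ${opus_cost:.2f}")   # prints "Claude Opus 4.7: $0.10"
print(f"GPT-5.5 Pro: ${gpt_pro_cost:.2f}")    # prints "GPT-5.5 Pro: $0.66"
```

Because output tokens cost several times more than input tokens for most rows, an output-heavy workload can reorder which model is cheapest, even though the ranking's tie-break looks only at input price.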
Next seats in this ranking
Lines below are from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
GPT-5.5 Pro is OpenAI's premium variant of GPT-5.5, released April 23, 2026. It targets large quality gains for business, legal, education, and data science use cases. It scores 39.6% on FrontierMath Tier 4 (postdoctoral-level math problems), compared to 22.9% for Claude Opus 4.7. It is priced at 6× the standard GPT-5.5 API rate and is available to ChatGPT subscribers and via API.
GPQA Diamond: 93.6%
Qwen3.7-Max is Alibaba's flagship agentic reasoning model, announced at the Alibaba Cloud Summit on May 20, 2026. It features a 1M-token context window, extended-thinking (chain-of-thought) mode, and is designed for long-horizon autonomous tasks including coding, debugging, and multi-step workflows. The model is text-only (no vision input) and is available via Alibaba Cloud Model Studio (DashScope). Closed-weight; no open-source weights have been released.
GPQA Diamond: 92.4%
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.
GPQA Diamond: 92%
Side-by-side comparison of the top picks by price, benchmark, and API access.