LLM Reference

The Best Mainstream LLM APIs, Ranked (2026)

Last refreshed 2026-06-03. Next refresh: weekly.

The best mainstream APIs, ranked by capability first: GPQA Diamond, MMLU fallback, then lowest tracked input price.

Verdict

Use Claude Opus 4.7 for mainstream API work today.

GPT-5.5 is the runner-up, 0.6 points back on GPQA Diamond.

Researched 10d ago

How we rank

Mainstream API picks now lead with capability: GPQA Diamond first, MMLU as fallback, then price only as the tie-break.
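
The ordering described above can be sketched as a sort key. This is a minimal illustration, not the site's actual code: the `Model` class, its field names, and the choice to sort models with a missing score after scored ones are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    # Hypothetical record; field names are illustrative, not from the site.
    name: str
    gpqa: Optional[float]   # GPQA Diamond score in percent, None if missing
    mmlu: Optional[float]   # MMLU score in percent, None if missing
    input_price: float      # tracked input price, $ per 1M tokens


def rank_key(m: Model) -> tuple:
    # Higher scores rank first, so negate them.
    # Assumption: a missing score sorts after any present score.
    gpqa = -m.gpqa if m.gpqa is not None else float("inf")
    mmlu = -m.mmlu if m.mmlu is not None else float("inf")
    # Price is compared only when both capability scores are equal.
    return (gpqa, mmlu, m.input_price)


def rank(models: list[Model]) -> list[Model]:
    return sorted(models, key=rank_key)


if __name__ == "__main__":
    # GPQA and price values come from the table; MMLU is not shown there.
    rows = [
        Model("GPT-5.5 Pro", 93.6, None, 30.00),
        Model("Claude Opus 4.7", 94.2, None, 5.00),
        Model("GPT-5.5", 93.6, None, 5.00),
        Model("Gemini 3.1 Pro Preview", 94.3, None, 2.00),
    ]
    print([m.name for m in rank(rows)])
```

With these four rows the order is Gemini 3.1 Pro Preview, Claude Opus 4.7, GPT-5.5, GPT-5.5 Pro: the two 93.6% rows have no MMLU value to compare, so the lower input price decides.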

  1. Eligibility: Chat/completion models that pass the generic API leaderboard filter (no embeddings/rerankers/modality SKUs).
  2. Primary ranking: GPQA Diamond, then MMLU when GPQA is missing or tied.
  3. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  4. Price tie-break: Lower tracked input $/1M wins only after the capability scores are exhausted.
  5. Pricing: Rates reflect tracked provider rows; spot-check enterprise tiers separately.
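
The variant-collapse rule in step 3 can be sketched as follows. This is one possible reading rather than the site's implementation: the `Sku` class and its fields are hypothetical, and the sketch assumes the ±0.5 pt band is measured against the best score in the family.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sku:
    # Hypothetical record; field names are illustrative.
    name: str
    family_key: str     # stands in for familySlug + parameter tier
    score: float        # headline score in percent
    input_price: float  # $ per 1M input tokens
    is_ga: bool         # True for GA, False for preview or limited access
    release: date


NOISE_BAND = 0.5  # percentage points, from step 3


def pick_canonical(skus: list[Sku]) -> Sku:
    """Choose one SKU for a family among those tied with the top score."""
    best = max(s.score for s in skus)
    # Assumption: "tied" means within NOISE_BAND of the family's best score.
    tied = [s for s in skus if best - s.score <= NOISE_BAND]
    # Lowest input price, then GA before preview, then newest release.
    return min(
        tied,
        key=lambda s: (s.input_price, not s.is_ga, -s.release.toordinal()),
    )


def collapse(skus: list[Sku]) -> list[Sku]:
    """Keep one canonical row per family."""
    families: dict[str, list[Sku]] = {}
    for s in skus:
        families.setdefault(s.family_key, []).append(s)
    return [pick_canonical(group) for group in families.values()]


if __name__ == "__main__":
    # Invented SKUs, for illustration only.
    skus = [
        Sku("example-large", "example/large", 90.0, 5.00, True, date(2026, 1, 10)),
        Sku("example-large-preview", "example/large", 89.8, 3.00, False, date(2026, 3, 1)),
        Sku("example-large-legacy", "example/large", 88.0, 1.00, True, date(2025, 6, 1)),
    ]
    print([s.name for s in collapse(skus)])
```

In this invented family the legacy SKU falls outside the noise band, and the preview SKU wins over the GA one because the stated rule checks input price before GA status.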

| # | Model | Tags | Capability signal (GPQA Diamond) | Input $/1M | Output $/1M |
|---|-------|------|----------------------------------|------------|-------------|
| 1 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | 94.3% | $2.00 | $12.00 |
| 2 | Claude Opus 4.7 | Reasoning, Vision, Tools | 94.2% | $5.00 | $25.00 |
| 3 | GPT-5.5 | Reasoning, Vision, Tools | 93.6% | $5.00 | $30.00 |
| 4 | Claude Opus 4.8 | Reasoning, Vision, Tools | 93.6% | $5.00 | $25.00 |
| 5 | GPT-5.5 Pro | Reasoning, Vision, Tools | 93.6% | $30.00 | $180.00 |
| 6 | Qwen3.7-Max | Reasoning, Tools | 92.4% | $1.25 | $3.75 |
| 7 | GPT-5.4 | Reasoning, Tools | 92% | $2.50 | $15.00 |
| 8 | Gemini 3 Pro | Vision, Tools | 91.9% | $1.25 | $5.00 |
| 9 | Claude Opus 4.6 | Reasoning, Vision, Tools | 91.3% | $5.00 | $25.00 |
| 10 | Kimi K2.6 | Reasoning, Vision, Tools | 90.5% | $0.73 | $3.40 |
| 11 | Gemini 3 Flash | Preview, Vision, Tools | 90.4% | $0.50 | $3.00 |
| 12 | DeepSeek V4 Pro | Reasoning, Tools | 90.1% | $0.43 | $0.87 |
| 13 | Grok 4.3 | Reasoning, Vision, Tools | 90.1% | $1.25 | $2.50 |
| 14 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 89.9% | $3.00 | $15.00 |
| 15 | Qwen3.5-397B-A17B | Reasoning, Tools | 89.3% | $0.39 | $2.34 |
| 16 | Trinity-Large-Thinking | Reasoning, Tools | 89.2% | $0.22 | $0.85 |
| 17 | ByteDance Doubao Seed 2.0 Pro | Vision, Tools | 88.9% | $0.47 | $2.37 |
| 18 | GPT-5 | Reasoning, Vision, Tools | 88.4% | $1.25 | $10.00 |
| 19 | DeepSeek V4 Flash | Reasoning, Tools | 88.1% | $0.10 | $0.20 |
| 20 | Grok 4.20 | Reasoning, Vision, Tools | 88% | $1.25 | $2.50 |
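
To read the Input $/1M and Output $/1M columns as a per-request cost, the arithmetic can be sketched like this. The function name and the token counts are invented for illustration, and the sketch assumes flat list pricing with no caching, batch, or long-context adjustments.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with both prices quoted in $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Claude Opus 4.7 row: $5.00 input, $25.00 output per 1M tokens.
    # Token counts below are made up for the example.
    cost = request_cost(200_000, 20_000, 5.00, 25.00)
    print(f"${cost:.2f}")
```

Output tokens are priced several times higher than input tokens on most rows, so a ranking tie-break on input price alone does not necessarily pick the cheapest model for output-heavy workloads.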

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.

  • GPT-5.5 Pro is OpenAI's premium variant of GPT-5.5, released April 23, 2026. Targets large quality gains for business, legal, education, and data science use cases. Scores 39.6% on FrontierMath Tier 4 (postdoctoral-level math problems), compared to 22.9% for Claude Opus 4.7. Priced at 6× the standard GPT-5.5 API rate. Available to ChatGPT subscribers and via API.

    GPQA Diamond: 93.6%

  • Qwen3.7-Max is Alibaba's flagship agentic reasoning model, announced at the Alibaba Cloud Summit on May 20, 2026. It features a 1M-token context window, extended-thinking (chain-of-thought) mode, and is designed for long-horizon autonomous tasks including coding, debugging, and multi-step workflows. The model is text-only (no vision input) and is available via Alibaba Cloud Model Studio (DashScope). Closed-weight; no open-source weights have been released.

    GPQA Diamond: 92.4%

  • GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

    GPQA Diamond: 92%