LLM Reference

Compare AI models

Side-by-side comparison of any two LLMs — GPT vs Claude, Gemini vs DeepSeek, open vs proprietary — on pricing, benchmarks, API availability, context window, and release date.

Sitemap coverage 4342+ pairs

Decision builder

Pick the pair before opening the detail page

218 selectable models

Popular pairs

Browse comparisons with a decision signal attached

Claude Opus 4.7 vs Claude Opus 4.8

Pick Claude Opus 4.8 for higher current agentic coding and computer-use confidence; token pricing is tied on tracked $5/1M input and $25/1M output routes, so keep Claude Opus 4.7 only for already-validated prompts or coding workflow support constraints.

0% gap, 3 benchmarks
Output price: $25.00 / $25.00
Context: 1m / 1m
Benchmarks: 3 shared
Providers: 7 / 7
Coding, RAG, Agents, Long context
Claude Opus 4.8 leads SWE-bench Verified

Claude Opus 4.8 vs GPT-5.5

Pick Claude Opus 4.8 for coding; GPT-5.5 is better when coding workflow support matters more.

20% gap, 3 benchmarks
Output price: $25.00 / $30.00
Context: 1m / 1.05m
Benchmarks: 3 shared
Providers: 7 / 3
Coding, RAG, Agents, Long context
Claude Opus 4.8 leads SWE-bench Verified

Gemini 3.5 Flash vs GPT-5.5

Gemini 3.5 Flash is safer overall; choose GPT-5.5 when coding workflow support matters.

233% gap, 2 benchmarks
Output price: $9.00 / $30.00
Context: 1.05m / 1.05m
Benchmarks: 2 shared
Providers: 4 / 3
Coding, RAG, Agents, Long context
GPT-5.5 leads HumanEval

DeepSeek V4 Pro vs GLM-5.1

DeepSeek V4 Pro is ~125% cheaper at $0.43/1M; pay for GLM-5.1 only for coding workflow support.

254% gap, 4 benchmarks
Output price: $0.870 / $3.08
Context: 1m / 200k
Benchmarks: 4 shared
Providers: 5 / 5
Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads Google-Proof Q&A

DeepSeek V4 Pro vs Kimi K2.6

Pick DeepSeek V4 Pro for pure code generation, large-codebase analysis, and the lowest per-token cost before its 75% discount expires on 2026-05-31. Pick Kimi K2.6 when your pipeline processes images, screenshots, PDFs, or spreadsheets, or when you need long agent runs with many sequential tool calls.

301% gap, 8 benchmarks
Output price: $0.870 / $3.49
Context: 1m / 262k
Benchmarks: 8 shared
Providers: 5 / 8
Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads MMLU PRO

Claude Sonnet 4.6 vs DeepSeek V4 Flash

DeepSeek V4 Flash is ~2952% cheaper at $0.10/1M; pay for Claude Sonnet 4.6 only for coding workflow support.

7530% gap, 3 benchmarks
Output price: $15.00 / $0.1966
Context: 1m / 1m
Benchmarks: 3 shared
Providers: 6 / 6
Coding, RAG, Agents, Long context
Claude Sonnet 4.6 leads MMLU PRO

Llama 3 70B Instruct vs Llama 3.1 70B Instruct

Pick Llama 3.1 70B Instruct for coding; token pricing is tied, so keep Llama 3 70B Instruct only for already-validated prompts or route constraints.

0% gap, 2 benchmarks
Output price: $0.400 / $0.400
Context: 8k / 128k
Benchmarks: 2 shared
Providers: 18 / 13
Coding, Classification, JSON / Tool use, RAG
Llama 3.1 70B Instruct leads HumanEval

DeepSeek V4 Flash vs Grok 4

DeepSeek V4 Flash is ~1172% cheaper at $0.10/1M; pay for Grok 4 only for coding workflow support.

1172% gap, 2 benchmarks
Output price: $0.1966 / $2.50
Context: 1m / 256k
Benchmarks: 2 shared
Providers: 6 / 4
Coding, RAG, Agents, Long context
Grok 4 leads MMLU PRO

DeepSeek V4 Flash vs Qwen3.6-27B

Treat this as a product-type comparison: DeepSeek V4 Flash is a standalone API model, while Qwen3.6-27B is a coding-specialized model. Choose based on workflow fit before reading any benchmark or price row as decisive.

1528% gap, 5 benchmarks
Output price: $0.1966 / $3.20
Context: 1m / 262k
Benchmarks: 5 shared
Providers: 6 / 4
Coding, RAG, Agents, Long context
DeepSeek V4 Flash leads MMLU PRO

Claude Sonnet 4.6 vs DeepSeek V4 Pro

DeepSeek V4 Pro is ~590% cheaper at $0.43/1M; pay for Claude Sonnet 4.6 only for coding workflow support.

1624% gap, 6 benchmarks
Output price: $15.00 / $0.870
Context: 1m / 1m
Benchmarks: 6 shared
Providers: 6 / 5
Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads MMLU PRO

Gemini 2.5 Flash vs Grok 4

Gemini 2.5 Flash is ~317% cheaper at $0.30/1M; pay for Grok 4 only for coding workflow support.

0% gap, 2 benchmarks
Output price: $2.50 / $2.50
Context: 1m / 256k
Benchmarks: 2 shared
Providers: 5 / 4
Coding, RAG, Agents, Long context
Gemini 2.5 Flash leads MMLU PRO

Claude Opus 4.7 vs Kimi K2.6

Treat this as a product-type comparison: Claude Opus 4.7 is a standalone API model, while Kimi K2.6 is a coding-specialized model. Choose based on workflow fit before reading any benchmark or price row as decisive.

616% gap, 4 benchmarks
Output price: $25.00 / $3.49
Context: 1m / 262k
Benchmarks: 4 shared
Providers: 7 / 8
Coding, RAG, Agents, Long context
Claude Opus 4.7 leads SWE-bench Verified

DeepSeek V4 Flash vs DeepSeek V4 Pro

DeepSeek V4 Flash is ~343% cheaper at $0.10/1M; pay for DeepSeek V4 Pro only for provider fit.

343% gap, 5 benchmarks
Output price: $0.1966 / $0.870
Context: 1m / 1m
Benchmarks: 5 shared
Providers: 6 / 5
Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads MMLU PRO

DeepSeek V4 Flash vs GLM-5.1

DeepSeek V4 Flash is ~897% cheaper at $0.10/1M; pay for GLM-5.1 only for coding workflow support.

1467% gap, 2 benchmarks
Output price: $0.1966 / $3.08
Context: 1m / 200k
Benchmarks: 2 shared
Providers: 6 / 5
Coding, RAG, Agents, Long context
DeepSeek V4 Flash leads Google-Proof Q&A

Gemini 2.5 Pro vs Grok 4

Grok 4 is safer overall; choose Gemini 2.5 Pro when coding workflow support matters.

300% gap, 3 benchmarks
Output price: $10.00 / $2.50
Context: 1m / 256k
Benchmarks: 3 shared
Providers: 4 / 4
Coding, RAG, Agents, Long context
Grok 4 leads MMLU PRO

Claude Sonnet 4.6 vs Kimi K2.6

Treat this as a product-type comparison: Claude Sonnet 4.6 is a standalone API model, while Kimi K2.6 is a coding-specialized model. Choose based on workflow fit before reading any benchmark or price row as decisive.

330% gap, 6 benchmarks
Output price: $15.00 / $3.49
Context: 1m / 262k
Benchmarks: 6 shared
Providers: 6 / 8
Coding, RAG, Agents, Long context
Claude Sonnet 4.6 leads MMLU PRO

Grok 3 Mini vs Grok 4

Grok 3 Mini is ~400% cheaper at $0.25/1M; pay for Grok 4 only for coding workflow support.

97% gap, benchmark gap
Output price: $1.27 / $2.50
Context: 131k / 256k
Benchmarks: No shared rows
Providers: 2 / 4
RAG, Long context, Vision, JSON / Tool use

Claude Sonnet 4.6 vs Composer 2.5

Pick Sonnet 4.6 when API access, long context, broader tools, or non-Cursor deployment matter. Pick Composer 2.5 when you want the packaged IDE-native agent workflow built on Kimi K2.5 and standard-tier cost dominates. Composer's 79.8% SWE-Bench Multilingual score and Sonnet's SWE-Bench Verified rows come from different test sets, so do not read them as a single leaderboard.

500% gap, 2 benchmarks
Output price: $15.00 / $2.50
Context: 1m / 1m
Benchmarks: 2 shared
Providers: 6 / 1
Coding, RAG, Agents, Long context
Composer 2.5 leads Terminal-Bench 2.0
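
The "gap" figures on the cards above appear to match the relative difference between the two output prices, measured against the cheaper model (for example, $25.00 vs $30.00 gives 20%). That formula is inferred from the listed numbers, not documented by the page, so treat the sketch below as an assumption. The function names `price_gap_percent` and `savings_percent` are hypothetical; the second one shows the same price pair as a share of the higher price, which always stays below 100%.

```python
def price_gap_percent(price_a: float, price_b: float) -> float:
    """Relative gap between two per-1M-token prices, against the cheaper one.

    Assumed formula (inferred from the cards): (high - low) / low * 100.
    """
    low, high = sorted((price_a, price_b))
    if low <= 0:
        raise ValueError("prices must be positive")
    return (high - low) / low * 100


def savings_percent(price_a: float, price_b: float) -> float:
    """Share of the higher price saved by choosing the cheaper model."""
    low, high = sorted((price_a, price_b))
    if low <= 0:
        raise ValueError("prices must be positive")
    return (high - low) / high * 100


if __name__ == "__main__":
    # Output prices in USD per 1M tokens, copied from the cards above.
    pairs = {
        "Claude Opus 4.8 vs GPT-5.5": (25.00, 30.00),
        "Gemini 3.5 Flash vs GPT-5.5": (9.00, 30.00),
        "DeepSeek V4 Flash vs Grok 4": (0.1966, 2.50),
    }
    for name, (a, b) in pairs.items():
        gap = price_gap_percent(a, b)
        saved = savings_percent(a, b)
        print(f"{name}: gap {gap:.0f}%, savings {saved:.0f}%")
```

Read this way, a four-digit gap is a price ratio rather than a discount: a 1172% gap corresponds to paying roughly 92% less for the cheaper model.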

Popular comparisons

Top model matchups by recent search demand

The matchups buyers actually run before committing to a provider for coding, agents, or build automation.

Top 100
1. DeepSeek V4 Pro vs GLM-5.1 - 17.8K impressions
2. DeepSeek V4 Pro vs Kimi K2.6 - 9.7K impressions
3. Claude Sonnet 4.6 vs DeepSeek V4 Flash - 5.7K impressions
4. Llama 3 70B Instruct vs Llama 3.1 70B Instruct - 5.1K impressions
5. DeepSeek V4 Flash vs Grok 4 - 5K impressions
6. DeepSeek V4 Flash vs Qwen3.6-27B - 4.9K impressions
7. Claude Sonnet 4.6 vs DeepSeek V4 Pro - 4.5K impressions
8. Gemini 2.5 Flash vs Grok 4 - 4.3K impressions
9. Claude Opus 4.7 vs Kimi K2.6 - 4.2K impressions
10. DeepSeek V4 Flash vs DeepSeek V4 Pro - 4K impressions
11. DeepSeek V4 Flash vs GLM-5.1 - 3.7K impressions
12. Gemini 2.5 Pro vs Grok 4 - 3.6K impressions
13. Claude Sonnet 4.6 vs Kimi K2.6 - 3.5K impressions
14. Grok 3 Mini vs Grok 4 - 3.5K impressions
15. Claude Sonnet 4.6 vs Composer 2.5 - 3.4K impressions
16. Qwen3.6-27B vs Qwen3.6-35B-A3B - 3.3K impressions
17. GPT-5.5 vs o3 - 3.3K impressions
18. DeepSeek V4 Flash vs Kimi K2.6 - 3.3K impressions
19. GLM-5 vs GLM-5.1 - 3K impressions
20. Claude Sonnet 4.6 vs GPT-5.5 Pro - 2.8K impressions
21. Composer 2.5 vs Grok Build 0.1 - 2.7K impressions
22. DeepSeek V4 Pro vs Grok 4 - 2.6K impressions
23. Gemini 2.5 Pro vs o3 - 2.6K impressions
24. DeepSeek V3.1 vs Grok 4 - 2.5K impressions
25. DeepSeek V4 Pro vs Gemini 2.5 Flash - 2.4K impressions
26. Claude Sonnet 4.6 vs Gemini 3.5 Flash - 2.3K impressions
27. Claude Opus 4.7 vs DeepSeek V4 Pro - 2.2K impressions
28. Grok 4 vs Kimi K2.6 - 2.1K impressions
29. DeepSeek V3.1 vs DeepSeek V4 Pro - 2.1K impressions
30. Grok-3 vs Grok 4 - 2.1K impressions
31. Grok 4 vs Qwen3-Max - 2K impressions
32. DeepSeek V3 vs Grok 4 - 1.9K impressions
33. Claude Sonnet 4.5 vs DeepSeek V4 Pro - 1.8K impressions
34. DeepSeek V3.1 vs DeepSeek V4 Flash - 1.8K impressions
35. Gemini 2.5 Flash vs Grok 4.3 - 1.8K impressions
36. Claude Opus 4.7 vs GLM-5.1 - 1.8K impressions
37. DeepSeek R1 vs Kimi K2.6 - 1.7K impressions
38. GPT-5.2 vs GPT-5.5 - 1.7K impressions
39. DeepSeek V4 Flash vs Qwen3.6-35B-A3B - 1.7K impressions
40. Claude Sonnet 4.6 vs GLM-5.1 - 1.6K impressions
41. Grok 4.3 vs Kimi K2.6 - 1.6K impressions
42. Claude Sonnet 4.5 vs DeepSeek V4 Flash - 1.6K impressions
43. Claude Opus 4.7 vs Qwen3.6-27B - 1.6K impressions
44. Tencent Hy3 Preview vs o3 - 1.6K impressions
45. DeepSeek V4 Flash vs Qwen3-Max - 1.5K impressions
46. DeepSeek V4 Pro vs Gemini 2.5 Pro - 1.5K impressions
47. DeepSeek V3 vs Kimi K2.6 - 1.5K impressions
48. DeepSeek V4 Flash vs Gemini 2.5 Flash - 1.5K impressions
49. Qwen3-Max vs Qwen3.6-27B - 1.5K impressions
50. DeepSeek V4 Pro vs Gemini 3.1 Pro Preview - 1.5K impressions
51. DeepSeek V4 Flash vs Gemini 2.5 Pro - 1.5K impressions
52. GLM-5.1 vs GPT-5.5 - 1.4K impressions
53. Claude Opus 4.7 vs Claude Sonnet 4.6 - 1.4K impressions
54. GLM-5.1 vs Xiaomi MiMo-V2.5-Pro - 1.4K impressions
55. GPT-5.5 vs Kimi K2.6 - 1.3K impressions
56. Gemini 3.1 Pro Preview vs Grok 4 - 1.3K impressions
57. DeepSeek R1 vs Grok-3 - 1.3K impressions
58. Claude Mythos Preview vs Grok 4 - 1.3K impressions
59. DeepSeek V4 Pro vs Qwen3.6-27B - 1.3K impressions
60. Claude Sonnet 4.6 vs Qwen3.6-27B - 1.3K impressions
61. DeepSeek V4 Pro vs Grok-3 - 1.3K impressions
62. DeepSeek R1 vs DeepSeek V3.1 - 1.3K impressions
63. Composer 2.5 vs Gemini 3.5 Flash - 1.3K impressions
64. DeepSeek R1 vs Qwen3-235B-A22B - 1.2K impressions
65. GPT-5.4 vs Kimi K2.6 - 1.2K impressions
66. DeepSeek V4 Pro vs Kimi K2.5 - 1.2K impressions
67. Claude Opus 4.7 vs DeepSeek V4 Flash - 1.2K impressions
68. GLM-5.1 vs Kimi K2.5 - 1.2K impressions
69. Claude Sonnet 4.6 vs GPT-5.5 - 1.2K impressions
70. GPT-5.5 vs Grok 4 - 1.2K impressions
71. DeepSeek V4 Flash vs Kimi K2.5 - 1.1K impressions
72. Claude Opus 4.6 vs DeepSeek V4 Pro - 1.1K impressions
73. Claude Haiku 4.5 vs DeepSeek V4 Flash - 1.1K impressions
74. DeepSeek R1 vs Qwen3-Max - 1.1K impressions
75. Llama 3.3 70B vs Qwen2.5-72B - 1.1K impressions
76. Claude Opus 4.7 vs Grok 4 - 1.1K impressions
77. Composer 2.5 vs DeepSeek V4 Pro - 1K impressions
78. DeepSeek V3.1 vs Kimi K2.6 - 1K impressions
79. DeepSeek V4 Pro vs Qwen3.6-35B-A3B - 1K impressions
80. DeepSeek V4 Pro vs GPT-5.5 - 1K impressions
81. Llama 3.3 70B Instruct (free) vs Qwen2.5-72B-Instruct - 1K impressions
82. Gemini 3.1 Pro Preview vs Kimi K2.6 - 1K impressions
83. Claude Opus 4.6 vs GLM-5.1 - 1K impressions
84. Composer 2.5 vs GPT-5.5 - 1K impressions
85. Qwen3.5-122B-A10B vs Qwen3.6-27B - 995 impressions
86. Composer 2.5 vs Kimi K2.6 - 995 impressions
87. Claude Sonnet 4.6 vs Kimi K2 Thinking - 972 impressions
88. Claude Opus 4.7 vs Claude Sonnet 4.5 - 907 impressions
89. DeepSeek R1 Lite vs DeepSeek V4 Pro - 903 impressions
90. Claude Opus 4.7 vs Composer 2.5 - 903 impressions
91. GPT-5.5 vs o3 Mini - 900 impressions
92. Claude Opus 4.7 vs Gemini 3.5 Flash - 900 impressions
93. Qwen3.5-35B-A3B vs Qwen3.6-27B - 899 impressions
94. MiniMax M2 vs MiniMax M3 - 880 impressions
95. DeepSeek V4 Pro vs GLM-5 - 872 impressions
96. Llama 3.3 70B vs Qwen2.5-72B-Instruct - 859 impressions
97. DeepSeek V4 Pro vs Gemini 3.5 Flash - 842 impressions
98. Grok 4 vs o3 Mini - 841 impressions
99. DeepSeek R1 vs Gemini 2.5 Flash - 836 impressions
100. Claude Opus 4.5 vs GPT-5.5 - 825 impressions