llmreference

Compare AI models

Start with two models, inspect the tradeoff, then open a verdict-first detail page with pricing, benchmark, capability, and provider evidence.

Sitemap coverage 3614+ pairs

Decision builder

Pick the pair before opening the detail page

215 selectable models

Claude Opus 4.7 vs Kimi K2.6

Kimi K2.6 is the cheaper model at $0.75/1M (Claude Opus 4.7 costs ~567% more); pay for Claude Opus 4.7 only for coding workflow support.

614% gap
Output price
$25.00 / $3.50
Context
1M / 262K
Benchmarks
4 shared
Providers
6 / 5
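The gap figure on each card appears to be the relative difference between the two output prices, measured against the lower one. That reading reproduces the numbers shown on this page (for example $25.00 vs $3.50 gives 614%), but the site does not state its formula, so the sketch below is an assumption. The function name `price_gap_pct` is hypothetical.

```python
def price_gap_pct(price_a: float, price_b: float) -> int:
    """Relative gap between two prices, as a whole percent of the lower price.

    Assumed formula (not confirmed by the site): (high - low) / low * 100.
    """
    low, high = sorted((price_a, price_b))
    if low <= 0:
        raise ValueError("prices must be positive")
    return round((high - low) / low * 100)


# Output prices per 1M tokens, taken from the cards on this page.
print(price_gap_pct(25.00, 3.50))   # Claude Opus 4.7 vs Kimi K2.6
print(price_gap_pct(0.87, 3.50))    # DeepSeek V4 Pro vs GLM-5.1
print(price_gap_pct(0.40, 0.40))    # Llama 3 70B vs Llama 3.1 70B
```

Because the gap is taken against the lower price, it is unbounded above: a 12x price ratio shows up as 1100%, not 92%.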

Popular pairs

Browse comparisons with a decision signal attached

DeepSeek V4 Pro vs GLM-5.1

DeepSeek V4 Pro is the cheaper model at $0.43/1M (GLM-5.1 costs ~141% more); pay for GLM-5.1 only for coding workflow support.

302% gap · 2 benchmarks
Output price
$0.870 / $3.50
Context
1M / 200K
Benchmarks
2 shared
Providers
3 / 3
Coding · RAG · Agents · Long context
DeepSeek V4 Pro leads Google-Proof Q&A
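A percentage above 100, such as the ~141% in the verdict above, cannot be a discount in the strict sense, because a price cannot fall by more than 100%. It reads more plausibly as a markup: how much more the pricier model costs relative to the cheaper one. The sketch below converts between the two readings. This is an interpretation rather than something the site documents, and both function names are hypothetical.

```python
def markup_to_discount_pct(markup_pct: float) -> float:
    """If B costs `markup_pct` percent more than A, return how much cheaper A is than B."""
    if markup_pct < 0:
        raise ValueError("markup must be non-negative")
    return markup_pct / (100 + markup_pct) * 100


def discount_to_markup_pct(discount_pct: float) -> float:
    """If A is `discount_pct` percent cheaper than B, return how much more B costs than A."""
    if not 0 <= discount_pct < 100:
        raise ValueError("discount must be in [0, 100)")
    return discount_pct / (100 - discount_pct) * 100


# A 141% markup corresponds to a discount of roughly 58.5%.
print(round(markup_to_discount_pct(141), 1))
# A 300% markup (a 4x price ratio) corresponds to a 75% discount.
print(round(markup_to_discount_pct(300), 1))
```

Under this reading, every "cheaper" figure on the page maps to a discount below 100%, however large the headline number.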

DeepSeek V4 Pro vs Kimi K2.6

DeepSeek V4 Pro is the cheaper model at $0.43/1M (Kimi K2.6 costs ~72% more); pay for Kimi K2.6 only for coding workflow support.

302% gap · 7 benchmarks
Output price
$0.870 / $3.50
Context
1M / 262K
Benchmarks
7 shared
Providers
3 / 5
Coding · RAG · Agents · Long context
DeepSeek V4 Pro leads MMLU PRO

Claude Sonnet 4.6 vs DeepSeek V4 Flash

DeepSeek V4 Flash is the cheaper model at $0.14/1M (Claude Sonnet 4.6 costs ~2043% more); pay for Claude Sonnet 4.6 only for coding workflow support.

5257% gap · 3 benchmarks
Output price
$15.00 / $0.280
Context
1M / 1M
Benchmarks
3 shared
Providers
5 / 3
Coding · RAG · Agents · Long context
Claude Sonnet 4.6 leads MMLU PRO

Gemini 2.5 Pro vs Grok 4

Grok 4 is safer overall; choose Gemini 2.5 Pro when coding workflow support matters.

300% gap · 2 benchmarks
Output price
$10.00 / $2.50
Context
1M / 256K
Benchmarks
2 shared
Providers
3 / 4
Coding · RAG · Agents · Long context
Grok 4 leads MMLU PRO

DeepSeek V4 Flash vs GLM-5.1

DeepSeek V4 Flash is the cheaper model at $0.14/1M (GLM-5.1 costs ~650% more); pay for GLM-5.1 only for coding workflow support.

1150% gap · 2 benchmarks
Output price
$0.280 / $3.50
Context
1M / 200K
Benchmarks
2 shared
Providers
3 / 3
Coding · RAG · Agents · Long context
DeepSeek V4 Flash leads Google-Proof Q&A

Claude Sonnet 4.6 vs Kimi K2.6

Kimi K2.6 is the cheaper model at $0.75/1M (Claude Sonnet 4.6 costs ~300% more); pay for Claude Sonnet 4.6 only for coding workflow support.

329% gap · 4 benchmarks
Output price
$15.00 / $3.50
Context
1M / 262K
Benchmarks
4 shared
Providers
5 / 5
Coding · RAG · Agents · Long context
Claude Sonnet 4.6 leads MMLU PRO

DeepSeek V4 Flash vs Grok 4

DeepSeek V4 Flash is the cheaper model at $0.14/1M (Grok 4 costs ~793% more); pay for Grok 4 only for coding workflow support.

793% gap · 2 benchmarks
Output price
$0.280 / $2.50
Context
1M / 256K
Benchmarks
2 shared
Providers
3 / 4
Coding · RAG · Agents · Long context
Grok 4 leads MMLU PRO

Qwen3.6-27B vs Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is the cheaper model at $0.15/1M (Qwen3.6-27B costs ~113% more); pay for Qwen3.6-27B only for coding workflow support.

220% gap · 3 benchmarks
Output price
$3.20 / $1.00
Context
262K / 262K
Benchmarks
3 shared
Providers
2 / 1
Coding · RAG · Agents · Long context
Qwen3.6-27B leads MMLU PRO

GLM-5 vs GLM-5.1

GLM-5 is the cheaper model at $0.60/1M (GLM-5.1 costs ~75% more); pay for GLM-5.1 only for coding workflow support.

68% gap · 1 benchmark
Output price
$2.08 / $3.50
Context
200K / 200K
Benchmarks
1 shared
Providers
5 / 3
Coding · RAG · Agents · Long context
GLM-5.1 leads SWE-bench Pro

Claude Opus 4.7 vs Kimi K2.6

Kimi K2.6 is the cheaper model at $0.75/1M (Claude Opus 4.7 costs ~567% more); pay for Claude Opus 4.7 only for coding workflow support.

614% gap · 4 benchmarks
Output price
$25.00 / $3.50
Context
1M / 262K
Benchmarks
4 shared
Providers
6 / 5
Coding · RAG · Agents · Long context
Claude Opus 4.7 leads SWE-bench Verified

Llama 3 70B Instruct vs Llama 3.1 70B Instruct

Pick Llama 3.1 70B Instruct for coding; Llama 3 70B Instruct is better when provider fit matters more.

0% gap · 2 benchmarks
Output price
$0.400 / $0.400
Context
8K / 128K
Benchmarks
2 shared
Providers
17 / 11
Coding · Classification · JSON / Tool use · RAG
Llama 3.1 70B Instruct leads HumanEval

DeepSeek V4 Flash vs Kimi K2.6

DeepSeek V4 Flash is the cheaper model at $0.14/1M (Kimi K2.6 costs ~436% more); pay for Kimi K2.6 only for coding workflow support.

1150% gap · 5 benchmarks
Output price
$0.280 / $3.50
Context
1M / 262K
Benchmarks
5 shared
Providers
3 / 5
Coding · RAG · Agents · Long context
DeepSeek V4 Flash leads MMLU PRO

DeepSeek V4 Flash vs Qwen3.6-27B

DeepSeek V4 Flash is the cheaper model at $0.14/1M (Qwen3.6-27B costs ~129% more); pay for Qwen3.6-27B only for coding workflow support.

1043% gap · 3 benchmarks
Output price
$0.280 / $3.20
Context
1M / 262K
Benchmarks
3 shared
Providers
3 / 2
Coding · RAG · Agents · Long context
DeepSeek V4 Flash leads MMLU PRO

Claude Sonnet 4.6 vs GPT-5.5 Pro

Claude Sonnet 4.6 is the cheaper model at $3.00/1M (GPT-5.5 Pro costs ~900% more); pay for GPT-5.5 Pro only for coding workflow support.

1100% gap · 2 benchmarks
Output price
$15.00 / $180.00
Context
1M / 1.1M
Benchmarks
2 shared
Providers
5 / 2
Coding · RAG · Agents · Long context
GPT-5.5 Pro leads SWE-bench Verified

Gemini 2.5 Flash vs Grok 4

Gemini 2.5 Flash is the cheaper model at $0.30/1M (Grok 4 costs ~317% more); pay for Grok 4 only for coding workflow support.

0% gap · 2 benchmarks
Output price
$2.50 / $2.50
Context
1M / 256K
Benchmarks
2 shared
Providers
4 / 4
Coding · RAG · Agents · Long context
Grok 4 leads MMLU PRO

Gemini 2.5 Pro vs o3

Gemini 2.5 Pro is the cheaper model at $1.25/1M (o3 costs ~60% more); pay for o3 only for coding workflow support.

25% gap · 5 benchmarks
Output price
$10.00 / $8.00
Context
1M / 200K
Benchmarks
5 shared
Providers
3 / 2
Coding · RAG · Agents · Long context
o3 leads SWE-bench Verified

DeepSeek V3.1 vs DeepSeek V4 Pro

DeepSeek V4 Pro fits 16x more tokens; pick it for long-context work and DeepSeek V3.1 for tighter calls.

93% gap · 2 benchmarks
Output price
$1.68 / $0.870
Context
64K / 1M
Benchmarks
2 shared
Providers
6 / 3
Coding · Agents · Vision · Classification
DeepSeek V4 Pro leads MMLU PRO
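The "16x" in the verdict above can be checked from the two context labels on the card. The sketch below parses labels such as `64K` and `1M` and returns the ratio. It assumes decimal multipliers (K = 1,000 and M = 1,000,000), which the page does not confirm; with binary multipliers the ratio would be exactly 16. The helper names `parse_context` and `context_ratio` are hypothetical.

```python
import re

# Assumption: decimal multipliers. The site may use binary ones (1024-based).
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_context(label: str) -> int:
    """Turn a context label like '64K', '200k', or '1.1M' into a token count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KkMm])\s*", label)
    if match is None:
        raise ValueError(f"unrecognized context label: {label!r}")
    value, suffix = match.groups()
    return round(float(value) * _MULTIPLIERS[suffix.upper()])


def context_ratio(label_a: str, label_b: str) -> float:
    """How many times larger the bigger context window is than the smaller one."""
    small, large = sorted((parse_context(label_a), parse_context(label_b)))
    return large / small


# DeepSeek V3.1 (64K) vs DeepSeek V4 Pro (1M): 15.625, which rounds to 16.
print(context_ratio("64K", "1M"))
```

The parser accepts either case for the suffix, since the cards mix `200k` and `262K`.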

Grok-3 vs Grok 4

Grok-3 is the cheaper model at $0.80/1M (Grok 4 costs ~56% more); pay for Grok 4 only for coding workflow support.

4% gap · 2 benchmarks
Output price
$2.40 / $2.50
Context
1M / 256K
Benchmarks
2 shared
Providers
4 / 4
Coding · RAG · Agents · Long context
Grok 4 leads MMLU PRO