Compare AI models

Start with two models, inspect the tradeoff, then open a verdict-first detail page with pricing, benchmark, capability, and provider evidence.

Sitemap coverage: 3,614+ pairs

Decision builder

Pick the pair before opening the detail page

215 selectable models


Claude Opus 4.7 vs Kimi K2.6

Kimi K2.6 is ~85% cheaper on input at $0.75/1M (Claude Opus 4.7 costs ~567% more); pay for Claude Opus 4.7 only if you need its coding workflow support.

614% output-price gap

Output price: $25.00 / $3.50
Context: 1M / 262K
Benchmarks: 4 shared
Providers: 6 / 5
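The percentages on these cards compare two prices against the cheaper model's price, so a figure like 614% is the pricier model's markup, not the cheaper model's saving (a saving can never exceed 100%). A minimal Python sketch of both conversions, assuming that convention, which the card figures match (for example $25.00 vs $3.50 output gives the 614% gap). The helper names `pct_more` and `pct_less` are hypothetical, not part of any site API.

```python
def pct_more(expensive: float, cheap: float) -> float:
    """How much more the pricier model costs, as a % of the cheaper price."""
    return (expensive / cheap - 1) * 100


def pct_less(expensive: float, cheap: float) -> float:
    """How much less the cheaper model costs, as a % of the pricier price."""
    return (1 - cheap / expensive) * 100


# Output prices per 1M tokens from the Claude Opus 4.7 vs Kimi K2.6 card.
opus_out, kimi_out = 25.00, 3.50
print(f"Claude Opus 4.7 costs {pct_more(opus_out, kimi_out):.0f}% more")  # 614% more
print(f"Kimi K2.6 costs {pct_less(opus_out, kimi_out):.0f}% less")        # 86% less
```

The same `pct_more` call reproduces the other cards' gaps, e.g. `pct_more(3.50, 0.870)` rounds to 302 for DeepSeek V4 Pro vs GLM-5.1.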

Popular pairs

Browse comparisons with a decision signal attached

DeepSeek V4 Pro vs GLM-5.1

DeepSeek V4 Pro is ~59% cheaper on input at $0.43/1M (GLM-5.1 costs ~141% more); pay for GLM-5.1 only if you need its coding workflow support.

302% output-price gap, 2 benchmarks

Output price: $0.870 / $3.50
Context: 1M / 200K
Benchmarks: 2 shared
Providers: 3 / 3

Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads Google-Proof Q&A

DeepSeek V4 Pro vs Kimi K2.6

DeepSeek V4 Pro is ~42% cheaper on input at $0.43/1M (Kimi K2.6 costs ~72% more); pay for Kimi K2.6 only if you need its coding workflow support.

302% output-price gap, 7 benchmarks

Output price: $0.870 / $3.50
Context: 1M / 262K
Benchmarks: 7 shared
Providers: 3 / 5

Coding, RAG, Agents, Long context
DeepSeek V4 Pro leads MMLU-Pro

Claude Sonnet 4.6 vs DeepSeek V4 Flash

DeepSeek V4 Flash is ~95% cheaper on input at $0.14/1M (Claude Sonnet 4.6 costs ~2043% more); pay for Claude Sonnet 4.6 only if you need its coding workflow support.

5257% output-price gap, 3 benchmarks

Output price: $15.00 / $0.280
Context: 1M / 1M
Benchmarks: 3 shared
Providers: 5 / 3

Coding, RAG, Agents, Long context
Claude Sonnet 4.6 leads MMLU-Pro

Gemini 2.5 Pro vs Grok 4

Grok 4 is safer overall; choose Gemini 2.5 Pro when coding workflow support matters.

300% output-price gap, 2 benchmarks

Output price: $10.00 / $2.50
Context: 1M / 256K
Benchmarks: 2 shared
Providers: 3 / 4

Coding, RAG, Agents, Long context
Grok 4 leads MMLU-Pro

DeepSeek V4 Flash vs GLM-5.1

DeepSeek V4 Flash is ~87% cheaper on input at $0.14/1M (GLM-5.1 costs ~650% more); pay for GLM-5.1 only if you need its coding workflow support.

1150% output-price gap, 2 benchmarks

Output price: $0.280 / $3.50
Context: 1M / 200K
Benchmarks: 2 shared
Providers: 3 / 3

Coding, RAG, Agents, Long context
DeepSeek V4 Flash leads Google-Proof Q&A

Claude Sonnet 4.6 vs Kimi K2.6

Kimi K2.6 is ~75% cheaper on input at $0.75/1M (Claude Sonnet 4.6 costs ~300% more); pay for Claude Sonnet 4.6 only if you need its coding workflow support.

329% output-price gap, 4 benchmarks

Output price: $15.00 / $3.50
Context: 1M / 262K
Benchmarks: 4 shared
Providers: 5 / 5

Coding, RAG, Agents, Long context
Claude Sonnet 4.6 leads MMLU-Pro

DeepSeek V4 Flash vs Grok 4

DeepSeek V4 Flash is ~89% cheaper on input at $0.14/1M (Grok 4 costs ~793% more); pay for Grok 4 only if you need its coding workflow support.

793% output-price gap, 2 benchmarks

Output price: $0.280 / $2.50
Context: 1M / 256K
Benchmarks: 2 shared
Providers: 3 / 4

Coding, RAG, Agents, Long context
Grok 4 leads MMLU-Pro

Qwen3.6-27B vs Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is ~53% cheaper on input at $0.15/1M (Qwen3.6-27B costs ~113% more); pay for Qwen3.6-27B only if you need its coding workflow support.

220% output-price gap, 3 benchmarks

Output price: $3.20 / $1.00
Context: 262K / 262K
Benchmarks: 3 shared
Providers: 2 / 1

Coding, RAG, Agents, Long context
Qwen3.6-27B leads MMLU-Pro

GLM-5 vs GLM-5.1

GLM-5 is ~43% cheaper on input at $0.60/1M (GLM-5.1 costs ~75% more); pay for GLM-5.1 only if you need its coding workflow support.

68% output-price gap, 1 benchmark

Output price: $2.08 / $3.50
Context: 200K / 200K
Benchmarks: 1 shared
Providers: 5 / 3

Coding, RAG, Agents, Long context
GLM-5.1 leads SWE-bench Pro

Claude Opus 4.7 vs Kimi K2.6

Kimi K2.6 is ~85% cheaper on input at $0.75/1M (Claude Opus 4.7 costs ~567% more); pay for Claude Opus 4.7 only if you need its coding workflow support.

614% output-price gap, 4 benchmarks

Output price: $25.00 / $3.50
Context: 1M / 262K
Benchmarks: 4 shared
Providers: 6 / 5

Coding, RAG, Agents, Long context
Claude Opus 4.7 leads SWE-bench Verified

Llama 3 70B Instruct vs Llama 3.1 70B Instruct

Pick Llama 3.1 70B Instruct for coding; Llama 3 70B Instruct is the better choice when provider availability matters more (17 vs 11 providers).

0% output-price gap, 2 benchmarks

Output price: $0.400 / $0.400
Context: 8K / 128K
Benchmarks: 2 shared
Providers: 17 / 11

Coding, Classification, JSON / Tool use, RAG
Llama 3.1 70B Instruct leads HumanEval

DeepSeek V4 Flash vs Kimi K2.6

DeepSeek V4 Flash is ~81% cheaper on input at $0.14/1M (Kimi K2.6 costs ~436% more); pay for Kimi K2.6 only if you need its coding workflow support.

1150% output-price gap, 5 benchmarks

Output price: $0.280 / $3.50
Context: 1M / 262K
Benchmarks: 5 shared
Providers: 3 / 5

Coding, RAG, Agents, Long context
DeepSeek V4 Flash leads MMLU-Pro

DeepSeek V4 Flash vs Qwen3.6-27B

DeepSeek V4 Flash is ~56% cheaper on input at $0.14/1M (Qwen3.6-27B costs ~129% more); pay for Qwen3.6-27B only if you need its coding workflow support.

1043% output-price gap, 3 benchmarks

Output price: $0.280 / $3.20
Context: 1M / 262K
Benchmarks: 3 shared
Providers: 3 / 2

Coding, RAG, Agents, Long context
DeepSeek V4 Flash leads MMLU-Pro

Claude Sonnet 4.6 vs GPT-5.5 Pro

Claude Sonnet 4.6 is ~90% cheaper on input at $3.00/1M (GPT-5.5 Pro costs ~900% more); pay for GPT-5.5 Pro only if you need its coding workflow support.

1100% output-price gap, 2 benchmarks

Output price: $15.00 / $180.00
Context: 1M / 1.1M
Benchmarks: 2 shared
Providers: 5 / 2

Coding, RAG, Agents, Long context
GPT-5.5 Pro leads SWE-bench Verified

Gemini 2.5 Flash vs Grok 4

Gemini 2.5 Flash is ~76% cheaper on input at $0.30/1M (Grok 4 costs ~317% more); pay for Grok 4 only if you need its coding workflow support.

0% output-price gap, 2 benchmarks

Output price: $2.50 / $2.50
Context: 1M / 256K
Benchmarks: 2 shared
Providers: 4 / 4

Coding, RAG, Agents, Long context
Grok 4 leads MMLU-Pro

Gemini 2.5 Pro vs o3

Gemini 2.5 Pro is ~38% cheaper on input at $1.25/1M (o3 costs ~60% more), while o3 is cheaper on output; pay for o3 only if you need its coding workflow support.

25% output-price gap, 5 benchmarks

Output price: $10.00 / $8.00
Context: 1M / 200K
Benchmarks: 5 shared
Providers: 3 / 2

Coding, RAG, Agents, Long context
o3 leads SWE-bench Verified
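On the Gemini 2.5 Pro vs o3 card, Gemini 2.5 Pro has the lower input price ($1.25/1M) but the higher output price ($10.00 vs $8.00), so which model is cheaper depends on how many output tokens a workload produces per input token. A minimal Python sketch of that break-even, assuming o3's input price is $2.00/1M: that figure is implied by the card's ~60% input-price difference relative to $1.25 but is not shown on the card. `blended_cost` and `break_even_output_ratio` are hypothetical helpers.

```python
def blended_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of a workload, with prices quoted per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Prices per 1M tokens. Gemini 2.5 Pro input and both output prices are from the card;
# O3_IN = 2.00 is an ASSUMPTION derived from the ~60% input-price figure.
GEMINI_IN, GEMINI_OUT = 1.25, 10.00
O3_IN, O3_OUT = 2.00, 8.00


def break_even_output_ratio() -> float:
    """Output tokens per input token at which both models cost the same."""
    # GEMINI_IN*I + GEMINI_OUT*O == O3_IN*I + O3_OUT*O, solved for O/I.
    return (O3_IN - GEMINI_IN) / (GEMINI_OUT - O3_OUT)


workloads = [("input-heavy (10M in / 1M out)", 10_000_000, 1_000_000),
             ("output-heavy (1M in / 2M out)", 1_000_000, 2_000_000)]
for label, inp, out in workloads:
    g = blended_cost(inp, out, GEMINI_IN, GEMINI_OUT)
    o = blended_cost(inp, out, O3_IN, O3_OUT)
    print(f"{label}: Gemini 2.5 Pro ${g:.2f} vs o3 ${o:.2f}")
print(f"break-even at {break_even_output_ratio():.3f} output tokens per input token")
```

Under these assumptions the input-heavy workload costs $22.50 on Gemini 2.5 Pro vs $28.00 on o3, the output-heavy one $21.25 vs $18.00, and Gemini 2.5 Pro stays cheaper whenever a workload produces fewer than 0.375 output tokens per input token.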

DeepSeek V3.1 vs DeepSeek V4 Pro

DeepSeek V4 Pro fits 16x more tokens; pick it for long-context work and DeepSeek V3.1 for tighter calls.

93% output-price gap, 2 benchmarks

Output price: $1.68 / $0.870
Context: 64K / 1M
Benchmarks: 2 shared
Providers: 6 / 3

Coding, Agents, Vision, Classification
DeepSeek V4 Pro leads MMLU-Pro
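Context windows on these cards are written with K and M suffixes, without saying whether they are decimal (1K = 1,000) or binary (1K = 1,024) multiples. A minimal Python sketch that parses such labels, assuming K and M (in either case) are the only suffixes used, shows that the "16x more tokens" claim for DeepSeek V4 Pro over DeepSeek V3.1 holds under either reading. `parse_context` is a hypothetical helper.

```python
import re


def parse_context(label: str, binary: bool = False) -> int:
    """Convert a context label such as '64K', '200k', '1M' or '1.1M' to tokens.

    Assumption: K/M are either powers of 1000 (default) or of 1024 (binary=True).
    """
    m = re.fullmatch(r"\s*([\d.]+)\s*([kKmM]?)\s*", label)
    if not m:
        raise ValueError(f"unrecognized context label: {label!r}")
    value, unit = float(m.group(1)), m.group(2).upper()
    base = 1024 if binary else 1000
    scale = {"": 1, "K": base, "M": base ** 2}[unit]
    return round(value * scale)  # round() absorbs float error, e.g. 1.1 * 1e6


decimal_ratio = parse_context("1M") / parse_context("64K")
binary_ratio = parse_context("1M", binary=True) / parse_context("64K", binary=True)
print(f"{decimal_ratio:.1f}x decimal, {binary_ratio:.1f}x binary")  # 15.6x decimal, 16.0x binary
```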

Grok-3 vs Grok 4

Grok-3 is ~36% cheaper on input at $0.80/1M (Grok 4 costs ~56% more); pay for Grok 4 only if you need its coding workflow support.

4% output-price gap, 2 benchmarks

Output price: $2.40 / $2.50
Context: 1M / 256K
Benchmarks: 2 shared
Providers: 4 / 4

Coding, RAG, Agents, Long context
Grok 4 leads MMLU-Pro