GPT-5.5
- SWE-bench Verified: 88.7%
- Output (from): $30.00 / 1M
Last refreshed 2026-05-18. Next refresh: weekly.
Compare coding-capable models by sourced software-engineering benchmarks, context window, provider coverage, and tracked token pricing.
An opinionated shortlist for this category; the full leaderboard, pricing, and compare links follow further down the page.
Coding leaders are ranked first by evidence from shipped coding-agent benchmarks, then by classic code-generation scores, with recency as the final tie-break.
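The ordering rule above can be read as a three-level sort key. The sketch below is only an illustration of that idea, not the site's actual ranking code: the `Entry` fields, the placeholder model names, and all scores and dates are hypothetical, and it assumes that a higher score wins, that a newer release wins the final tie-break, and that a missing score sorts last.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entry:
    name: str
    agent_score: Optional[float]    # coding-agent benchmark score in percent (assumed field)
    codegen_score: Optional[float]  # classic code-generation score in percent (assumed field)
    released: date                  # release date used for the recency tie-break (assumed field)


def rank_key(entry: Entry):
    # A missing score becomes -inf so that, after negation, it sorts last.
    agent = entry.agent_score if entry.agent_score is not None else float("-inf")
    codegen = entry.codegen_score if entry.codegen_score is not None else float("-inf")
    # Negate every component: ascending sort then puts higher scores and newer dates first.
    return (-agent, -codegen, -entry.released.toordinal())


def rank(entries):
    return sorted(entries, key=rank_key)


# Placeholder entries with made-up values, chosen to exercise each tie-break level.
entries = [
    Entry("model-a", 80.6, 70.0, date(2026, 1, 10)),
    Entry("model-b", 80.6, 70.0, date(2026, 3, 2)),
    Entry("model-c", 85.0, 60.0, date(2025, 11, 20)),
    Entry("model-d", 80.6, 75.0, date(2025, 9, 1)),
]

ranked_names = [entry.name for entry in rank(entries)]
print(ranked_names)
```

In this sketch `model-c` leads on its agent score, `model-d` wins the three-way tie at 80.6 on its code-generation score, and `model-b` edges out `model-a` only because it is newer.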
| # | Model | Capabilities | SWE-bench Verified | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | Reasoning, Vision, Tools | 88.7% | $5.00 | $30.00 |
| 2 | GPT-5.5 Pro | Reasoning, Vision, Tools | 88.7% | $30.00 | $180.00 |
| 3 | Claude Opus 4.7 | Reasoning, Vision, Tools | 87.6% | $5.00 | $25.00 |
| 4 | GPT-5.3-Codex | Reasoning, Vision, Tools | 85% | $1.75 | $14.00 |
| 5 | Claude Opus 4.5 | Reasoning, Vision, Tools | 80.9% | $5.00 | $25.00 |
| 6 | Claude Opus 4.6 | Reasoning, Vision, Tools | 80.8% | $5.00 | $25.00 |
| 7 | DeepSeek V4 Pro | Reasoning, Tools | 80.6% | $0.43 | $0.87 |
| 8 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | 80.6% | $2.00 | $12.00 |
| 9 | Kimi K2.6 | Reasoning, Vision, Tools | 80.2% | $0.75 | $3.50 |
| 10 | GPT-5.2 | Reasoning, Vision, Tools | 80% | $1.75 | $14.00 |
| 11 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 79.6% | $3.00 | $15.00 |
| 12 | DeepSeek V4 Flash | Reasoning, Tools | 79% | $0.14 | $0.28 |
| 13 | Xiaomi MiMo-V2.5-Pro | Tools | 78.9% | $1.00 | $3.00 |
| 14 | Qwen3.6-Plus | Vision, Tools | 78.8% | $0.33 | $1.95 |
| 15 | Qwen3-Max | Vision, Tools | 78.8% | $0.78 | $3.90 |
| 16 | GLM-5 | Reasoning, Tools | 77.8% | $0.60 | $2.08 |
| 17 | Mistral Medium 3.5 | Reasoning, Vision, Tools | 77.6% | $1.50 | $7.50 |
| 18 | Muse Spark | Reasoning, Vision, Tools | 77.4% | n/a | n/a |
| 19 | Qwen3.6-27B | Reasoning, Vision, Tools | 77.2% | $0.32 | $3.20 |
| 20 | Grok 4.20 | Reasoning, Tools | 76.7% | $1.25 | $2.50 |
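To make the per-million-token prices in the table concrete, the following sketch estimates what a single job might cost at list price. The prices are taken from the GPT-5.5 and DeepSeek V4 Flash rows; the workload (200,000 input tokens and 20,000 output tokens) and the `job_cost` helper are hypothetical. It also assumes flat pricing, with no caching discounts, tiered rates, or provider markups.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def job_cost(input_tokens: int, output_tokens: int,
             input_per_m: Decimal, output_per_m: Decimal) -> Decimal:
    """Estimate the cost in dollars from per-1M-token prices (flat pricing assumed)."""
    total = Decimal(input_tokens) * input_per_m + Decimal(output_tokens) * output_per_m
    return total / MILLION


# Hypothetical workload: 200k input tokens, 20k output tokens.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

# Prices from the table rows for GPT-5.5 and DeepSeek V4 Flash.
gpt_cost = job_cost(INPUT_TOKENS, OUTPUT_TOKENS, Decimal("5.00"), Decimal("30.00"))
flash_cost = job_cost(INPUT_TOKENS, OUTPUT_TOKENS, Decimal("0.14"), Decimal("0.28"))

print(f"GPT-5.5: ${gpt_cost:.4f}")
print(f"DeepSeek V4 Flash: ${flash_cost:.4f}")
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding noise. Actual bills can differ from this estimate depending on the provider.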
Next seats in this ranking. The lines below come from each model's stored description in the LLMReference seed data; spot-check the model page before relying on a capability claim.
Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).
SWE-bench Verified: 85%
Claude Opus 4.5 is available on AWS Bedrock.
SWE-bench Verified: 80.9%
Claude Opus 4.6 is available on AWS Bedrock.
SWE-bench Verified: 80.8%