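| Benchmark | Qwen2.5 72B Instruct | o3 |
| --- | --- | --- |
| Google-Proof Q&A | 65.4 | 87.7 |
| HellaSwag | 95.6 | — |
| HumanEval | 92.7 | 96.7 |
| Massive Multitask Language Understanding | 88.2 | — |
| Chatbot Arena | 1270.0 | 1412.0 |
| SWE-bench Verified | — | 71.7 |
| LiveCodeBench | — | 79.1 |
| Aider Polyglot | — | 81.3 |
| Massive Multi-discipline Multimodal Understanding | — | 82.9 |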
Qwen2.5 72B Instruct vs o3
Side-by-side comparison of specifications, capabilities, and pricing.
| Specification | Qwen2.5 72B Instruct | o3 |
| --- | --- | --- |
| Released | 2024-06-07 | 2025-03-31 |
| Context window | 128K | 128K |
| Parameters | 72.7B | — |
| Architecture | decoder-only | decoder-only |
| License | Apache 2.0 | Unknown |
| Knowledge cutoff | — | — |
Capabilities | ||
| Vision | ||
| Multimodal | ||
| Reasoning | ||
| Function calling | ||
| Tool use | ||
| Structured Outputs | ||
| Code execution | ||
Availability | ||
| Providers | ||
Benchmarks