Top pick: Qwen3.6-Plus. MMMU: 86%. Output pricing from $1.95 / 1M tokens.
Last refreshed 2026-06-30. Next refresh: weekly.
Best vision and multimodal LLMs in 2026, ranked by image benchmarks. Covers image QA, document understanding, and video analysis.
Verdict
Qwen3.5-397B-A17B is the runner-up, 1 point back on MMMU.
Vision/multimodal leaders rank on standard MMMU, use MathVista only as a comparable near-tie signal, then recency. MMMU Pro is tracked separately as harder multimodal evidence, but models without standard MMMU stay benchmark-pending for this leaderboard.
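The ordering rule above can be sketched as a comparison function. This is a minimal illustration, not LLMReference's actual implementation: the `Entry` type, the `NEAR_TIE_POINTS` tolerance, and the sample models with their MathVista scores and release dates are all hypothetical, since the page does not state how close two MMMU scores must be to count as a near-tie.

```python
from dataclasses import dataclass
from datetime import date
from functools import cmp_to_key
from typing import List, Optional

# Hypothetical tolerance: the page does not define what counts as a near-tie.
NEAR_TIE_POINTS = 0.1


@dataclass(frozen=True)
class Entry:
    name: str
    mmmu: float                  # standard MMMU score, in percent
    mathvista: Optional[float]   # None when no comparable score is tracked
    released: date


def compare(a: Entry, b: Entry) -> int:
    """Return a negative number when `a` should rank ahead of `b`."""
    # 1. Standard MMMU decides unless the two scores are a near-tie.
    if abs(a.mmmu - b.mmmu) > NEAR_TIE_POINTS:
        return -1 if a.mmmu > b.mmmu else 1
    # 2. MathVista breaks the near-tie only when both models have a score.
    if a.mathvista is not None and b.mathvista is not None and a.mathvista != b.mathvista:
        return -1 if a.mathvista > b.mathvista else 1
    # 3. Otherwise the more recently released model ranks first.
    if a.released != b.released:
        return -1 if a.released > b.released else 1
    return 0


def rank(entries: List[Entry]) -> List[Entry]:
    # Note: a tolerance-based comparison is not strictly transitive, so a
    # production ranking would need a rule for chains of near-ties.
    return sorted(entries, key=cmp_to_key(compare))


# Hypothetical sample data, for illustration only.
sample = [
    Entry("model-b", 83.6, 80.0, date(2026, 1, 10)),
    Entry("model-a", 86.0, None, date(2025, 11, 1)),
    Entry("model-c", 83.6, 82.0, date(2025, 12, 5)),
    Entry("model-e", 82.0, None, date(2025, 9, 1)),
    Entry("model-d", 82.0, None, date(2026, 2, 1)),
]
ranked_names = [e.name for e in rank(sample)]
```

In this sample, `model-c` ranks ahead of `model-b` on MathVista despite being older, while `model-d` ranks ahead of `model-e` on recency because neither has a MathVista score.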
| # | Model | Tags | MMMU | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Qwen3.6-Plus | Vision, Tools | 86% | $0.33 | $1.95 |
| 2 | ByteDance Doubao Seed 2.0 Pro | Vision, Tools | 85.4% | $0.47 | $2.37 |
| 3 | Qwen3.5-397B-A17B | Reasoning, Vision, Tools | 85% | $0.39 | $2.34 |
| 4 | Gemini 3.5 Flash | Reasoning, Vision, Tools | 83.6% | $1.50 | $9.00 |
| 5 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 83.6% | $3.00 | $15.00 |
| 6 | o3 | Reasoning, Vision, Tools | 82.9% | $2.00 | $8.00 |
| 7 | GPT-5.4 | Reasoning, Vision, Tools | 82.1% | $2.50 | $15.00 |
| 8 | Qwen3.6 Max Preview | Preview, Reasoning, Vision, Tools | 82% | $1.04 | $6.24 |
| 9 | Gemini 2.5 Pro | Reasoning, Vision, Tools | 81.7% | $1.25 | $10.00 |
| 10 | Gemini 3 Pro | Vision, Tools | 81% | $1.25 | $5.00 |
| 11 | Claude Opus 4.5 | Reasoning, Vision, Tools | 80.7% | $5.00 | $25.00 |
| 12 | Gemini 2.5 Flash | Vision, Tools | 79.7% | $0.30 | $2.50 |
| 13 | Gemini 2.5 Pro Preview 05-06 | Preview, Vision | 79.6% | $1.25 | $10.00 |
| 14 | Claude Sonnet 4.5 | Reasoning, Vision, Tools | 77.8% | $3.00 | $15.00 |
| 15 | Claude Opus 4.6 | Reasoning, Vision, Tools | 76.5% | $5.00 | $25.00 |
| 16 | Command A+ | Reasoning, Vision, Tools | 75.1% | — | — |
| 17 | Claude 3.7 Sonnet | Reasoning, Vision, Tools | 75% | $3.00 | $15.00 |
| 18 | Llama 4 Maverick 17B Instruct FP8 | Vision | 73.4% | $0.15 | $0.60 |
| 19 | Llama 4 Scout 17B-16E Instruct | Vision | 69.4% | $0.08 | $0.22 |
| 20 | GPT-4o | Vision, Tools | 69.1% | $2.50 | $10.00 |
These source-backed rows qualify for this task page, but they are not scored leaderboard picks until the category benchmark data exists.
| Model | Tags | Why it is listed | Status | Tracked price |
|---|---|---|---|---|
| Claude Sonnet 5 | Tools, Code execution | Claude Sonnet 5 reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $2.00 / Out $10.00 |
| Seed 2.1 Pro | — | Seed 2.1 Pro reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $0.83 / Out $4.14 |
| Seed 2.1 Turbo | — | Seed 2.1 Turbo reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $0.41 / Out $2.07 |
| Fugu Ultra | Tools | Fugu Ultra reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $5.00 / Out $30.00 |
| Kimi K2.7-Code HighSpeed | Tools | Kimi K2.7-Code HighSpeed reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $1.90 / Out $8.00 |
| Claude Opus 4.8 | Tools, Code execution | Claude Opus 4.8 reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $5.00 / Out $25.00 |
| GPT-5.5 | Tools, Code execution | GPT-5.5 reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $5.00 / Out $30.00 |
| Claude Opus 4.7 | Tools, Code execution | Claude Opus 4.7 reports source-backed vision or multimodal capability; keep it separate from the scored vision ranking until standard MMMU data lands. | Benchmark pending: no tracked standard MMMU score yet. | In $5.00 / Out $25.00 |
Next seats in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens. MMMU: 83.6%.
OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities. MMMU: 82.9%.
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4. MMMU: 82.1%.
Side-by-side comparison of the top picks by price, benchmark, and API access.
Qwen3.6-Plus is the current LLMReference top pick for vision. The verdict uses the stored category signal MMMU: 86%. Output pricing starts at $1.95 per 1M tokens. Review the linked model and provider pages before production use because availability and pricing can change.
Qwen3.6-Plus leads Qwen3.5-397B-A17B in the visible shortlist on MMMU: 86% versus 85%. Output pricing starts at $1.95 per 1M tokens for Qwen3.6-Plus and $2.34 per 1M tokens for Qwen3.5-397B-A17B.
LLMReference ranks LLMs for vision from stored model, benchmark, freshness, and pricing data. The current methodology summary is: Vision/multimodal leaders rank on standard MMMU, use MathVista only as a comparable near-tie signal, then recency. MMMU Pro is tracked separately as harder multimodal evidence, but models without standard MMMU stay benchmark-pending for this leaderboard.
The LLM rankings on this page are updated daily as new benchmark scores, provider availability, and pricing data are tracked. The "Last refreshed" date at the top of the page shows the most recent refresh.
The podium picks are driven by the primary benchmark signal for this category (shown in the Methodology section), filtered to non-deprecated models with confirmed API availability. In ties, we prefer the more recently released model.
Preview models appear in the "Watch list" section but are not in the main ranked podium unless the category explicitly allows it (e.g., /best/coding and /best/agents, where preview models often lead benchmarks).
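The eligibility rules described above can be sketched as a simple filter. This is an illustrative sketch under stated assumptions: the `Candidate` fields and the sample models are hypothetical, and the set of preview-friendly categories is taken only from the two examples named above (/best/coding and /best/agents), so it may be incomplete.

```python
from dataclasses import dataclass

# Assumption: only the two categories the page names as examples allow previews.
PREVIEW_FRIENDLY_CATEGORIES = {"coding", "agents"}


@dataclass(frozen=True)
class Candidate:
    name: str
    deprecated: bool
    api_available: bool   # confirmed API availability
    preview: bool


def podium_eligible(candidate: Candidate, category: str) -> bool:
    """Return True when a model may appear in the main ranked podium."""
    # Deprecated models and models without confirmed API access are excluded.
    if candidate.deprecated or not candidate.api_available:
        return False
    # Preview models qualify only where the category explicitly allows them.
    if candidate.preview and category not in PREVIEW_FRIENDLY_CATEGORIES:
        return False
    return True


# Hypothetical candidates, for illustration only.
stable = Candidate("stable-model", deprecated=False, api_available=True, preview=False)
preview = Candidate("preview-model", deprecated=False, api_available=True, preview=True)
retired = Candidate("retired-model", deprecated=True, api_available=True, preview=False)

vision_podium = [c.name for c in (stable, preview, retired) if podium_eligible(c, "vision")]
coding_podium = [c.name for c in (stable, preview, retired) if podium_eligible(c, "coding")]
```

Under these assumptions, the preview model is held back from the vision podium but qualifies for the coding podium.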
Yes. Use the Compare tool at llmreference.com/compare for a side-by-side breakdown of context window, pricing, benchmarks, and provider availability.
Pricing is tracked from provider documentation and updated regularly. It reflects the best available public data, not live API quotes, so always verify before billing.
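As a rough way to turn the tracked per-1M prices into a budget figure, the sketch below multiplies token counts by the listed rates. The function name and the workload size are hypothetical, the prices are the tracked Qwen3.6-Plus figures from this page, and the calculation ignores provider-specific details such as how images are converted to tokens, caching discounts, or tiered pricing.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate spend in USD from token counts and per-1M-token prices."""
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens,
# priced at the tracked Qwen3.6-Plus rates ($0.33 in / $1.95 out per 1M).
estimate = estimate_cost_usd(2_000_000, 500_000, 0.33, 1.95)
print(f"Estimated cost: ${estimate:.3f}")
```

For this hypothetical workload the arithmetic is $0.66 for input plus $0.975 for output. Treat the result as an estimate from tracked data rather than a quote.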