Vision/multimodal leaders rank on standard MMMU, use MathVista only as a comparable near-tie signal, then recency. MMMU Pro is tracked separately as harder multimodal evidence, but models without standard MMMU stay benchmark-pending for this leaderboard.

Eligibility — Models flagged `vision` or `multimodal` in seed data, excluding audio, speech, image-generation, and video-generation specialist models.
Primary ranking — Standard MMMU score is the primary score. When two scored models are within 1 point and both have MathVista, MathVista breaks the near-tie before release recency.
Benchmark pending — Recent source-backed vision models without tracked standard MMMU stay visible in a separate benchmark-pending section. MMMU Pro rows remain informational unless the methodology is changed to accept that harder benchmark as a proxy.
Variant collapse — We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
Pricing — Multimodal pricing often differs by modality — use provider rows for image/video-specific tiers.
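
The eligibility, primary-ranking, and benchmark-pending rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LLMReference's actual implementation: the `Model` fields, `is_eligible`, `outranks`, and `rank` are hypothetical names, release recency is assumed to apply only when MMMU scores are exactly equal, and an insertion pass is one possible way to resolve chains of near-ties, since "within 1 point" is not a transitive relation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

NEAR_TIE_POINTS = 1.0  # "within 1 point" from the primary-ranking rule
EXCLUDED_TAGS = {"audio", "speech", "image-generation", "video-generation"}


@dataclass
class Model:  # hypothetical record shape, not the real seed-data schema
    name: str
    tags: set[str]
    release: date
    mmmu: Optional[float] = None       # standard MMMU, in percent
    mathvista: Optional[float] = None  # MathVista, in percent


def is_eligible(m: Model) -> bool:
    """Flagged vision or multimodal, and not a specialist model."""
    return bool(m.tags & {"vision", "multimodal"}) and not (m.tags & EXCLUDED_TAGS)


def outranks(a: Model, b: Model) -> bool:
    """True if scored model a should sit above scored model b."""
    if abs(a.mmmu - b.mmmu) <= NEAR_TIE_POINTS:
        both = a.mathvista is not None and b.mathvista is not None
        if both and a.mathvista != b.mathvista:
            return a.mathvista > b.mathvista  # MathVista breaks the near-tie
        if a.mmmu == b.mmmu:
            return a.release > b.release      # assumption: recency only on exact ties
    return a.mmmu > b.mmmu


def rank(models: list[Model]) -> tuple[list[Model], list[Model]]:
    """Return (scored leaderboard, benchmark-pending list)."""
    eligible = [m for m in models if is_eligible(m)]
    pending = [m for m in eligible if m.mmmu is None]
    scored = sorted(
        (m for m in eligible if m.mmmu is not None),
        key=lambda m: (m.mmmu, m.release),
        reverse=True,
    )
    ranked: list[Model] = []
    for m in scored:
        # Walk upward while m outranks the row directly above it.
        i = len(ranked)
        while i > 0 and outranks(m, ranked[i - 1]):
            i -= 1
        ranked.insert(i, m)
    return ranked, pending
```

With made-up inputs, a model at 83.5% MMMU and 75% MathVista would land above one at 84.0% MMMU and 70% MathVista, while a model with no standard MMMU score would go to the pending list regardless of any MMMU Pro result.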


#	Model	Tags	MMMU	Context	Input $/1M	Output $/1M
1	Qwen3.6-Plus	Vision, Tools	86%	1M	$0.33	$1.95
2	ByteDance Doubao Seed 2.0 Pro	Vision, Tools	85.4%	256K	$0.47	$2.37
3	Qwen3.5-397B-A17B	Reasoning, Vision, Tools	85%	262K	$0.39	$2.34
4	Gemini 3.5 Flash	Reasoning, Vision, Tools	83.6%	1.05M	$1.50	$9.00
5	Claude Sonnet 4.6	Reasoning, Vision, Tools	83.6%	1M	$3.00	$15.00
6	o3	Reasoning, Vision, Tools	82.9%	200K	$2.00	$8.00
7	GPT-5.4	Reasoning, Vision, Tools	82.1%	1.05M	$2.50	$15.00
8	Qwen3.6 Max Preview	Preview, Reasoning, Vision, Tools	82%	256K	$1.04	$6.24
9	Gemini 2.5 Pro	Reasoning, Vision, Tools	81.7%	1M	$1.25	$10.00
10	Gemini 3 Pro	Vision, Tools	81%	1M	$1.25	$5.00
11	Claude Opus 4.5	Reasoning, Vision, Tools	80.7%	200K	$5.00	$25.00
12	Gemini 2.5 Flash	Vision, Tools	79.7%	1M	$0.30	$2.50
13	Gemini 2.5 Pro Preview 05-06	Preview, Vision	79.6%	1M	$1.25	$10.00
14	Claude Sonnet 4.5	Reasoning, Vision, Tools	77.8%	200K	$3.00	$15.00
15	Claude Opus 4.6	Reasoning, Vision, Tools	76.5%	1M	$5.00	$25.00
16	Command A+	Reasoning, Vision, Tools	75.1%	128K	—	—
17	Claude 3.7 Sonnet	Reasoning, Vision, Tools	75%	200K	$3.00	$15.00
18	Llama 4 Maverick 17B Instruct FP8	Vision	73.4%	1M	$0.15	$0.60
19	Llama 4 Scout 17B-16E Instruct	Vision	69.4%	10M	$0.08	$0.22
20	GPT-4o	Vision, Tools	69.1%	128K	$2.50	$10.00
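
Each row in the table stands for one model family after variant collapse. A minimal sketch of that canonical-SKU pick follows, using hypothetical names (`Sku`, `canonical_key`, `collapse`) and two assumptions the methodology does not spell out: only SKUs within 0.5 points of the family's best score compete for the canonical slot, and a SKU with no tracked input price sorts last. The Chatbot Arena margin of ±10 Elo is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SCORE_MARGIN = 0.5  # the ±0.5 pt band from the variant-collapse rule


@dataclass
class Sku:  # hypothetical record shape
    name: str
    family_slug: str
    param_tier: str
    score: float                   # headline benchmark score, in percent
    input_price: Optional[float]   # USD per 1M input tokens, None if untracked
    is_ga: bool                    # False for preview or limited access
    release: date


def canonical_key(s: Sku) -> tuple:
    """Lowest input price, then GA first, then newest release."""
    price = s.input_price if s.input_price is not None else float("inf")
    return (price, not s.is_ga, -s.release.toordinal())


def collapse(skus: list[Sku]) -> list[tuple[Sku, list[str]]]:
    """One (canonical SKU, folded sibling names) pair per family and tier."""
    families: dict[tuple[str, str], list[Sku]] = {}
    for s in skus:
        families.setdefault((s.family_slug, s.param_tier), []).append(s)

    rows = []
    for group in families.values():
        best = max(s.score for s in group)
        tied = [s for s in group if best - s.score <= SCORE_MARGIN]
        canonical = min(tied, key=canonical_key)
        # Folded siblings are the ones that could carry a "Tied within margin" chip.
        folded = [s.name for s in tied if s is not canonical]
        rows.append((canonical, folded))
    return rows
```

Encoding the three tie-breakers as a single tuple key keeps their priority order explicit: Python compares tuples element by element, so price is consulted first and release date only when both price and GA status match.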

New models awaiting benchmark coverage

These source-backed rows qualify for this task page, but they are not scored leaderboard picks until the category benchmark data exists. Each model below reports source-backed vision or multimodal capability and has no tracked standard MMMU score yet, so it stays separate from the scored vision ranking until standard MMMU data lands.

Model	Tags	Status	Tracked price
Claude Sonnet 5	Tools, Code execution	Benchmark pending	In $2.00 / Out $10.00
Seed 2.1 Pro	—	Benchmark pending	In $0.83 / Out $4.14
Seed 2.1 Turbo	—	Benchmark pending	In $0.41 / Out $2.07
Fugu Ultra	Tools	Benchmark pending	In $5.00 / Out $30.00
Kimi K2.7-Code HighSpeed	Tools	Benchmark pending	In $1.90 / Out $8.00
Claude Opus 4.8	Tools, Code execution	Benchmark pending	In $5.00 / Out $25.00
GPT-5.5	Tools, Code execution	Benchmark pending	In $5.00 / Out $30.00
Claude Opus 4.7	Tools, Code execution	Benchmark pending	In $5.00 / Out $25.00

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

#4 Claude Sonnet 4.6 (MMMU: 83.6%)
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

#5 o3 (MMMU: 82.9%)
OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

#6 GPT-5.4 (MMMU: 82.1%)
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

Compare Top Picks

Side-by-side comparison of the top picks by price, benchmark, and API access.

Qwen3.6-Plus vs ByteDance Doubao Seed 2.0 Pro
Qwen3.6-Plus vs Qwen3.5-397B-A17B
Qwen3.6-Plus vs Gemini 3.5 Flash
Qwen3.6-Plus vs Claude Sonnet 4.6
ByteDance Doubao Seed 2.0 Pro vs Qwen3.5-397B-A17B
ByteDance Doubao Seed 2.0 Pro vs Gemini 3.5 Flash

Frequently asked questions

Which LLM is best for vision?

Qwen3.6-Plus is the current LLMReference top pick for vision. The verdict rests on its stored category signal, an MMMU score of 86%. Output pricing starts at $1.95 per 1M tokens. Review the linked model and provider pages before production use because availability and pricing can change.

How does Qwen3.6-Plus compare to Qwen3.5-397B-A17B for vision?

Qwen3.6-Plus leads Qwen3.5-397B-A17B in the visible shortlist on MMMU: 86% versus 85%. Output pricing starts at $1.95 per 1M tokens for Qwen3.6-Plus and $2.34 per 1M tokens for Qwen3.5-397B-A17B.
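
To turn the per-1M prices into a workload estimate, the arithmetic can be sketched as below. The prices are the tracked input and output figures from the leaderboard table; the `request_cost` helper and the workload of 2M input tokens and 500K output tokens are hypothetical, and the sketch ignores any image-specific or video-specific tiers that a provider may bill separately.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """Estimated USD cost given token counts and per-1M-token prices."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# (input $/1M, output $/1M) as tracked in the leaderboard table
PRICES = {
    "Qwen3.6-Plus": (0.33, 1.95),
    "Qwen3.5-397B-A17B": (0.39, 2.34),
}

# Hypothetical workload: 2M input tokens, 500K output tokens
for name, (inp, out) in PRICES.items():
    cost = request_cost(2_000_000, 500_000, inp, out)
    print(f"{name}: ${cost:.2f}")
```

On this assumed workload the two models come out at roughly $1.64 and $1.95 respectively, so the gap depends on the input-to-output token mix as well as on the headline output price.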

How does LLMReference rank LLMs for vision?

LLMReference ranks LLMs for vision from stored model, benchmark, freshness, and pricing data. The current methodology summary is: Vision/multimodal leaders rank on standard MMMU, use MathVista only as a comparable near-tie signal, then recency. MMMU Pro is tracked separately as harder multimodal evidence, but models without standard MMMU stay benchmark-pending for this leaderboard.

How often is this list updated?

The LLM rankings on this page are updated daily as new benchmark scores, provider availability, and pricing data are tracked. The "as of" date at the top of the page shows the most recent refresh.

How do you decide which models appear in the top 3?

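The podium picks are driven by the primary benchmark signal for this category (shown in the Methodology section), filtered to non-deprecated models with confirmed API availability. Ties follow the category's tie-break rules: for vision, MathVista breaks a near-tie within 1 point when both models have a MathVista score, and release recency decides after that.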

Are preview or beta models included?

Preview models appear in the "Watch list" section but are not in the main ranked podium unless the category explicitly allows it (e.g., /best/coding and /best/agents, where preview models often lead benchmarks).

Can I compare two specific models head-to-head?

Yes — use the Compare tool at llmreference.com/compare for a side-by-side breakdown of context window, pricing, benchmarks, and provider availability.

Is the pricing data real-time?

Pricing is tracked from provider documentation and updated regularly. It reflects the best available public data, not live API quotes — always verify before billing.