Qwen3.6-Plus
- MMMU: 86%
- Output (from): $1.95 / 1M
Last refreshed 2026-05-18. Refreshed weekly.
Top multimodal models that understand images, video, and documents, ranked by vision benchmarks, capabilities, pricing, and context window.
Opinionated short list for this category; the full leaderboard with pricing is below.
Vision and multimodal models are ranked by MMMU score (multimodal understanding), then by recency.
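The ordering rule above can be sketched as a two-level sort. This is a minimal sketch, not the site's actual implementation: the `Entry` type, the `released` field, and the model names and dates are hypothetical placeholders, and placing models without a reported MMMU score last is an assumption based on where such rows appear in the table.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Entry:
    name: str               # hypothetical model name
    mmmu: Optional[float]   # MMMU score in percent, None if not reported
    released: date          # hypothetical release date used for recency


def rank(entries: List[Entry]) -> List[Entry]:
    """Order by MMMU (highest first), then recency (newest first).

    Assumption: entries without an MMMU score sort after all scored entries.
    """
    def key(e: Entry):
        has_score = e.mmmu is not None
        return (
            not has_score,                        # scored entries first
            -(e.mmmu if has_score else 0.0),      # higher MMMU first
            -e.released.toordinal(),              # newer release first
        )
    return sorted(entries, key=key)


# Placeholder data for illustration only; not taken from the leaderboard.
entries = [
    Entry("model-a", 86.0, date(2026, 1, 1)),
    Entry("model-b", None, date(2026, 3, 1)),
    Entry("model-c", 86.0, date(2026, 2, 1)),
    Entry("model-d", 82.1, date(2025, 11, 1)),
]

ranked_names = [e.name for e in rank(entries)]
print(ranked_names)
```

Using a tuple key keeps the tie-break explicit: recency only matters when two entries share the same MMMU score or both lack one.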
| # | Model | Tags | MMMU | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|---|
| 1 | Qwen3.6-Plus | Vision, Tools | 86% | $0.33 | $1.95 |
| 2 | Qwen3.5-397B-A17B | Reasoning, Tools | 85% | $0.39 | $2.34 |
| 3 | GPT-5.4 | Reasoning, Tools | 82.1% | $2.50 | $15.00 |
| 4 | Qwen3.6 Max Preview | Preview, Reasoning, Vision, Tools | 82% | $1.04 | $6.24 |
| 5 | Gemini 3 Pro | Vision, Tools | 81% | $1.25 | $5.00 |
| 6 | Claude Opus 4.5 | Reasoning, Vision, Tools | 80.7% | $5.00 | $25.00 |
| 7 | Gemini 2.5 Flash | Vision, Tools | 79.7% | $0.30 | $2.50 |
| 8 | Claude Sonnet 4.5 | Reasoning, Vision, Tools | 77.8% | $3.00 | $15.00 |
| 9 | Claude 3.7 Sonnet | Reasoning, Vision, Tools | 75% | $3.00 | $15.00 |
| 10 | GPT-4o | Vision, Tools | 69.1% | $2.50 | $10.00 |
| 11 | Qwen2-VL-72B-Instruct | Vision | 64.5% | $0.90 | $0.90 |
| 12 | Llama 3.2 90B Vision | Vision | 60.3% | $1.35 | $1.80 |
| 13 | Llama 3.2 11B Vision | Vision | 50.7% | $0.20 | $0.27 |
| 14 | GLM-4V 9B | — | 48.3% | $0.05 | $0.25 |
| 15 | Phi 3.5 Vision Instruct | Vision | 43% | — | — |
| 16 | Sora 2 | — | — | — | — |
| 17 | Perceptron Mk1 | Reasoning, Vision | — | $0.15 | $1.50 |
| 18 | MiniCPM-V 4.6 | Vision | — | — | — |
| 19 | GPT Realtime 2 | Reasoning, Tools | — | $32.00 | $64.00 |
| 20 | GPT Realtime Translate | — | — | — | — |
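To compare rows on cost rather than list price, the per-1M-token rates can be turned into an estimate for a given workload. The sketch below assumes billing is purely per token at the listed input and output rates; image or video inputs may be metered differently by each provider, so treat the result as a rough guide. The function name `cost_usd` and the workload size are illustrative choices, while the prices are copied from the table.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate cost in USD from token counts and $/1M-token prices."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)


# (input $/1M, output $/1M) as listed in the table above.
PRICES = {
    "Qwen3.6-Plus": (0.33, 1.95),
    "GPT-5.4": (2.50, 15.00),
}

# Illustrative workload: 2M input tokens and 500k output tokens.
estimates = {
    name: cost_usd(2_000_000, 500_000, inp, out)
    for name, (inp, out) in PRICES.items()
}

for name, usd in estimates.items():
    print(f"{name}: ${usd:.3f}")
```

For this workload the two estimates differ by roughly a factor of eight, which is why the output price is worth weighing alongside the MMMU score.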
Next in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse MoE architecture, available for preview as part of the Qwen3.6 series.
- MMMU: 82%
Gemini 3 Pro: Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series, with frontier-class intelligence, multimodal understanding, and a 1M-token context window.
- MMMU: 81%
Claude Opus 4.5, available on AWS Bedrock.
- MMMU: 80.7%