Best Multimodal LLMs for Vision (2026)
Top AI models that understand images, video, and documents. Compare GPT-4o, Gemini, Claude, and other multimodal models by capabilities and pricing.
| # | Model | Input ($/1M tokens) | Output ($/1M tokens) | Capabilities |
|---|---|---|---|---|
| 1 | Mistral Small 4 | — | — | Vision, Tools |
| 2 | Nemotron 3 VoiceChat | — | — | Vision |
| 3 | GPT-4.5 | — | — | Vision, Tools |
| 4 | Phi-4 Reasoning Vision 15B | — | — | |
| 5 | Gemini 3.1 Flash-Lite Preview | $0.25 | $1.50 | Vision, Tools |
| 6 | DeepSeek V3.1 | $0.56 | $1.68 | Vision |
| 7 | Transcribe (03-2026) | — | — | |
| 8 | Gemini 3.1 Pro Preview | $2.00 | $12.00 | Vision, Tools |
| 9 | Gemini 3.1 Pro | $2.00 | $12.00 | Vision, Tools |
| 10 | Claude Sonnet 4.6 | $3.00 | $15.00 | Reasoning, Vision, Tools |
| 11 | Claude Sonnet 4.6 Batch | $1.50 | $7.50 | Reasoning, Vision, Tools |
| 12 | Gemini 2.0 Ultra | — | — | Vision, Tools |
| 13 | Claude Opus 4.6 | $5.00 | $25.00 | Reasoning, Vision, Tools |
| 14 | Claude Opus 4.6 Batch | $2.50 | $12.50 | Reasoning, Vision, Tools |
| 15 | Qwen 3 Max | $0.78 | $3.90 | Vision, Tools |
| 16 | Grok-3 | $3.00 | $15.00 | Reasoning, Vision, Tools |
| 17 | Gemini 3 Flash Preview | $0.50 | $3.00 | Vision, Tools |
| 18 | Gemini 3 Flash | $0.50 | $3.00 | Vision, Tools |
| 19 | Mistral Small 3.1 24B Instruct | $0.03 | $0.11 | Vision |
| 20 | Gemini 3 Pro | $2.00 | $12.00 | Vision, Tools |
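
To compare models by price, the per-million rates in the table can be turned into a per-request estimate. The sketch below is a minimal illustration, not a billing tool: it assumes the listed prices are USD per 1M tokens and that any image input has already been converted to a token count (the listing does not say how each provider counts image tokens). The `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced here, and only a few rows are copied from the table.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table.
# PRICES and request_cost are illustrative names, not part of any provider API.
PRICES = {
    "Gemini 3.1 Flash-Lite Preview": (0.25, 1.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Sonnet 4.6 Batch": (1.50, 7.50),
    "Mistral Small 3.1 24B Instruct": (0.03, 0.11),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 10,000 input tokens and 1,000 output tokens per request.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 1_000):.4f}")
```

Under that assumed workload, Claude Sonnet 4.6 comes to (10,000 × $3 + 1,000 × $15) / 1,000,000 = $0.045 per request, and the Batch row comes to half of that, $0.0225.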