Llama 3.2 90B Vision Instruct
Llama 3.2 90B Vision Instruct is worth evaluating for long context and vision when its provider route and context window match the workload.
Use it for
- Teams evaluating long context and vision
- Workloads that can use a 128k context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Strict JSON or tool-calling flows
- Family
- Llama 3.2
- Released
- 2024-09-25
- Context
- 128k
- Parameters
- 88.8B
- Architecture
- Decoder Only
- Knowledge cutoff
- 2024-03
- Specialization
- general
- Training
- finetuned
Large-scale open-source AI for social technologies.
Cheapest of 6 routes · Bitdeer AI
About
Instruction-tuned 90B Llama 3.2 Vision model for higher-capability image reasoning, visual question answering, visual grounding, and captioning. NVIDIA NIM lists text plus image input, text output, and a 128K context window for the Llama 3.2 Vision collection.
Llama 3.2 90B Vision Instruct is Meta's high-capacity multimodal model released in September 2024 as the larger sibling to the 11B Vision variant. With 90 billion parameters, it delivers substantially higher accuracy on visually complex tasks: scientific figure analysis, detailed document parsing, visual grounding, multi-image reasoning, and fine-grained image description. Like the 11B variant, it accepts text and image inputs and produces text output, with a 128,000-token context window shared across both modalities. NVIDIA NIM lists the 90B Vision model as part of the Llama 3.2 Vision collection.
The 90B scale is appropriate for organizations that require near-frontier visual reasoning quality from open weights, particularly for tasks where image understanding accuracy is critical and compute cost is secondary. It outperforms the 11B Vision variant on complex visual question answering and on tasks requiring precise reading of text within images, scientific diagrams, or multi-page documents.
Llama 3.2 90B Vision Instruct is available as open weights under Meta's Llama Community License and is hosted on Fireworks AI, NVIDIA NIM, AWS Bedrock, Azure AI Foundry, and Bitdeer. Its parameter count requires significant inference infrastructure—typically multiple A100 or H100 GPUs for serving—making it less suitable for edge or resource-constrained deployments compared to the 11B variant.
Llama 3.2 90B Vision Instruct has a 128k-token context window.
Llama 3.2 90B Vision Instruct input tokens at $0.15/1M, output at $0.45/1M.
Top use-case fit
Long context
Included by capability and metadata signals in the decision map.
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare all 6Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Bitdeer AI | $0.150 | $0.450 | Serverless |
| Vercel AI Gateway | $0.720 | $0.720 | Serverless |
| Fireworks AI | $0.900 | $0.900 | Serverless |
| AWS Bedrock | $1.35 | $1.80 | Serverless |
Capabilities
Benchmark peer barsfor Long context
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.