LLM Reference

Llama 3.2 90B Vision Instruct

Released: 2024-09-25
Last refreshed: 2026-06-01
Status: Researched 3d ago
Open Source · Multimodal · Long context · Vision

Llama 3.2 90B Vision Instruct is worth evaluating for long context and vision when its provider route and context window match the workload.

Use it for

  • Teams evaluating long context and vision
  • Workloads that can use a 128k context window
  • Buyers comparing 4 tracked provider routes

Do not use it for

  • Strict JSON or tool-calling flows

Specifications

Family: Llama 3.2
Released: 2024-09-25
Context: 128k
Parameters: 88.8B
Architecture: Decoder Only
Knowledge cutoff: 2024-03
Specialization: general
Training: finetuned

Created by

Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States
Founded 2013

Pricing

Output / 1M: $0.450
Input / 1M: $0.150

Cheapest of 6 routes · Bitdeer AI

About

Instruction-tuned 90B Llama 3.2 Vision model for higher-capability image reasoning, visual question answering, visual grounding, and captioning. NVIDIA NIM lists text plus image input, text output, and a 128K context window for the Llama 3.2 Vision collection.

Llama 3.2 90B Vision Instruct is Meta's high-capacity multimodal model released in September 2024 as the larger sibling to the 11B Vision variant. With 90 billion parameters, it delivers substantially higher accuracy on visually complex tasks: scientific figure analysis, detailed document parsing, visual grounding, multi-image reasoning, and fine-grained image description. Like the 11B variant, it accepts text and image inputs and produces text output, with a 128,000-token context window shared across both modalities. NVIDIA NIM lists the 90B Vision model as part of the Llama 3.2 Vision collection.
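
Because the 128,000-token window is shared between text and images, it can help to budget a request before sending it. The sketch below is a minimal illustration, not a provider API: the function name `fits_context` is hypothetical, and `tokens_per_image` is a placeholder assumption, since this page does not state how many tokens an image consumes. Take the real figure from your provider's documentation.

```python
# Context window from this page: 128,000 tokens shared across text and images.
CONTEXT_WINDOW = 128_000


def fits_context(text_tokens: int, image_count: int,
                 tokens_per_image: int, reserved_output_tokens: int):
    """Return (fits, remaining) for a planned request.

    tokens_per_image is an ASSUMED per-image cost supplied by the caller;
    it is not a documented value for this model.
    """
    used = text_tokens + image_count * tokens_per_image + reserved_output_tokens
    return used <= CONTEXT_WINDOW, CONTEXT_WINDOW - used


if __name__ == "__main__":
    # Hypothetical request: a long document, four images, room for a reply.
    ok, remaining = fits_context(
        text_tokens=100_000,
        image_count=4,
        tokens_per_image=1_600,   # placeholder assumption
        reserved_output_tokens=4_000,
    )
    print(f"fits={ok}, remaining tokens={remaining}")
```

A negative `remaining` value tells you how many tokens to trim from the prompt or the reserved output.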

The 90B scale is appropriate for organizations that require near-frontier visual reasoning quality from open weights, particularly for tasks where image understanding accuracy is critical and compute cost is secondary. It outperforms the 11B Vision variant on complex visual question answering and on tasks requiring precise reading of text within images, scientific diagrams, or multi-page documents.

Llama 3.2 90B Vision Instruct is available as open weights under Meta's Llama Community License and is hosted on Fireworks AI, NVIDIA NIM, AWS Bedrock, Azure AI Foundry, and Bitdeer. Its parameter count requires significant inference infrastructure, typically multiple A100 or H100 GPUs for serving, which makes it less suitable than the 11B variant for edge or resource-constrained deployments.
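
A rough calculation shows why a single GPU is usually not enough. The sketch below estimates memory for the weights alone from the 88.8B parameter count listed under Specifications. The 80 GB per-GPU figure and the precision options are assumptions for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real deployments need more headroom than this lower bound suggests.

```python
import math

# Parameter count from the Specifications section of this page.
PARAMS = 88.8e9


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus(params: float, bytes_per_param: float,
             gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on GPU count, assuming 80 GB cards (an assumption).

    Counts weights only; KV cache and activations are NOT included.
    """
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Precision options are illustrative; this page does not say which
    # precisions the hosted routes actually serve.
    for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(PARAMS, nbytes)
        print(f"{label}: ~{gb:.1f} GB weights, >= {min_gpus(PARAMS, nbytes)} GPU(s)")
```

At 16-bit precision the weights alone come to roughly 177.6 GB, which already exceeds two 80 GB cards before any cache or activation memory is counted.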

Llama 3.2 90B Vision Instruct has a 128k-token context window.

On the cheapest tracked route (Bitdeer AI), Llama 3.2 90B Vision Instruct input tokens cost $0.15 per 1M and output tokens cost $0.45 per 1M.

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder


Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider            Input / 1M   Output / 1M   Route
Bitdeer AI          $0.150       $0.450        Serverless
Vercel AI Gateway   $0.720       $0.720        Serverless
Fireworks AI        $0.900       $0.900        Serverless
AWS Bedrock         $1.35        $1.80         Serverless
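
To see how the per-million prices above translate into a per-request cost, the following sketch applies them to a hypothetical workload. The prices are copied from the table; the function name `request_cost` and the token counts are illustrative assumptions, and the calculation ignores batch or cached-read discounts, which this page does not list.

```python
# (input $/1M tokens, output $/1M tokens), copied from the price table above.
PRICES = {
    "Bitdeer AI": (0.150, 0.450),
    "Vercel AI Gateway": (0.720, 0.720),
    "Fireworks AI": (0.900, 0.900),
    "AWS Bedrock": (1.35, 1.80),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed serverless prices."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a long prompt with a short answer.
    input_tokens, output_tokens = 100_000, 2_000
    for provider in sorted(
        PRICES, key=lambda p: request_cost(p, input_tokens, output_tokens)
    ):
        cost = request_cost(provider, input_tokens, output_tokens)
        print(f"{provider:<18} ${cost:.4f}")
```

For long-context workloads the input price dominates, so routes with a low input rate stay cheapest even when their output rate is several times higher.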

Capabilities

Vision · Multimodal

Benchmark peer bars for Long context

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (8)