Llama 3.2 90B Vision Instruct

Name: Llama 3.2 90B Vision Instruct
Author: AI at Meta

Released

2024-09-25

Last refreshed

2026-07-09

Status

Researched 47d ago

Open weightsCommercial use: conditionalMultimodalLong contextVision

Llama 3.2 90B Vision Instruct is worth evaluating for long context and vision when its provider route and context window match the workload.

Use it for

Teams evaluating long context and vision
Workloads that can use a 128k context window
Buyers comparing 4 tracked provider routes

Do not use it for

Strict JSON or tool-calling flows

Specifications

Family: Llama 3.2
Released: 2024-09-25
Context: 128k
Parameters: 88.8B
Architecture: Decoder Only
Knowledge cutoff: 2024-03
Specialization: general
Openness: Open weights
License: Llama 3 CommunityCommercial use: conditional
Weights: Unknown
Code: Unknown
Training: Fine-tuned

Created by

AI at Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States

Founded 2013

Website

Pricing

Output / 1M

$0.450

Input / 1M

$0.150

Cheapest of 6 routes · Bitdeer AI

Providers(6)

Fireworks AI NVIDIA NIM Bitdeer AI AWS Bedrock Microsoft Foundry Vercel AI Gateway

View 6 provider routes

About

Instruction-tuned 90B Llama 3.2 Vision model for higher-capability image reasoning, visual question answering, visual grounding, and captioning. NVIDIA NIM lists text plus image input, text output, and a 128K context window for the Llama 3.2 Vision collection.

Llama 3.2 90B Vision Instruct is Meta's high-capacity multimodal model released in September 2024 as the larger sibling to the 11B Vision variant. With 90 billion parameters, it delivers substantially higher accuracy on visually complex tasks: scientific figure analysis, detailed document parsing, visual grounding, multi-image reasoning, and fine-grained image description. Like the 11B variant, it accepts text and image inputs and produces text output, with a 128,000-token context window shared across both modalities. NVIDIA NIM lists the 90B Vision model as part of the Llama 3.2 Vision collection.

The 90B scale is appropriate for organizations that require near-frontier visual reasoning quality from open weights, particularly for tasks where image understanding accuracy is critical and compute cost is secondary. It outperforms the 11B Vision variant on complex visual question answering and on tasks requiring precise reading of text within images, scientific diagrams, or multi-page documents.

Llama 3.2 90B Vision Instruct is available as open weights under Meta's Llama Community License and is hosted on Fireworks AI, NVIDIA NIM, AWS Bedrock, Azure AI Foundry, and Bitdeer. Its parameter count requires significant inference infrastructure—typically multiple A100 or H100 GPUs for serving—making it less suitable for edge or resource-constrained deployments compared to the 11B variant.

Llama 3.2 90B Vision Instruct has a 128k-token context window.

Llama 3.2 90B Vision Instruct input tokens at $0.15/1M, output at $0.45/1M.

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare all 6

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Route
Bitdeer AI	$0.150	$0.450	Serverless
Vercel AI Gateway	$0.720	$0.720	Serverless
Fireworks AI	$0.900	$0.900	Serverless
AWS Bedrock	$1.35	$1.80	Serverless

Available via routers & gateways(8)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSSMicrosoft Foundry

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

PassthroughFireworks AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

SubscriptionMicrosoft Foundry

Amazon Bedrock Intelligent Prompt Routing

Router

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

PassthroughAWS Bedrock

Azure AI Foundry Model Router

Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

PassthroughMicrosoft Foundry

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

SubscriptionMicrosoft Foundry