LLM Reference

Granite Vision Models by IBM Research

4 models · 2025–2026 · Up to 128k ctx

Details

Researcher: IBM Research
Models: 4
Released: 2025–2026
Max context: 128k

Capabilities

Vision: All models
Multimodal: All models

About

Granite Vision is IBM's family of vision-language models designed for enterprise document understanding, OCR, chart/table extraction, and general image tasks. It includes models from the 3.2, 3.3, 4.0, and 4.1 generations, all released under Apache 2.0.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.


Granite Vision 4.1 4B
Use when the workload needs 4B parameters and multimodal inputs.
2026-04 · 4B parameters · multimodal inputs

Granite 4.0 3B Vision
Use when the workload needs 128k context, 3B parameters, and multimodal inputs.
2026-04 · 128k context · 3B parameters · multimodal inputs

Granite Vision 3.3 2B
Use when the workload needs 128k context, 2B parameters, and multimodal inputs.
2025-06 · 128k context · 2B parameters · multimodal inputs

Granite Vision 3.2 2B
Use when the workload needs 128k context, 2B parameters, and multimodal inputs.
2025-02 · 128k context · 2B parameters · multimodal inputs
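
The use-when lines above follow a regular pattern: each one lists whichever spec fields are present for the model. A minimal Python sketch of that pattern is shown here. The function name `use_when` and its parameters are hypothetical, and only the context, parameter-count, and multimodal fields are covered; how the site actually uses the release and replacement fields is not stated in the source, so they are left out.

```python
def use_when(context=None, parameters=None, multimodal=False):
    """Build a use-when sentence from optional spec fields.

    All argument names are illustrative assumptions, not the site's schema.
    """
    needs = []
    if context:
        needs.append(f"{context} context")
    if parameters:
        needs.append(f"{parameters} parameters")
    if multimodal:
        needs.append("multimodal inputs")

    if not needs:
        return ""  # nothing to say when no field is known

    # Join with "and" for two items, and a serial comma for three or more.
    if len(needs) == 1:
        joined = needs[0]
    elif len(needs) == 2:
        joined = " and ".join(needs)
    else:
        joined = ", ".join(needs[:-1]) + ", and " + needs[-1]

    return f"Use when the workload needs {joined}."


if __name__ == "__main__":
    print(use_when(parameters="4B", multimodal=True))
    print(use_when(context="128k", parameters="3B", multimodal=True))
```

A model with no listed context length, such as the 4B entry, simply yields a shorter sentence because the missing field is skipped rather than guessed.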

Release Timeline

3 release groups

2026-04 (2 current)
Granite 4.0 3B Vision (Current): 128k context · 3B parameters · multimodal inputs
Granite Vision 4.1 4B (Current): 4B parameters · multimodal inputs

2025-06 (1 current)
Granite Vision 3.3 2B (Current): 128k context · 2B parameters · multimodal inputs

2025-02 (1 current)
Granite Vision 3.2 2B (Current): 128k context · 2B parameters · multimodal inputs

Specifications (4 models)

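Granite Vision model specifications comparison

| Model | Released | Context | Parameters | Vision | Multimodal |
|---|---|---|---|---|---|
| Granite Vision 4.1 4B | 2026-04 | not listed | 4B | Yes | Yes |
| Granite 4.0 3B Vision | 2026-04 | 128k | 3B | Yes | Yes |
| Granite Vision 3.3 2B | 2025-06 | 128k | 2B | Yes | Yes |
| Granite Vision 3.2 2B | 2025-02 | 128k | 2B | Yes | Yes |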

Frequently Asked Questions

What is Granite Vision used for?
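Granite Vision is used for vision and multimodal work, in particular enterprise document understanding, OCR, chart/table extraction, and general image tasks. The family description and the listed model capabilities (vision and multimodal on all models) point to those workloads as the best fit.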
How does Granite Vision compare to Granite 4?
Granite Vision by IBM Research is strongest where you need vision and multimodal work, while Granite 4 by IBM Research is the closest related family to check for audio. Granite Vision has 4 listed variants and reaches up to 128k context; Granite 4 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.
Which Granite Vision model should I use?
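If price is the main constraint, check the pricing table first: Granite Vision does not have complete provider pricing in the local data, so confirm that a price is listed for the model you are considering. For the latest local choice with a listed context window, evaluate Granite 4.0 3B Vision (128k context, multimodal inputs). Granite Vision 4.1 4B shares the 2026-04 release date and is larger at 4B parameters, but no context length is listed for it.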
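
As a rough illustration of choosing a model from the specification table, the Python sketch below filters the four listed models by a minimum context window and a maximum parameter count. The values are copied from the table on this page. The structure `MODELS`, the function `candidates`, and the ordering rule (newest release first, then larger model) are assumptions made for this example, not part of any IBM tooling. A context of `None` marks the model whose context length is not listed.

```python
# Spec rows transcribed from the table on this page; context is in thousands
# of tokens, parameters in billions. None means "not listed".
MODELS = [
    {"name": "Granite Vision 4.1 4B", "released": "2026-04", "context_k": None, "params_b": 4},
    {"name": "Granite 4.0 3B Vision", "released": "2026-04", "context_k": 128, "params_b": 3},
    {"name": "Granite Vision 3.3 2B", "released": "2025-06", "context_k": 128, "params_b": 2},
    {"name": "Granite Vision 3.2 2B", "released": "2025-02", "context_k": 128, "params_b": 2},
]


def candidates(min_context_k=0, max_params_b=None):
    """Return names of models meeting the constraints, newest release first."""
    matches = []
    for model in MODELS:
        # A model with an unlisted context cannot satisfy a context requirement.
        if min_context_k and (model["context_k"] is None or model["context_k"] < min_context_k):
            continue
        if max_params_b is not None and model["params_b"] > max_params_b:
            continue
        matches.append(model)

    # Assumed preference: later release date, then larger parameter count.
    matches.sort(key=lambda m: (m["released"], m["params_b"]), reverse=True)
    return [m["name"] for m in matches]


if __name__ == "__main__":
    print(candidates(min_context_k=128))
    print(candidates(max_params_b=2))
```

With a 128k context requirement, the 4B model drops out only because its context length is not listed here, not because it is known to be shorter.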