Granite Vision 3.2 2B
granite-vision-3.2-2b
Open Source, Multimodal
About
IBM Granite Vision 3.2 2B is a compact vision-language model for visual document understanding. Architecture: SigLIP vision encoder + two-layer MLP connector + Granite 3.1 2B Instruct LLM. Excels at tables, charts, OCR, infographics, and document QA. Benchmark scores: DocVQA 0.89, ChartQA 0.87, TextVQA 0.78, OCRBench 0.77. License: Apache 2.0.
Granite Vision 3.2 2B has a 128K-token context window.
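The architecture described above places a two-layer MLP between the vision encoder and the language model. The sketch below illustrates that general idea only: it projects a few toy "patch feature" vectors into a toy "LLM embedding" space with a GELU between the two linear layers. The dimensions, the random weights, and the helper names (gelu, linear, init_linear, mlp_connector) are assumptions made for illustration; they are not taken from the Granite Vision code or weights.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU activation: x * Phi(x), computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weights, bias):
    """Apply y = W @ vec + b, where weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def init_linear(in_dim, out_dim, rng):
    """Create small random weights and zero biases (illustrative only)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def mlp_connector(patch_features, layer1, layer2):
    """Project each patch feature: linear -> GELU -> linear."""
    projected = []
    for feat in patch_features:
        hidden = [gelu(h) for h in linear(feat, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


# Toy sizes, chosen only to keep the example small (hypothetical values).
VISION_DIM, LLM_DIM, NUM_PATCHES = 8, 16, 4

rng = random.Random(0)
layer1 = init_linear(VISION_DIM, LLM_DIM, rng)
layer2 = init_linear(LLM_DIM, LLM_DIM, rng)

# Stand-in for vision encoder output: one feature vector per image patch.
patches = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]

image_tokens = mlp_connector(patches, layer1, layer2)
print(len(image_tokens), len(image_tokens[0]))  # prints: 4 16
```

Each projected vector has the width of the language model's embedding space, so the image patches can be placed in the input sequence alongside text token embeddings. A real implementation would use a tensor library and trained weights rather than Python lists.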
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
Specifications
Family: Granite Vision
Released: 2025-02-26
Parameters: 2B
Context: 128K tokens
Architecture: SigLIP vision encoder + 2-layer MLP (GELU) + Granite 3.1 2B Instruct LLM
Created by
Creating reliable and adaptable AI solutions
Armonk, New York, United States
Founded 1945