Granite Vision 4.1 4B
granite-vision-4.1-4b
Open Source, Multimodal
About
IBM Granite Vision 4.1 4B (3.4B LLM + 0.6B vision encoder and projectors) is a vision-language model for enterprise document extraction. It is built on Granite 4.1 3B with LoRA adapters (rank 256), a SigLIP2 vision encoder, and Window Q-Former projectors with 8 vision-to-LLM injection points. The model specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON/HTML/OTSL), and semantic key-value pair (KVP) extraction, and achieves 94.4% exact-match on the VAREX KVP benchmark. It integrates with Docling and is released under the Apache 2.0 license.
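To make the exact-match figure concrete, the sketch below scores a predicted set of key-value pairs against a reference set as the fraction of reference fields whose values match. It is only an illustration: the function name `exact_match`, the light normalization (whitespace collapsing and case folding), and the sample invoice fields are assumptions, and the VAREX benchmark's official scoring may differ.

```python
def exact_match(predicted: dict, reference: dict) -> float:
    """Return the fraction of reference keys whose predicted value matches.

    Assumption: values are compared after collapsing whitespace and
    case folding. This is NOT taken from the VAREX benchmark definition.
    """
    if not reference:
        return 0.0

    def norm(value) -> str:
        # Collapse runs of whitespace and ignore letter case.
        return " ".join(str(value).split()).casefold()

    hits = sum(
        1
        for key, ref_value in reference.items()
        if key in predicted and norm(predicted[key]) == norm(ref_value)
    )
    return hits / len(reference)


# Hypothetical fields extracted from an invoice image.
reference = {"invoice_number": "INV-1042", "total": "1,250.00", "currency": "USD"}
predicted = {"invoice_number": "INV-1042", "total": "1250.00", "currency": "usd"}

score = exact_match(predicted, reference)
print(f"exact-match: {score:.3f}")
```

Here `total` counts as a miss because the thousands separator differs, which shows how strict an exact-match criterion is compared with a fuzzy or numeric comparison.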
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
Specifications
Family: Granite Vision
Released: 2026-04-29
Parameters: 4B
Architecture: SigLIP2 vision encoder + Window Q-Former projectors + Granite 4.1 3B LLM with LoRA rank 256
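For intuition about what a LoRA rank of 256 implies: a LoRA adapter on a weight matrix of shape d_out x d_in adds two low-rank factors, A (r x d_in) and B (d_out x r), for r * (d_in + d_out) trainable parameters. The sketch below applies that formula to a single layer. The layer dimensions are hypothetical, since the listing does not state the model's hidden sizes or which layers carry adapters.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical square projection layer; NOT the model's actual dimensions.
d_in = d_out = 2048
rank = 256  # rank stated in the listing

full = d_in * d_out                     # parameters in the frozen base matrix
added = lora_params(d_in, d_out, rank)  # parameters in the adapter
ratio = added / full

print(f"base: {full:,}  adapter: {added:,}  ratio: {ratio:.2%}")
```

Under these assumed dimensions the adapter is a quarter the size of the matrix it modifies, so rank 256 is a comparatively high-capacity adapter rather than a minimal one.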
Created by
Creating reliable and adaptable AI solutions
Armonk, New York, United States
Founded 1945