LLM Reference

Granite Vision 4.1 4B

granite-vision-4.1-4b

Open Source, Multimodal

About

IBM Granite Vision 4.1 4B (a 3.4B-parameter LLM plus 0.6B parameters of vision encoder and projectors) is a vision-language model for enterprise document extraction. It is built on Granite 4.1 3B, adapted with LoRA (rank 256), and combines a SigLIP2 vision encoder with Window Q-Former projectors that feed 8 vision-to-LLM injection points. It specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON, HTML, or OTSL output), and semantic key-value pair (KVP) extraction, and achieves 94.4% exact-match on the VAREX KVP benchmark. The model integrates with Docling and is released under the Apache 2.0 license.
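To make the exact-match figure concrete, here is a minimal sketch of how such a score could be computed for KVP extraction. It assumes a document counts as correct only when every key is present and every value matches after whitespace normalization. The VAREX benchmark's actual scoring rules may differ, and the function names and sample fields are hypothetical.

```python
def normalize(value):
    """Collapse runs of whitespace so trivial spacing differences are not counted as errors."""
    return " ".join(str(value).split())


def exact_match(predicted, expected):
    """True only if both dicts have the same keys and every normalized value is equal."""
    if predicted.keys() != expected.keys():
        return False
    return all(normalize(predicted[k]) == normalize(expected[k]) for k in expected)


def exact_match_rate(pairs):
    """Fraction of (predicted, expected) document pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(exact_match(p, e) for p, e in pairs) / len(pairs)


# Hypothetical extraction results for two documents.
sample = [
    ({"invoice_no": "A-17", "total": "42.00"}, {"invoice_no": "A-17", "total": " 42.00 "}),
    ({"invoice_no": "A-18"}, {"invoice_no": "A-19"}),
]
print(exact_match_rate(sample))
```

Under this all-or-nothing definition, a single wrong field fails the whole document, so the score is stricter than a per-field accuracy.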

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Specifications

Released: 2026-04-29
Parameters: 4B
Architecture: SigLIP2 vision encoder + Window Q-Former projectors + Granite 4.1 3B LLM with LoRA rank 256
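
As a rough illustration of what LoRA rank 256 implies: a LoRA adapter on a weight matrix of shape d_out x d_in adds two low-rank factors, B (d_out x r) and A (r x d_in), for r * (d_in + d_out) trainable parameters in total. The sketch below computes that count. The 4096 dimensions are hypothetical and are not taken from the Granite configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
full = 4096 * 4096
added = lora_param_count(4096, 4096, rank=256)
print(added, added / full)
```

The ratio printed at the end shows how large the adapter is relative to the full weight matrix it modifies.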

Created by

Creating reliable and adaptable AI solutions

Armonk, New York, United States
Founded 1945