MiniCPM-V 4.6
minicpm-v-4.6
Open SourceMultimodalmultimodalvision-language
About
OpenBMB's compact 1.3B vision-language model released May 11, 2026, designed for on-device deployment on smartphones (iOS, Android, HarmonyOS) and edge devices. Pairs a SigLIP2-400M vision encoder with a Qwen3.5-0.8B language backbone using the LLaVA-UHD v4 approach. Supports single-image, multi-image, and video input (up to 128 frames), with text output. Context window: 262,144 tokens. Achieves 13 on the Artificial Analysis Intelligence Index — highest for any open-weights model under 2B parameters, with 19x lower token cost than Qwen3.5-0.8B. Available via vLLM, SGLang, llama.cpp, and Ollama. Apache 2.0 license.
MiniCPM-V 4.6 has a 256K-token context window.
Capabilities
VisionMultimodalReasoningFunction CallingTool UseStructured OutputsCode ExecutionPrompt CachingBatch APIAudioFine-tuning