LLM Reference

About

The Cambrian family is a set of multimodal large language models (MLLMs) designed to enhance visual understanding. Whereas typical MLLMs emphasize language abilities, Cambrian models prioritize sensory grounding through visual perception, which suits them to real-world applications. The design rests on five foundational elements: diverse visual representations drawn from multiple vision encoders; the Spatial Vision Aggregator (SVA), which efficiently integrates visual and language features; high-quality visual instruction-tuning data; instruction-tuning methodologies; and CV-Bench, a dedicated vision-centric benchmark. The models are available in 8B, 13B, and 34B parameter sizes and are open-source, with model weights, code, datasets, and detailed training instructions provided. They are reported to achieve state-of-the-art results on numerous visual-understanding benchmarks [1, 2].

Models (3)
