LLM Reference

Cerebras LLaVA 7B

About

Cerebras LLaVA 7B is a multimodal model that processes both text and images. It pairs a transformer-based language model, fine-tuned from Vicuna-7B checkpoints, with a CLIP-VisionModel-Large vision encoder. This combination supports visual question answering, image captioning, multimodal dialogue, and optical character recognition, with state-of-the-art results on standard multimodal benchmarks. LLaVA 7B is open source, which supports collaboration within the AI community, and its diverse training data gives it broad coverage of contexts and tasks.
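
This language model + vision encoder layout matches the standard LLaVA classes in Hugging Face transformers, so a minimal inference sketch looks like the following. Note that the model ID `cerebras/llava-7b` is a placeholder assumption, not a confirmed release path, and the prompt template shown is the standard LLaVA/Vicuna format.

```python
# Minimal inference sketch, assuming the checkpoint is published in the
# Hugging Face LLaVA format. The model ID below is hypothetical; substitute
# the actual Cerebras LLaVA 7B release.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "cerebras/llava-7b"  # hypothetical ID, not confirmed by this page

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-style prompts interleave an <image> placeholder with the user text.
prompt = "USER: <image>\nWhat text appears in this image?\nASSISTANT:"
image = Image.open("receipt.png")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```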

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode
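
JSON mode and tool use of this kind are typically exposed through an OpenAI-compatible chat-completions interface. The sketch below assumes such an endpoint exists for this model; the base URL, API key handling, model name, and `response_format` support are all assumptions, not details confirmed by this page.

```python
# Hedged sketch of JSON mode via an OpenAI-compatible endpoint.
# The base URL and model name are assumptions, not confirmed by this page.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llava-7b",  # hypothetical model name
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "List three uses of a vision-language model."},
    ],
    response_format={"type": "json_object"},  # requests JSON mode
)
print(response.choices[0].message.content)
```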

Specifications

Architecture: Decoder Only
Specialization: General