LLM Reference

Cerebras LLaVA 13B

About

Cerebras LLaVA 13B is a multimodal large language model developed by Cerebras Systems that pairs a vision encoder with a language model. It combines a CLIP-VisionModel-Large vision encoder with a language model initialized from Vicuna-13B checkpoints and further refined with instruction tuning on diverse datasets; a projector module maps visual features into the language model's embedding space so the two modalities can be processed together. Aimed at research on multimodal systems, the model handles tasks such as visual question answering by jointly processing images and text. Because the training data may contain offensive content, researchers should exercise caution when using the model. It is available for implementation through the LLaVA source code.
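The projector described above can be sketched in a few lines. The snippet below is an illustration only, not the actual implementation: it shows the idea of linearly projecting vision-encoder patch features into the language model's embedding space so image tokens can be concatenated with text tokens. All dimensions are assumptions (CLIP ViT-Large patch features are 1024-dimensional and a 13B-scale Vicuna hidden size is 5120; the patch and text counts are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration.
VISION_DIM = 1024   # per-patch feature size from the vision encoder
LM_DIM = 5120       # language model hidden size
N_PATCHES = 256     # image patches passed to the language model
N_TEXT = 32         # text token embeddings in the prompt

# A single linear projection, in the spirit of the LLaVA recipe.
W = rng.standard_normal((VISION_DIM, LM_DIM)) * 0.02
b = np.zeros(LM_DIM)

# Stand-in image features; in the real model these come from CLIP.
image_features = rng.standard_normal((N_PATCHES, VISION_DIM))
image_tokens = image_features @ W + b           # shape (256, 5120)

# Stand-in text embeddings; projected image tokens are prepended.
text_tokens = rng.standard_normal((N_TEXT, LM_DIM))
sequence = np.concatenate([image_tokens, text_tokens], axis=0)

print(sequence.shape)  # (288, 5120)
```

After projection, the image tokens live in the same space as text embeddings, so the decoder-only language model can attend over both in a single sequence.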

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Architecture: Decoder Only
Specialization: General