Cerebras LLaVA 13B
About
Cerebras LLaVA 13B is a multimodal large language model from Cerebras Systems that pairs a vision encoder with a language model. It combines a CLIP ViT-Large vision encoder with a language model initialized from Vicuna-13B checkpoints and instruction-tuned on diverse datasets, using a projector module to map visual features into the language model's embedding space. Intended for research on multimodal systems, the model processes images and text jointly to support tasks such as visual question answering. Researchers should exercise caution, as the training data may contain offensive content. The model can be run using the LLaVA source code.
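The projector's role can be illustrated with a minimal numpy sketch: patch embeddings from the vision encoder are linearly projected into the language model's hidden dimension and prepended to the text token embeddings. The dimensions below (1024 for CLIP ViT-L, 5120 for a 13B Vicuna-style LM, 576 image patches) are illustrative assumptions, not confirmed specifics of this checkpoint, and the random weights stand in for trained parameters.

```python
import numpy as np

# Illustrative dimensions (assumptions, not confirmed for this model):
VISION_DIM = 1024   # typical CLIP ViT-Large patch embedding size
HIDDEN_DIM = 5120   # typical hidden size for a 13B Vicuna-style LM

rng = np.random.default_rng(0)

def project_vision_features(patch_embeds, W, b):
    """Map vision-encoder patch embeddings into the LM embedding space."""
    return patch_embeds @ W + b

# Stand-in for 576 patch embeddings produced by the vision encoder.
patch_embeds = rng.standard_normal((576, VISION_DIM))

# Random stand-ins for the trained projector weights.
W = rng.standard_normal((VISION_DIM, HIDDEN_DIM)) * 0.01
b = np.zeros(HIDDEN_DIM)

image_tokens = project_vision_features(patch_embeds, W, b)

# Stand-in for 32 text token embeddings from the LM's embedding table.
text_embeds = rng.standard_normal((32, HIDDEN_DIM))

# The projected image tokens are prepended to the text sequence, so the
# LM attends over one combined sequence of 576 + 32 = 608 tokens.
lm_input = np.concatenate([image_tokens, text_embeds], axis=0)
print(lm_input.shape)  # (608, 5120)
```

The key point is that after projection, image patches and text tokens live in the same embedding space, so the language model needs no architectural changes to consume them.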
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode