LLaVA
About
LLaVA, or Large Language and Vision Assistant, is a family of open-source large multimodal models (LMMs) developed by a collaborative team from the University of Wisconsin-Madison, Microsoft Research, and Columbia University. These models integrate a vision encoder, such as CLIP ViT-L/14, with large language models like Vicuna, Mistral, and Nous-Hermes to enable combined visual and language understanding. A key innovation of LLaVA is its end-to-end training on GPT-4-generated multimodal instruction-following data. The family has evolved through LLaVA-1.5, which added an MLP vision-language connector and academic task-oriented data, and LLaVA-NeXT (1.6), which increased image resolution and broadened LLM support. Because the models are data-efficient to train, they are highly accessible for research purposes.
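The MLP vision-language connector mentioned above can be illustrated with a short sketch: vision-encoder patch features are projected into the LLM's token-embedding space and concatenated with the embedded text prompt. This is a minimal PyTorch illustration, not the reference implementation; the class name and toy tensors are ours, while the dimensions (1024-dim CLIP ViT-L/14 patch features, 576 patches at 336x336, 5120-dim Vicuna-13B embeddings) match the published LLaVA-1.5 setup.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP projector (LLaVA-1.5 style): maps vision-encoder
    patch features into the LLM's token-embedding space.
    Hypothetical class name for illustration."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_features)

# Dimensions as in LLaVA-1.5 with CLIP ViT-L/14 @ 336px and Vicuna-13B.
connector = VisionLanguageConnector(vision_dim=1024, llm_dim=5120)
patches = torch.randn(1, 576, 1024)       # 576 patches from a 336x336 image
image_tokens = connector(patches)         # shape (1, 576, 5120)
text_tokens = torch.randn(1, 32, 5120)    # stand-in for embedded prompt tokens
# The LLM then attends over image and text tokens as one sequence.
llm_input = torch.cat([image_tokens, text_tokens], dim=1)
```

During LLaVA pre-training only this projector is updated, which is what makes the approach data- and compute-efficient; later stages fine-tune the LLM as well.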
Specifications (4 models)
| Model | Released | Context | Parameters | Vision | Multimodal |
|---|---|---|---|---|---|
| LLaVA Vicuna 13B | 2023-04 | — | 13B | Yes | Yes |
| LLaVA Llama 2 13B | 2023-04 | — | 13B | Yes | Yes |
| LLaVA Llama 2 7B | 2023-04 | — | 7B | Yes | Yes |
| LLaVA 13B | 2023-04 | 4K | 13B | Yes | Yes |
Available From (1 provider)
Frequently Asked Questions
- What is LLaVA?
- LLaVA, or Large Language and Vision Assistant, is a family of open-source large multimodal models developed by the University of Wisconsin-Madison, Microsoft Research, and Columbia University. It pairs a vision encoder such as CLIP ViT-L/14 with large language models like Vicuna, Mistral, and Nous-Hermes, and is trained end-to-end on GPT-4-generated multimodal instruction-following data.
- How many models are in the LLaVA family?
- The LLaVA family contains 4 models.
- What is the latest LLaVA model?
- The latest model is LLaVA Vicuna 13B, released in April 2023.


