What is LLaVA used for?

LLaVA is used for vision and multimodal work, coding, and chatbot and role-playing use cases. The family description and listed model capabilities point to those workloads as the best fit.

How does LLaVA compare to LLaVA 1.5?

LLaVA by Haotian Liu is strongest where you need vision and multimodal work, while LLaVA 1.5 by Haotian Liu is the closest related family to check for structured outputs. LLaVA has 4 listed variants and reaches up to 4k context, while LLaVA 1.5 reaches up to 4k context, so compare the specs and pricing tables before choosing a production model.

Which LLaVA model should I use?

If price is the main constraint, use the pricing table first because LLaVA does not have complete provider pricing in the local data. For the most capable/latest local choice, evaluate LLaVA 13B with 4k context and multimodal inputs.

LLaVA Models by Haotian Liu

Haotian Liu

4 models2023Up to 4k ctx

About

LLaVA, or Large Language and Vision Assistant, is an advanced family of open-source large multimodal models (LMMs) developed by a collaborative team from the University of Wisconsin-Madison, Microsoft Research, and Columbia University 126. These models uniquely integrate a vision encoder, such as CLIP ViT-L/14, with large language models like Vicuna, Mistral, and Nous-Hermes to enable robust visual and language understanding 126. A key innovation of LLaVA models is their end-to-end training process, enriched with GPT-4 generated multimodal instruction-following data to optimize performance 12. The evolution of LLaVA models includes LLaVA-1.5, which added an MLP vision-language connector and academic task-oriented data, and LLaVA-NeXT (1.6), which improved image resolution and broadened LLM support 6. Prioritizing data efficiency, these models are highly accessible for research purposes 12.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

4 in view

LLaVA Vicuna 13BCurrent

Use when the workload needs 13B parameters.

2023-0413B parameters

LLaVA Llama 2 13BCurrent

Use when the workload needs 13B parameters.

2023-0413B parameters

LLaVA Llama 2 7BCurrent

Use when the workload needs 7B parameters.

2023-047B parameters

LLaVA 13BCurrent

Use when the workload needs 4k context, 13B parameters, and multimodal inputs.

2023-044k context13B parametersmultimodal inputs

Current LLaVA variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
LLaVA Vicuna 13B	Use when the workload needs 13B parameters.	2023-04	13B parameters	Current
LLaVA Llama 2 13B	Use when the workload needs 13B parameters.	2023-04	13B parameters	Current
LLaVA Llama 2 7B	Use when the workload needs 7B parameters.	2023-04	7B parameters	Current
LLaVA 13B	Use when the workload needs 4k context, 13B parameters, and multimodal inputs.	2023-04	4k context13B parametersmultimodal inputs	Current

Release Timeline

1 release group

2023-04

4 current

LLaVA 13B

4k context13B parametersmultimodal inputs

Current

LLaVA Llama 2 13B

13B parameters

Current

LLaVA Llama 2 7B

7B parameters

Current

LLaVA Vicuna 13B

13B parameters

Current

Specifications(4 models)

LLaVA model specifications comparison
Model	Released	Context	Parameters	Vision	Multimodal
LLaVA Vicuna 13B	2023-04	—	13B	No	No
LLaVA Llama 2 13B	2023-04	—	13B	No	No
LLaVA Llama 2 7B	2023-04	—	7B	No	No
LLaVA 13B	2023-04	4k	13B	Yes	Yes

Available From(1 provider)

Replicate API

Frequently Asked Questions

What is LLaVA used for?: LLaVA is used for vision and multimodal work, coding, and chatbot and role-playing use cases. The family description and listed model capabilities point to those workloads as the best fit.
How does LLaVA compare to LLaVA 1.5?: LLaVA by Haotian Liu is strongest where you need vision and multimodal work, while LLaVA 1.5 by Haotian Liu is the closest related family to check for structured outputs. LLaVA has 4 listed variants and reaches up to 4k context, while LLaVA 1.5 reaches up to 4k context, so compare the specs and pricing tables before choosing a production model.
Which LLaVA model should I use?: If price is the main constraint, use the pricing table first because LLaVA does not have complete provider pricing in the local data. For the most capable/latest local choice, evaluate LLaVA 13B with 4k context and multimodal inputs.