LLM ReferenceLLM Reference

LLaVA 1.6 Mistral 7B

Deprecated

About

LLaVA-v1.6 Mistral-7B is an open-source, multimodal language model capable of processing text and images. Built on the Mistral-7B-Instruct-v0.2 base, it combines a large language model with a vision encoder to enhance reasoning, optical character recognition, and world understanding. Trained on substantial datasets, including image-text pairs from LAION/CC/SBU, GPT-generated data, and VQA data, it was evaluated against 12 benchmarks. The model improves upon LLaVA-1.5 with higher image resolution processing and better reasoning, offering bilingual support and commercial licensing. It finds use in applications like chatbots, image captioning, and visual QA tasks but requires significant computational resources for high-res images.

Capabilities

VisionMultimodalReasoningFunction CallingTool UseStructured OutputsCode Execution

Providers(2)

Compare all →
ProviderInput (per 1M)Output (per 1M)Type
NVIDIA NIMProvisioned
Replicate API$0.05$0.25Serverless

Specifications

FamilyLLaVA 1.6
Released2024-01-31
Parameters7B
Context32K
ArchitectureDecoder Only
Knowledge cutoff2023-12
Specializationgeneral
Trainingfinetuning

Created by

Academic researcher focused on vision models

N/A
Founded N/A
Website