LLaVA 13B
Open Source · Multimodal
About
The original LLaVA (Large Language-and-Vision Assistant) model at 13B parameters. LLaVA is a multimodal vision-and-language model that combines a vision encoder with a language model for visual understanding and instruction-following tasks.
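The fusion described above can be sketched numerically: patch features from the vision encoder are linearly projected into the language model's embedding space and prepended to the text tokens. The dimensions below are assumptions for illustration (256 patches, a 1024-dim vision encoder, and 5120 as the hidden size of a 13B LLaMA-class decoder); the projection weights are random stand-ins for learned parameters.

```python
import numpy as np

# Minimal sketch of LLaVA-style multimodal fusion (dimensions assumed):
# a vision encoder yields patch features, a learned linear projection maps
# them into the LM's token-embedding space, and the projected "visual
# tokens" are prepended to the embedded text prompt.
rng = np.random.default_rng(0)

n_patches, d_vision, d_model = 256, 1024, 5120   # 5120: hidden size of a 13B LLaMA-class model

patch_feats = rng.standard_normal((n_patches, d_vision))   # from the vision encoder
W_proj = rng.standard_normal((d_vision, d_model)) * 0.01   # stand-in for the learned projection

visual_tokens = patch_feats @ W_proj                       # (256, 5120)
text_tokens = rng.standard_normal((8, d_model))            # embedded prompt tokens (8 assumed)

# Combined sequence fed to the decoder-only language model
sequence = np.concatenate([visual_tokens, text_tokens])
```

The projection layer is the only new trainable component bridging the two pretrained models, which is what makes this fine-tuning recipe lightweight.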
Capabilities
Vision · Multimodal · Reasoning · Function Calling · Tool Use · JSON Mode · Code Execution
Providers (1)
| Provider | Input (per 1M) | Output (per 1M) | Type |
|---|---|---|---|
| Replicate API | — | — | Serverless |
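As a sketch of what a request to the serverless provider above might look like, the helper below builds a JSON prediction payload. The field names (`image`, `prompt`, `max_new_tokens`) are assumptions, not confirmed by this page; check the provider's model schema for the exact input format.

```python
import json

def build_llava_request(image_url: str, prompt: str, max_tokens: int = 512) -> str:
    """Assemble a JSON payload for a serverless LLaVA 13B prediction.

    The request shape and field names are illustrative assumptions,
    not taken from this page or any official schema.
    """
    payload = {
        "input": {
            "image": image_url,            # URL of the image to describe
            "prompt": prompt,              # the text question or instruction
            "max_new_tokens": max_tokens,  # cap on generated tokens
        }
    }
    return json.dumps(payload)
```

A caller would POST this body to the provider's prediction endpoint with an API token in the request headers.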
Specifications
Family: LLaVA
Released: 2023-04-17
Parameters: 13B
Context: 4K
Architecture: Decoder-only
Specialization: General
Training: Fine-tuning