NeVA 22B
About
NeVA-22B is a vision-language model from NVIDIA capable of interpreting and responding to complex instructions that combine text and images. It pairs a GPT-based language model with a CLIP image encoder, projecting image features into the shared text embedding space so both modalities can be processed together. Trained on extensive datasets, including image-caption pairs and synthetic GPT-4-generated data, NeVA-22B performs well on tasks such as language generation and visual question answering. It is optimized for NVIDIA hardware and uses Triton and TensorRT-LLM for efficient inference. Despite these advances, users should remain alert to potential biases and inaccuracies in its outputs.
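As a rough illustration of how a multimodal request to such a model might look, the sketch below builds a chat-style payload that inlines a base64-encoded image next to the text question. The endpoint URL, model identifier, and inline-image convention are assumptions for illustration, not details confirmed by this page.

```python
import base64

# Assumed values -- not confirmed by this page; check NVIDIA's NIM docs.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed endpoint
MODEL_ID = "nvidia/neva-22b"  # assumed model identifier

def build_request(question: str, image_bytes: bytes, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                # Image embedded as a base64 data URI alongside the question.
                "content": f'{question} <img src="data:image/png;base64,{b64}" />',
            }
        ],
        "max_tokens": max_tokens,
    }

payload = build_request("Describe this image.", b"\x89PNG\r\n")
```

The resulting `payload` dict would then be POSTed to the endpoint with an API key; the exact request shape should be verified against the provider's documentation before use.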
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode
Providers (1)
| Provider | Input (per 1M) | Output (per 1M) | Type |
|---|---|---|---|
| NVIDIA NIM | — | — | Provisioned |