LLM Reference

NeVA 22B

About

NeVA-22B is a vision-language model from NVIDIA that interprets and responds to complex instructions combining text and images. It pairs a GPT-based language model with a CLIP image encoder, projecting image features into the shared text embedding space so both modalities can be processed as a single token sequence. Trained on large-scale datasets, including image-caption pairs and synthetic data generated with GPT-4, NeVA-22B performs well on tasks such as language generation and visual question answering. It is optimized for NVIDIA hardware and uses Triton Inference Server and TensorRT-LLM for efficient inference. As with any large model, users should watch for potential biases and inaccuracies in its outputs.
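The projection step described above can be sketched in a few lines: CLIP patch features are mapped through a learned linear layer into the language model's embedding dimension, then prepended to the text token embeddings. All dimensions and the random weights below are illustrative assumptions, not NeVA-22B's actual sizes.

```python
import numpy as np

# Illustrative dimensions only -- NeVA-22B's real sizes are not given here.
CLIP_DIM = 1024      # assumed CLIP vision feature size
LM_DIM = 6144        # assumed language-model hidden size
NUM_PATCHES = 256    # assumed number of image patch tokens

rng = np.random.default_rng(0)

# In the real model this projection is learned; here it is random.
W_proj = rng.standard_normal((CLIP_DIM, LM_DIM)) * 0.02

def project_image_features(clip_features: np.ndarray) -> np.ndarray:
    """Map CLIP patch features (num_patches, CLIP_DIM) into the LM space."""
    return clip_features @ W_proj

clip_features = rng.standard_normal((NUM_PATCHES, CLIP_DIM))
image_tokens = project_image_features(clip_features)

# Text token embeddings (e.g., the prompt) live in the same space, so the
# projected image tokens can simply be prepended to the text sequence.
text_tokens = rng.standard_normal((12, LM_DIM))
sequence = np.concatenate([image_tokens, text_tokens], axis=0)
print(sequence.shape)
```

Once image and text tokens share one embedding space, the decoder-only language model attends over them uniformly, which is what makes the "seamless processing" possible.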

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode

Providers (1)

Provider: NVIDIA NIM
Input (per 1M): —
Output (per 1M): —
Type: Provisioned
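As a usage sketch for the NVIDIA NIM provider above: hosted NIM vision-language endpoints accept an OpenAI-style chat payload with the image inlined as a base64 data URI inside an `<img>` tag. The endpoint URL, payload schema, and field names below are assumptions based on NVIDIA's hosted API catalog; consult the official NIM documentation for the authoritative schema. The request itself is left commented out since it needs a valid API key.

```python
import json

# Assumed hosted NIM endpoint for NeVA-22B; verify against NVIDIA's docs.
NVAI_URL = "https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b"

def build_request(prompt: str, image_b64: str) -> dict:
    """Build an OpenAI-style chat payload with the image as an inline data URI."""
    return {
        "messages": [
            {
                "role": "user",
                # Assumed convention: the image rides inside the text content
                # as an HTML-like <img> tag with a base64 data URI.
                "content": f'{prompt} <img src="data:image/png;base64,{image_b64}" />',
            }
        ],
        "max_tokens": 256,
        "temperature": 0.2,
    }

# Tiny placeholder base64 string; a real call would embed an actual PNG.
payload = build_request("Describe this image.", "iVBORw0KGgo=")
print(json.dumps(payload, indent=2))

# Sending the request requires an API key; sketch only:
# import os, requests
# headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}
# resp = requests.post(NVAI_URL, headers=headers, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```

Because the provider is provisioned rather than per-token billed, throughput and pricing are negotiated rather than metered per million tokens.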

Specifications

Family: NeVA
Parameters: 22B
Architecture: Decoder-only
Specialization: General