Pricing
| Type | Price (per 1M tokens) |
|---|---|
| Input tokens | Free |
| Output tokens | Free |
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution
About NeVA 22B
NeVA-22B is a vision-language model from NVIDIA that interprets and responds to complex instructions involving both text and images. It combines a GPT-based language model with a CLIP model for image encoding: the image encoder's output is projected into the language model's text embedding space, so that image and text inputs are processed together. Its training data includes image-caption pairs and synthetic data generated with GPT-4, and it targets tasks such as language generation and visual question answering. It is optimized for NVIDIA’s hardware and uses Triton and TensorRT-LLM for efficient inference. Its outputs may contain biases and inaccuracies, so users should review them with care.
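The projection step described above can be illustrated with a minimal sketch. This is not NVIDIA's implementation: it assumes a single linear projection layer and assumes the projected image tokens are placed ahead of the text tokens. The function names, dimensions, and random values are all hypothetical and chosen only to keep the example small.

```python
import random


def project_image_features(patches, weight, bias):
    """Map each image patch vector (length d_img) to the text embedding size d_txt.

    weight has d_txt rows of length d_img; bias has length d_txt.
    A single linear layer is an assumption made for illustration.
    """
    projected = []
    for patch in patches:
        row = [
            sum(p * w for p, w in zip(patch, weight_row)) + b
            for weight_row, b in zip(weight, bias)
        ]
        projected.append(row)
    return projected


def build_multimodal_sequence(image_patches, text_embeddings, weight, bias):
    """Project image patches, then place them before the text embeddings.

    The ordering (image tokens first) is an assumption, not taken from the source.
    """
    image_tokens = project_image_features(image_patches, weight, bias)
    return image_tokens + text_embeddings


# Hypothetical toy sizes; real models use far larger dimensions.
D_IMG, D_TXT = 4, 3
N_PATCHES, N_TEXT = 2, 5

rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(D_IMG)] for _ in range(D_TXT)]
bias = [0.0] * D_TXT
image_patches = [[rng.uniform(-1, 1) for _ in range(D_IMG)] for _ in range(N_PATCHES)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(D_TXT)] for _ in range(N_TEXT)]

sequence = build_multimodal_sequence(image_patches, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))  # 7 3
```

After projection, every element of the sequence has the same width, which is what lets a decoder-only language model consume image and text tokens in one pass.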
Model Specs
| Spec | Value |
|---|---|
| Released | 2024-03-01 |
| Parameters | 22B |
| Architecture | Decoder Only |