NeVA

NVIDIA AI
CC-BY-NC-4.0

About

NeVA is a family of multimodal vision-language models developed by NVIDIA as part of the NeMo Multimodal ecosystem. Each NeVA model pairs a large language model, such as NVGPT or LLaMA, with a vision encoder. This lets the model interpret both text and images and respond to them in natural language.

Training proceeds in two stages: feature-alignment pre-training, followed by end-to-end fine-tuning. The resulting models suit tasks that require a detailed understanding of visual content and precise instruction following. They are described as showing capabilities similar to those of advanced multimodal models such as GPT-4, including on previously unseen inputs. A variant, Video NeVA, extends the approach to video by converting each video into a sequence of image frames.

NeVA models are available with 8B, 22B, and 43B parameters. They use NeMo framework features for efficient training, including model parallelism and activation checkpointing. Note that some NeVA models are restricted to non-commercial use [1][2].
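The pairing of an LLM with a vision encoder and the two-stage training described above can be illustrated with a minimal, framework-free sketch. It assumes a LLaVA-style setup that this page does not spell out. In that setup, a vision encoder turns an image into patch features, and a projection layer maps them into the LLM's embedding space. The projected patches then take the place of an image placeholder in the text sequence. The component names (`vision_encoder`, `projector`, `llm`), the `<image>` placeholder, the freezing schedule, and the 224-pixel image with 14-pixel patches are all hypothetical, not NeMo's actual identifiers or settings.

```python
# Minimal, framework-free sketch of a LLaVA-style vision-language layout.
# ASSUMPTION: component names, the "<image>" placeholder, and the freezing
# schedule below are illustrative, not NeMo's actual API or configuration.

VISION_ENCODER = "vision_encoder"  # hypothetical: turns an image into patch features
PROJECTOR = "projector"            # hypothetical: maps patch features into the LLM embedding space
LLM = "llm"                        # hypothetical: the language model (e.g. NVGPT or LLaMA)


def trainable_components(stage: int) -> set[str]:
    """Return the components the optimizer updates in each training stage.

    Assumed schedule:
      stage 1 (feature-alignment pre-training): only the projector is trained,
               so image features learn to line up with the frozen LLM's embeddings.
      stage 2 (end-to-end fine-tuning): projector and LLM are trained together,
               while the vision encoder stays frozen.
    """
    if stage == 1:
        return {PROJECTOR}
    if stage == 2:
        return {PROJECTOR, LLM}
    raise ValueError(f"unknown training stage: {stage}")


def num_image_patches(image_size: int, patch_size: int) -> int:
    """Number of patch features a ViT-style encoder emits for a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def build_sequence(text_tokens: list[str], placeholder: str, n_patches: int) -> list[str]:
    """Expand each image placeholder into one slot per projected image patch.

    Strings stand in for embedding vectors here; a real model would splice
    projected feature vectors into the sequence of token embeddings instead.
    """
    sequence: list[str] = []
    for token in text_tokens:
        if token == placeholder:
            sequence.extend(f"<img_patch_{i}>" for i in range(n_patches))
        else:
            sequence.append(token)
    return sequence


if __name__ == "__main__":
    for stage in (1, 2):
        print(stage, sorted(trainable_components(stage)))
    patches = num_image_patches(image_size=224, patch_size=14)
    print(patches)  # 16 x 16 patches -> 256
    prompt = ["Describe", "<image>", "briefly", "."]
    print(len(build_sequence(prompt, "<image>", patches)))  # 3 text tokens + 256 patch slots -> 259
```

Keeping the vision encoder frozen in both stages follows the LLaVA convention. Whether a particular NeVA checkpoint does exactly the same should be checked against the NeMo documentation.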
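Video NeVA converts a video into a sequence of image frames, and that step can be sketched as frame sampling. This page does not say how frames are chosen. The sketch assumes uniform sampling: the clip is split into `num_samples` equal-length segments, and the frame at the (truncated) midpoint of each segment is kept. The helper name `sample_frame_indices` is hypothetical.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Choose frame indices spread evenly across a clip (hypothetical helper).

    The clip is split into `num_samples` equal-length segments and the frame
    containing the midpoint of each segment is kept. Clips with no more than
    `num_samples` frames keep every frame.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples >= total_frames:
        return list(range(total_frames))
    segment = total_frames / num_samples
    # int() truncates toward zero. The last index stays below total_frames
    # because segment * (num_samples - 0.5) == total_frames - segment / 2.
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(total_frames=100, num_samples=4))  # [12, 37, 62, 87]
    print(sample_frame_indices(total_frames=3, num_samples=8))    # [0, 1, 2]
```

Each selected frame would then go through the vision encoder like a single image, so the model sees the video as an ordered sequence of images. Uniform sampling is only one choice. It keeps temporal coverage even without depending on scene content.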
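Some rough arithmetic shows why the larger NeVA sizes call for model parallelism. This sketch counts only parameter-related memory and ignores activations, which activation checkpointing reduces. It assumes 2 bytes per parameter for bf16 weights. For mixed-precision Adam training, it uses the common rule of thumb of about 16 bytes per parameter: 16-bit weights and gradients, plus 32-bit master weights and two optimizer moments. The model-parallel size of 8 and the 80 GB GPU are illustrative choices, not NeVA's documented configuration.

```python
# Back-of-the-envelope memory per GPU for NeVA-sized models.
# ASSUMPTIONS: parameter-related memory only (no activations), state split
# evenly across model-parallel ranks, decimal gigabytes (1 GB = 1e9 bytes).

BF16_WEIGHT_BYTES = 2   # 16-bit weights alone, e.g. for inference
MIXED_ADAM_BYTES = 16   # 2 (weights) + 2 (grads) + 4 (fp32 master) + 4 + 4 (Adam moments)
GPU_MEMORY_GB = 80      # illustrative single-GPU capacity


def per_gpu_gb(num_params: int, bytes_per_param: int, model_parallel_size: int) -> float:
    """Gigabytes of parameter-related state held by each model-parallel rank."""
    return num_params * bytes_per_param / model_parallel_size / 1e9


if __name__ == "__main__":
    for billions in (8, 22, 43):
        n = billions * 10**9
        for mp in (1, 8):
            weights = per_gpu_gb(n, BF16_WEIGHT_BYTES, mp)
            training = per_gpu_gb(n, MIXED_ADAM_BYTES, mp)
            fits = "fits" if training <= GPU_MEMORY_GB else "does not fit"
            print(f"{billions}B, model-parallel={mp}: weights {weights:.1f} GB, "
                  f"training state {training:.1f} GB ({fits} in {GPU_MEMORY_GB} GB)")
```

Under these assumptions, the 43B model's bf16 weights alone (86 GB) exceed a single 80 GB GPU. Even with 8-way model parallelism, its training state is still about 86 GB per rank. Such runs therefore need a larger model-parallel size or extra techniques, such as sharding optimizer state across data-parallel ranks.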

Models (3)

Details

Researcher: NVIDIA AI
License: CC-BY-NC-4.0
Models: 3
