NVLM Models by NVIDIA AI
About
The NVLM 1.0 family consists of multimodal large language models from NVIDIA designed for vision-language tasks. These models rival top-tier proprietary models such as GPT-4o and compare favorably with open-access models such as Llama 3-V 405B. Unlike many multimodal models, whose text capabilities may degrade after multimodal training, NVLM 1.0 improves its text-only performance after multimodal training. The family comprises three architectures: NVLM-D (decoder-only), NVLM-X (cross-attention-based), and NVLM-H (hybrid), each designed to emphasize a different aspect of multimodal processing. NVIDIA supports open research by releasing the model weights and plans to share the training code. NVLM 1.0 performs well on OCR, multimodal reasoning, and coding, extending its capabilities beyond traditional text-only tasks.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| NVLM-D 72B | Use when the workload needs 128k context and 72B parameters. | 2024-09 | 128k context, 72B parameters | Current |
| NVLM-D 34B | Use when the workload needs 34B parameters. | 2024-09 | 34B parameters | Current |
| NVLM-X 72B | Use when the workload needs 72B parameters. | 2024-09 | 72B parameters | Current |
| NVLM-X 34B | Use when the workload needs 34B parameters. | 2024-09 | 34B parameters | Current |
| NVLM-H 72B | Use when the workload needs 72B parameters. | 2024-09 | 72B parameters | Current |
| NVLM-H 34B | Use when the workload needs 34B parameters. | 2024-09 | 34B parameters | Current |
Release Timeline
1 release group
Specifications (6 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| NVLM-D 72B | 2024-09 | 128k | 72B |
| NVLM-D 34B | 2024-09 | — | 34B |
| NVLM-X 72B | 2024-09 | — | 72B |
| NVLM-X 34B | 2024-09 | — | 34B |
| NVLM-H 72B | 2024-09 | — | 72B |
| NVLM-H 34B | 2024-09 | — | 34B |
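The specifications above can be filtered programmatically when shortlisting a variant. The following is a minimal sketch, not an official NVIDIA or catalog API: the `Variant` class and `shortlist` function are hypothetical names introduced for illustration, and the data is copied from the table, with `None` standing in for context lengths the table does not list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """One row of the specifications table (hypothetical helper type)."""
    name: str
    released: str
    context_k: Optional[int]  # context length in thousands of tokens; None if not listed
    params_b: int             # parameter count in billions


# Values copied from the specifications table above.
VARIANTS = [
    Variant("NVLM-D 72B", "2024-09", 128, 72),
    Variant("NVLM-D 34B", "2024-09", None, 34),
    Variant("NVLM-X 72B", "2024-09", None, 72),
    Variant("NVLM-X 34B", "2024-09", None, 34),
    Variant("NVLM-H 72B", "2024-09", None, 72),
    Variant("NVLM-H 34B", "2024-09", None, 34),
]


def shortlist(min_context_k: Optional[int] = None,
              max_params_b: Optional[int] = None) -> List[Variant]:
    """Return variants that satisfy the given constraints.

    A variant with no listed context length is excluded whenever a
    minimum context is requested, because the table cannot confirm it.
    """
    result = []
    for v in VARIANTS:
        if min_context_k is not None:
            if v.context_k is None or v.context_k < min_context_k:
                continue
        if max_params_b is not None and v.params_b > max_params_b:
            continue
        result.append(v)
    return result


if __name__ == "__main__":
    for v in shortlist(min_context_k=128):
        print(v.name)
```

Treating an unlisted context length as "unknown" rather than zero keeps the filter conservative: only NVLM-D 72B has a context length stated in the table, so it is the only variant returned for a 128k requirement.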
Frequently Asked Questions
- What is NVLM used for?
- NVLM is used for OCR, multimodal reasoning, and coding. The family description and listed model capabilities point to those workloads as the best fit.
- How does NVLM compare to NVIDIA Nemotron Nano 12B v2 VL?
- NVLM by NVIDIA AI is strongest for coding workloads, while NVIDIA Nemotron Nano 12B v2 VL, also by NVIDIA AI, is the closest related family to check for structured outputs. NVLM has 6 listed variants, with up to 128k context listed for NVLM-D 72B, so compare the specs and pricing tables before choosing a production model.
- Which NVLM model should I use?
- If price is the main constraint, check the pricing table first, keeping in mind that the local data does not include complete provider pricing for NVLM. For the most capable listed option, evaluate NVLM-D 72B with 128k context.






