NV-Embed Models by NVIDIA AI
4 models, 2024–2025, up to 4k context
About
NV-Embed is NVIDIA's family of specialized embedding and reranking models, including NV-EmbedCode for code retrieval and NV-EmbedQA/RerankQA for question-answering tasks. NVIDIA NIM also hosts BAAI BGE models (such as BGE-M3) as first-class retrieval endpoints in its API catalog.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| NV-EmbedCode 7B v1 | Use when the workload needs embedding, 4k context, and 7B parameters. | 2025-06 | embedding, 4k context, 7B parameters | Current |
| Llama 3.2 NV EmbedQA 1B v2 | Use when the workload needs embedding, 4k context, and 1B parameters. | 2025-03 | embedding, 4k context, 1B parameters | Current |
| Llama 3.2 NV RerankQA 1B v2 | Use when the workload needs ranking, 4k context, and 1B parameters. | 2025-03 | ranking, 4k context, 1B parameters | Current |
| Llama 3.2 NV EmbedQA 1B v1 | Use when the workload needs embedding, 512 context, and 1B parameters. | 2024-10 | embedding, 512 context, 1B parameters | Current |
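The "Use when" sentences above follow a fixed pattern built from each variant's capability, context, and parameter fields. As a minimal sketch of how such guidance could be derived, the Python below reproduces that pattern. The `Variant` class and `use_when` function are hypothetical names introduced for illustration; they are not part of any NVIDIA API or of this catalog's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One catalog row (hypothetical structure, for illustration only)."""
    name: str
    capability: str  # "embedding" or "ranking", as listed in the table
    context: str     # context as listed, e.g. "4k" or "512"
    parameters: str  # parameter count as listed, e.g. "7B"


def use_when(variant: Variant) -> str:
    """Build the guidance sentence from the variant's listed fields."""
    return (
        f"Use when the workload needs {variant.capability}, "
        f"{variant.context} context, and {variant.parameters} parameters."
    )


# Rows copied from the table above.
VARIANTS = [
    Variant("NV-EmbedCode 7B v1", "embedding", "4k", "7B"),
    Variant("Llama 3.2 NV EmbedQA 1B v2", "embedding", "4k", "1B"),
    Variant("Llama 3.2 NV RerankQA 1B v2", "ranking", "4k", "1B"),
    Variant("Llama 3.2 NV EmbedQA 1B v1", "embedding", "512", "1B"),
]

if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.name}: {use_when(v)}")
```

Because the sentence is generated from three fields only, it does not capture distinctions such as code retrieval versus question answering; those still come from the family description.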
Release Timeline
3 release groups
- 2025-06 (1 current)
  - NV-EmbedCode 7B v1: Current, embedding, 4k context, 7B parameters
- 2025-03 (2 current)
  - Llama 3.2 NV EmbedQA 1B v2: Current, embedding, 4k context, 1B parameters
  - Llama 3.2 NV RerankQA 1B v2: Current, ranking, 4k context, 1B parameters
- 2024-10 (1 current)
  - Llama 3.2 NV EmbedQA 1B v1: Current, embedding, 512 context, 1B parameters
Specifications (4 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| NV-EmbedCode 7B v1 | 2025-06 | 4k | 7B |
| Llama 3.2 NV EmbedQA 1B v2 | 2025-03 | 4k | 1B |
| Llama 3.2 NV RerankQA 1B v2 | 2025-03 | 4k | 1B |
| Llama 3.2 NV EmbedQA 1B v1 | 2024-10 | 512 | 1B |
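To narrow the listed variants by task and required context, the specifications can be filtered programmatically. The sketch below is one way to do that under stated assumptions: the `SPECS` list and `candidates` function are hypothetical names, the capability column is taken from the variants table, and "4k" is assumed to mean 4096 tokens, which the source does not state explicitly.

```python
# Rows from the tables in this page: (name, released, capability, context, parameters).
# ASSUMPTION: "4k" context is represented as 4096 tokens.
SPECS = [
    ("NV-EmbedCode 7B v1", "2025-06", "embedding", 4096, "7B"),
    ("Llama 3.2 NV EmbedQA 1B v2", "2025-03", "embedding", 4096, "1B"),
    ("Llama 3.2 NV RerankQA 1B v2", "2025-03", "ranking", 4096, "1B"),
    ("Llama 3.2 NV EmbedQA 1B v1", "2024-10", "embedding", 512, "1B"),
]


def candidates(capability: str, min_context: int) -> list[str]:
    """Return names of variants matching the capability whose context
    is at least min_context tokens, newest release first."""
    matches = [
        row for row in SPECS
        if row[2] == capability and row[3] >= min_context
    ]
    # Release strings are in YYYY-MM form, so string order is date order.
    matches.sort(key=lambda row: row[1], reverse=True)
    return [row[0] for row in matches]


if __name__ == "__main__":
    print(candidates("embedding", 1024))
    print(candidates("ranking", 512))
```

A filter like this only shortlists by the listed fields; retrieval quality on your own data still needs to be evaluated separately.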
Available From (1 provider)
Frequently Asked Questions
- What is NV-Embed used for?
  - NV-Embed is used for embedding, reranking, and code retrieval. The family description and listed model capabilities point to those workloads as the best fit.
- How does NV-Embed compare to NVIDIA Nemotron Nano 12B v2 VL?
  - NV-Embed is the better fit when you need embeddings or reranking. NVIDIA Nemotron Nano 12B v2 VL, also from NVIDIA AI, is the closest related family to check when you need structured outputs. NV-Embed lists 4 variants with up to 4k context, so compare the specifications and pricing of both families before choosing a production model.
- Which NV-Embed model should I use?
  - If price is the main constraint, check the pricing table first, and note that provider pricing for NV-Embed is incomplete in the local data. For the latest and largest listed variant, evaluate NV-EmbedCode 7B v1 with 4k context.






