LLM Reference

Nemotron-Cascade Models by NVIDIA AI

NVIDIA AI | Llama 3 Community | Open weights
1 model | 2026 | Up to 1.05M-token context

Details

Researcher: NVIDIA AI
Commercial use: conditional
Models: 1
Released: 2026
Max context: 1.05M tokens

About

Cascade mixture-of-experts (MoE) reasoning models with superior performance on math and code tasks.

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

Nemotron-Cascade-2-30B-A3B: use when the workload needs a 1.05M-token context window and a 30B-parameter model.

Released 2026-03 | 1.05M-token context | 30B parameters

Release Timeline

1 release group

2026-03 (1 current)
Nemotron-Cascade-2-30B-A3B | 1.05M-token context | 30B parameters | Current

Specifications (1 model)

Nemotron-Cascade model specifications comparison

Model                        Released   Context        Parameters
Nemotron-Cascade-2-30B-A3B   2026-03    1.05M tokens   30B
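
As a rough illustration of how a specification row like the one listed here might be checked against a workload, the sketch below filters model entries by required context length. The `ModelSpec` class and the `fits_workload` and `candidates` helpers are hypothetical names, not part of any NVIDIA tooling. The value 1_050_000 is only the rounded "1.05m" figure from this page, not an exact limit, and the sketch assumes that prompt and output tokens share one context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of a specifications table (hypothetical structure)."""
    name: str
    released: str          # "YYYY-MM", as listed
    context_tokens: int    # assumption: rounded from the listed "1.05m"
    parameters_b: float    # parameters as listed, in billions


# The single model listed on this page.
SPECS = [
    ModelSpec("Nemotron-Cascade-2-30B-A3B", "2026-03", 1_050_000, 30.0),
]


def fits_workload(spec: ModelSpec, prompt_tokens: int, output_tokens: int) -> bool:
    """Return True if prompt plus expected output fit in the listed context."""
    return prompt_tokens + output_tokens <= spec.context_tokens


def candidates(specs, prompt_tokens, output_tokens):
    """Names of models whose listed context covers the workload, newest first."""
    fitting = [s for s in specs if fits_workload(s, prompt_tokens, output_tokens)]
    # "YYYY-MM" strings sort chronologically, so a plain string sort is enough.
    return [s.name for s in sorted(fitting, key=lambda s: s.released, reverse=True)]


if __name__ == "__main__":
    print(candidates(SPECS, prompt_tokens=800_000, output_tokens=8_000))
    print(candidates(SPECS, prompt_tokens=1_200_000, output_tokens=8_000))
```

With only one listed variant the filter is trivial, but the same shape extends to a longer table: add rows to `SPECS` and the helper returns every model whose listed context covers the request.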

Frequently Asked Questions

What is Nemotron-Cascade used for?
Nemotron-Cascade is used for coding and math-heavy prompts. The family description and listed model capabilities point to those workloads as the best fit.

How does Nemotron-Cascade compare to NVIDIA Nemotron Nano 12B v2 VL?
Nemotron-Cascade is strongest on coding workloads, while NVIDIA Nemotron Nano 12B v2 VL, also from NVIDIA AI, is the closest related family to check for structured outputs. Nemotron-Cascade has 1 listed variant and reaches up to 1.05M tokens of context, so compare the specs and pricing tables before choosing a production model.

Which Nemotron-Cascade model should I use?
If price is the main constraint, check the pricing table first, but note that Nemotron-Cascade does not have complete provider pricing in the local data. For the most capable and most recent listed choice, evaluate Nemotron-Cascade-2-30B-A3B, which has a 1.05M-token context window.