Nemotron 3 Super-120B-A12B
Nemotron 3 Super-120B-A12B is worth evaluating for coding, rag, and agents when its provider route and context window match the workload.
Use it for
- Teams evaluating coding, rag, and agents
- Workloads that can use a 1.05m context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Vision or document-understanding workloads
- Family
- Nemotron 3
- Released
- 2026-03-11
- Context
- 1.05m
- Parameters
- 120B
- Architecture
- Decoder Only
- Specialization
- general
- Openness
- Open weights
- License
- NVIDIA Open ModelCommercial use: permitted
Cheapest of 6 routes · OpenRouter
About
NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with 262k context.
NVIDIA Nemotron-3 Super-120B-A12B is a hybrid Mamba-Transformer mixture-of-experts language model developed by NVIDIA Research, targeting agentic reasoning, long-document processing, and multi-step task planning. The model has 120 billion total parameters with 12 billion activated parameters per token, achieved through a Latent Mixture-of-Experts (LatentMoE) architecture that interleaves Mamba-2 state-space model layers with sparse MoE layers and select self-attention layers. Mamba-2 processes sequences in linear time with respect to sequence length, enabling efficient handling of very long contexts.
The model supports a context window of up to 1 million tokens per NVIDIA's technical report, with the Mamba-2 layers providing linear-time scaling. NVIDIA pretrained the model on 25 trillion tokens using NVFP4 (NVIDIA's 4-bit floating-point format optimized for Blackwell GPUs). Multi-Token Prediction (MTP) layers accelerate text generation throughput. The LatentMoE design routes 4 experts at the compute cost of 1, improving intelligence-per-FLOP efficiency. Note that API providers may enforce shorter context limits (Fireworks AI lists 262K for the NVFP4 variant).
The base BF16 weights and NVFP4 variant are available on Hugging Face (nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16) under NVIDIA's open model license. API access is available through NVIDIA NIM, Fireworks AI, DeepInfra, and OpenRouter. The model is designed for NVIDIA Blackwell hardware but can run in BF16 on earlier GPU generations.
Nemotron 3 Super-120B-A12B has a 1.05m-token context window.
Nemotron 3 Super-120B-A12B input tokens at $0.09/1M, output at $0.45/1M.
Top use-case fit: coding, agents, and build tasks
Coding
Q/$ B2 relevant benchmarks in the decision map.
RAG
Q/$ D1 relevant benchmark in the decision map.
Agents
Q/$ A2 relevant benchmarks in the decision map.
Provider price ladder
Compare all 6Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Cache | Route |
|---|---|---|---|---|
| OpenRouter | $0.090 | $0.450 | - | Serverless |
| DeepInfra | $0.100 | $0.500 | - | Serverless |
| NVIDIA NIM | $0.100 | $0.500 | read $0.100 | Serverless |
| Vercel AI Gateway | $0.150 | $0.650 | - | Serverless |
Available via routers & gateways(2)
OpenRouter
HybridUnified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.
NVIDIA LLM Router Blueprint
RouterNVIDIA's open-source AI blueprint for LLM routing that selects the optimal model per prompt via intent classification or neural auto-routing; being deprecated 2026-06-20.
Capabilities
Benchmark peer barsfor Coding
Benchmark scores(7)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| Google-Proof Q&A | 80.0 | diamond | https://artificialanalysis.ai/leaderboards/models |
| AIME 2025 | 90.2 | Widely reported from NVIDIA launch materials (accuracy) | https://llm-stats.com/blog/research/nemotron-3-super-launch |
| LiveCodeBench | 78.4 | LiveCodeBench v6 (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 |
| MMLU PRO | 83.6 | From official HuggingFace model card (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 |
| RULER | 96.3 | RULER (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 |
| SWE-bench Verified | 60.5 | SWE-bench Verified (resolved) | https://llm-stats.com/blog/research/nemotron-3-super-launch |
| τ-bench | 61.1 | TauBench V2 average (airline 56 (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 |
Migration checks
No linked migration route is available for this model yet.
Frequently asked questions
What is the context window of Nemotron 3 Super-120B-A12B?
Nemotron 3 Super-120B-A12B has a context window of 1.05m tokens.
How much does Nemotron 3 Super-120B-A12B cost?
Nemotron 3 Super-120B-A12B pricing ranges from $0.09/1M to $0.15/1M input tokens depending on the provider.
When was Nemotron 3 Super-120B-A12B released?
Nemotron 3 Super-120B-A12B was released on 2026-03-11.
Which providers offer Nemotron 3 Super-120B-A12B?
Nemotron 3 Super-120B-A12B is available from 6 providers: Cloudflare Workers AI, DeepInfra, NVIDIA NIM, OpenRouter, Fireworks AI, Vercel AI Gateway.
What benchmarks has Nemotron 3 Super-120B-A12B been tested on?
Nemotron 3 Super-120B-A12B has been evaluated on 7 benchmarks, including Google-Proof Q&A, AIME 2025, LiveCodeBench, MMLU PRO, RULER.
Cheapest of 6 routes · OpenRouter