LLM Reference

Nemotron 3 Super-120B-A12B

Released: 2026-03-11
Last refreshed: 2026-06-29
Status: Researched 31d ago
Open weights · Commercial use: permitted · Coding · RAG · Agents · Long context · Classification · JSON / Tool use

Nemotron 3 Super-120B-A12B is worth evaluating for coding, RAG, and agent workloads when a provider route and its context window match the workload.

Use it for

  • Teams evaluating coding, RAG, and agent workloads
  • Workloads that can use a 1.05M-token context window
  • Buyers comparing 6 tracked provider routes

Do not use it for

  • Vision or document-understanding workloads

Specifications

Released: 2026-03-11
Context: 1.05M tokens
Parameters: 120B
Architecture: Decoder-only
Specialization: General
Openness: Open weights
License: NVIDIA Open Model (commercial use: permitted)
Created by

NVIDIA
Accelerated AI for enterprise solutions

Santa Clara, California, United States
Founded 1993

Pricing

Input / 1M tokens: $0.090
Output / 1M tokens: $0.450

Cheapest of 6 routes · OpenRouter

About

NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with a 262K context window.

NVIDIA Nemotron-3 Super-120B-A12B is a hybrid Mamba-Transformer mixture-of-experts language model developed by NVIDIA Research, targeting agentic reasoning, long-document processing, and multi-step task planning. The model has 120 billion total parameters with 12 billion activated parameters per token, achieved through a Latent Mixture-of-Experts (LatentMoE) architecture that interleaves Mamba-2 state-space model layers with sparse MoE layers and select self-attention layers. Mamba-2 processes sequences in linear time with respect to sequence length, enabling efficient handling of very long contexts.

The model supports a context window of up to 1 million tokens per NVIDIA's technical report, and the Mamba-2 layers keep sequence processing linear in context length. API providers may enforce shorter context limits: Fireworks AI lists 262K for the NVFP4 variant. NVIDIA pretrained the model on 25 trillion tokens using NVFP4 (NVIDIA's 4-bit floating-point format optimized for Blackwell GPUs). Multi-Token Prediction (MTP) layers raise text generation throughput. The LatentMoE design activates 4 experts at the compute cost of 1, improving intelligence-per-FLOP efficiency.
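Because a provider's limit can be much lower than the model's maximum, it can help to check a request against each limit before choosing a route. The following is a minimal sketch, not a provider API: the route labels are hypothetical, "262K" and "1.05m" are read as 262,144 and 1,048,576 tokens (an assumption), and the limit is assumed to cover prompt and output tokens together, which may differ by provider.

```python
# Context limits quoted on this page, in tokens. Reading "262K" and "1.05m"
# as powers of two (262,144 and 1,048,576) is an assumption.
# The route labels are hypothetical names used only for this sketch.
CONTEXT_LIMITS = {
    "model_maximum": 1_048_576,
    "fireworks_nvfp4": 262_144,
}


def fits_context(prompt_tokens: int, max_output_tokens: int, limit: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits the limit.

    Assumes the limit covers input and output tokens combined.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= limit


def routes_that_fit(prompt_tokens: int, max_output_tokens: int) -> list[str]:
    """List the route labels whose context limit can hold the request."""
    return [
        name
        for name, limit in CONTEXT_LIMITS.items()
        if fits_context(prompt_tokens, max_output_tokens, limit)
    ]
```

Under these assumptions, a 300,000-token prompt with a 4,096-token output budget fits only the full-length route, while a 100,000-token prompt fits both.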

The base BF16 weights and NVFP4 variant are available on Hugging Face (nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16) under NVIDIA's open model license. API access is available through NVIDIA NIM, Fireworks AI, DeepInfra, and OpenRouter. The model is designed for NVIDIA Blackwell hardware but can run in BF16 on earlier GPU generations.
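For self-hosting, a rough sense of weight storage at each precision can guide hardware planning. The sketch below is back-of-the-envelope arithmetic using only the parameter counts on this page. It assumes 16 bits per parameter for BF16 and 4 bits for NVFP4, ignores the scale-factor metadata a block-scaled 4-bit format carries, and ignores activations and inference state, so real requirements will be higher.

```python
TOTAL_PARAMS = 120e9   # total parameters, from this page
ACTIVE_PARAMS = 12e9   # parameters activated per token, from this page


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10**9 bytes) at a given precision."""
    return num_params * bits_per_param / 8 / 1e9


# Weights only; scale factors, activations, and runtime state are ignored.
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)   # BF16: 16 bits per value
nvfp4_gb = weight_gigabytes(TOTAL_PARAMS, 4)   # NVFP4: 4 bits per value (assumed)

# Share of parameters used for each token in the MoE forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"BF16 weights:  ~{bf16_gb:.0f} GB")
print(f"NVFP4 weights: ~{nvfp4_gb:.0f} GB")
print(f"Active share:  {active_fraction:.0%}")
```

With these assumptions the estimate is roughly 240 GB in BF16 and 60 GB in NVFP4. Although only about 10% of parameters are active per token, all experts typically need to be resident in memory, so the total parameter count drives the storage figure.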

Nemotron 3 Super-120B-A12B has a 1.05M-token context window.

Nemotron 3 Super-120B-A12B input tokens cost $0.09/1M and output tokens cost $0.45/1M on the cheapest tracked route.
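Per-million prices translate into a per-request cost with simple arithmetic. The sketch below uses the two prices listed on this page. The function name is illustrative, and it assumes flat per-token billing with no caching discounts, minimum charges, or provider fees.

```python
INPUT_USD_PER_MILLION = 0.09    # cheapest listed input price on this page
OUTPUT_USD_PER_MILLION = 0.45   # cheapest listed output price on this page


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD, assuming flat per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a long-context request with a short answer.
cost = estimate_cost_usd(input_tokens=200_000, output_tokens=2_000)
print(f"Estimated cost: ${cost:.4f}")
```

For a 200,000-token prompt and a 2,000-token answer, the estimate is just under two cents. Output tokens cost five times as much as input tokens here, so generation-heavy workloads shift the total more than the input price suggests.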

Top use-case fit: coding, agents, and build tasks

Coding

Q/$ B

2 relevant benchmarks in the decision map.

RAG

Q/$ D

1 relevant benchmark in the decision map.

Agents

Q/$ A

2 relevant benchmarks in the decision map.

Capabilities

Structured Outputs

Benchmark peer bars for Coding

Benchmark scores (7)

Scores are benchmark-specific and direction-aware: the same numeric gap can mean very different things from one suite to another. Use the leaderboard context and this model's provider route to decide whether a margin is meaningful for your workload.

Benchmark | Score | Version | Source
Google-Proof Q&A | 80.0 | diamond | https://artificialanalysis.ai/leaderboards/models
AIME 2025 | 90.2 | Widely reported from NVIDIA launch materials (accuracy) | https://llm-stats.com/blog/research/nemotron-3-super-launch
LiveCodeBench | 78.4 | LiveCodeBench v6 (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8
MMLU PRO | 83.6 | From official HuggingFace model card (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8
RULER | 96.3 | RULER (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8
SWE-bench Verified | 60.5 | SWE-bench Verified (resolved) | https://llm-stats.com/blog/research/nemotron-3-super-launch
τ-bench | 61.1 | TauBench V2 average (airline 56 (accuracy) | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8

Migration checks

No linked migration route is available for this model yet.

Frequently asked questions

What is the context window of Nemotron 3 Super-120B-A12B?

Nemotron 3 Super-120B-A12B has a context window of 1.05M tokens.

How much does Nemotron 3 Super-120B-A12B cost?

Nemotron 3 Super-120B-A12B pricing ranges from $0.09/1M to $0.15/1M input tokens depending on the provider.

When was Nemotron 3 Super-120B-A12B released?

Nemotron 3 Super-120B-A12B was released on 2026-03-11.

Which providers offer Nemotron 3 Super-120B-A12B?

Nemotron 3 Super-120B-A12B is available from 6 providers: Cloudflare Workers AI, DeepInfra, NVIDIA NIM, OpenRouter, Fireworks AI, Vercel AI Gateway.

What benchmarks has Nemotron 3 Super-120B-A12B been tested on?

Nemotron 3 Super-120B-A12B has been evaluated on 7 benchmarks, including Google-Proof Q&A, AIME 2025, LiveCodeBench, MMLU PRO, and RULER.