LLM Reference

Nemotron 3 Super

nemotron-3-super

About

NVIDIA's 120.6B-parameter hybrid Mamba-Transformer MoE model with 12.7B active parameters per forward pass. It uses a LatentMoE design with 512 experts (top-22 activation) and Multi-Token Prediction for speculative decoding, and was trained on 25 trillion tokens. Optimized for Blackwell GPUs (B200), it delivers up to 4x faster inference than Hopper and supports a context window of up to 1M tokens. It performs strongly on agentic tasks and long-context benchmarks (RULER@1M), with 2.2x to 7.5x higher throughput than comparable open-source models.
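The "active parameters" figure follows from top-k expert routing: for each token, the router scores all experts but only the top-k actually run. A minimal pure-Python sketch of top-k gating, where the function names, shapes, and toy dimensions are illustrative assumptions, not NVIDIA's implementation:

```python
import math
import random

def moe_forward(x, gate_w, experts, top_k=22):
    """Illustrative top-k expert routing (not NVIDIA's code).

    x:        length-d token vector
    gate_w:   d x n router matrix (list of rows)
    experts:  one d x d weight matrix per expert
    """
    d, n = len(x), len(experts)
    # Router scores: one logit per expert.
    logits = [sum(x[i] * gate_w[i][e] for i in range(d)) for e in range(n)]
    # Keep only the k highest-scoring experts; the rest never run.
    top = sorted(range(n), key=logits.__getitem__)[-top_k:]
    weights = [math.exp(logits[e]) for e in top]
    z = sum(weights)
    weights = [w / z for w in weights]   # softmax over selected experts only
    # Weighted sum of the chosen experts' outputs.
    out = [0.0] * d
    for w, e in zip(weights, top):
        for i in range(d):
            out[i] += w * sum(experts[e][i][j] * x[j] for j in range(d))
    return out, top

random.seed(0)
d, n = 8, 512                            # toy hidden size, real expert count
x = [random.gauss(0, 1) for _ in range(d)]
gate_w = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
           for _ in range(n)]
y, used = moe_forward(x, gate_w, experts)
print(len(used), "of", n, "experts ran")  # 22 of 512 experts ran
```

With 22 of 512 experts active, only about 4% of the expert weights are touched per token; together with the shared non-expert layers, this is how a 120.6B-parameter model can use roughly 12.7B parameters per forward pass.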

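Multi-Token Prediction enables speculative decoding: the model cheaply drafts several tokens ahead, a verification pass keeps the longest prefix the full model agrees with, and the output still matches ordinary one-token-at-a-time decoding. A toy sketch of that verify-and-accept loop, where both "models" are arithmetic stand-ins and purely illustrative:

```python
def target_next(seq):
    # Stand-in for the full model's next-token choice (toy arithmetic rule).
    return (seq[-1] * 31 + 7) % 100

def draft_propose(seq, k):
    # Stand-in for Multi-Token Prediction heads: cheaply guess k future
    # tokens at once. This toy draft is right except for its third guess.
    out, s = [], list(seq)
    for i in range(k):
        t = target_next(s)
        if i == 2:
            t = (t + 1) % 100           # deliberate wrong guess
        out.append(t)
        s.append(t)
    return out

def speculative_step(seq, k=4):
    """One speculative-decoding step: verify the draft left to right,
    keep the longest agreeing prefix, and append one corrected token,
    so each step always yields at least one new token."""
    s = list(seq)
    for t in draft_propose(seq, k):
        if t == target_next(s):
            s.append(t)                 # draft token accepted
        else:
            s.append(target_next(s))    # rejected: take the target's token
            break
    return s

print(speculative_step([1, 2, 3]))      # [1, 2, 3, 0, 7, 24]
```

Here one step emits three tokens instead of one, yet the result is identical to decoding those tokens sequentially with `target_next`, which is why speculative decoding speeds up generation without changing the output.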

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning

Specifications

Released: 2026-03-01
Parameters: 120.6B (12.7B active)
Context: 1M tokens
Architecture: Mixture of Experts
Specialization: general
License: Apache 2.0
Training: pretrained

Created by

NVIDIA
Accelerated AI for enterprise solutions

Santa Clara, California, United States
Founded 1993