LLM Reference

Nemotron 3 Super

About

Nemotron 3 Super is NVIDIA's 120.6B-parameter hybrid Mamba-Transformer mixture-of-experts model, activating 12.7B parameters per forward pass. Its LatentMoE layers use 512 experts with top-22 activation, and Multi-Token Prediction supports speculative decoding. The model was trained on 25 trillion tokens and is optimized for Blackwell GPUs (B200), delivering up to 4x faster inference than Hopper. It supports a context window of up to 1M tokens, performs strongly on agentic tasks and long-context benchmarks (RULER @ 1M), and achieves 2.2x-7.5x higher throughput than comparable open-source models.
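
The top-k expert routing mentioned above can be illustrated with a small sketch. The snippet below is a toy NumPy illustration only: the 512-expert count and top-22 activation come from the description, but the router formulation, expert shapes, and names such as moe_forward, router_w, and expert_w are assumptions for illustration, not NVIDIA's LatentMoE implementation.

```python
# Toy sketch of top-k expert routing in a Mixture-of-Experts layer.
# Expert count (512) and top-k (22) match the figures quoted above;
# everything else (router design, expert shape) is assumed for illustration.
import numpy as np

NUM_EXPERTS = 512   # total experts in the layer
TOP_K = 22          # experts activated per token
D_MODEL = 64        # toy hidden size for the sketch

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)
# One tiny "expert" transform per index (toy weights, not a real FFN).
expert_w = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                                    # (tokens, experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]        # chosen expert ids
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)                    # softmax over chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                              # explicit loop, fine for a sketch
        for slot in range(TOP_K):
            e = top_idx[t, slot]
            out[t] += gates[t, slot] * np.tanh(x[t] @ expert_w[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))
print(moe_forward(tokens).shape)   # (4, 64): only 22 of 512 experts contribute per token
```

The point of the sketch is the compute profile: every token touches only TOP_K of NUM_EXPERTS experts, which is why total parameters (120.6B) and active parameters per forward pass (12.7B) differ.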

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode
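
For function calling and tool use, models like this are commonly served behind an OpenAI-compatible API. The sketch below uses the openai Python client; the base_url, API key, model identifier, and the get_weather tool are placeholders assumed for illustration and should be replaced with the values from your actual serving setup.

```python
# Hedged sketch of function calling against an OpenAI-compatible endpoint.
# base_url, api_key, model id, and the tool are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                           # placeholder credential
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                        # hypothetical tool
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-super",                  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)             # tool call(s) the model requested
```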

Specifications

Released: 2026-03-01
Parameters: 120.6B (12.7B active)
Context: 1M tokens
Architecture: Mixture of Experts
Specialization: General