LLM Reference

Cosmos 3 Models by NVIDIA AI

NVIDIA AI | OpenMDW 1.1 | Open Source | Multimodal
6 models | 2026 | Up to 256k context

About

NVIDIA describes Cosmos 3 as the world's first fully open omnimodel for physical AI. It is built on a Mixture-of-Transformers (MoT) architecture that combines a vision-language autoregressive Reasoner tower with a diffusion-based Generator tower. The family natively understands and generates text, images, video, ambient sound, and robot action sequences with physics-grounded accuracy. It is designed for robotics, autonomous vehicles, and smart infrastructure, and supports synthetic data generation, robot policy training, world simulation, and VLM reasoning. NVIDIA announced Cosmos 3 at Computex 2026 on 2026-06-01; the model weights were released on 2026-05-31. GitHub: https://github.com/nvidia/cosmos.

Current Variants

Cosmos 3 Nano
Use when the workload needs multimodal input, 256k context, and a 16B-parameter model.

2026-05 | multimodal | 256k context | 16B parameters

Cosmos 3 Super
Use when the workload needs multimodal input, 256k context, and a 64B-parameter model.

2026-05 | multimodal | 256k context | 64B parameters

Cosmos 3 Super Text2Image
Use when the workload needs image generation, 4k context, and a 64B-parameter model.

2026-05 | image generation | 4k context | 64B parameters

Cosmos 3 Super Image2Video
Use when the workload needs video generation, 4k context, and a 64B-parameter model.

2026-05 | video generation | 4k context | 64B parameters

Cosmos 3 Nano Policy DROID
Use when the workload is robotics, with 4k context and a 16B-parameter model.

2026-05 | robotics | 4k context | 16B parameters

Cosmos 3 Edge
Use when the workload needs multimodal inputs, including audio.

Release date unknown | multimodal | multimodal inputs | audio
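The use-when guidance above amounts to a lookup from workload type to variant. The following is a minimal Python sketch of that lookup; the task labels and the `pick_variant` helper are hypothetical illustrations, not part of any NVIDIA API, and splitting general multimodal work between Nano and Super by model size is an assumption.

```python
# Hypothetical lookup: workload label -> Cosmos 3 variant suggested by the
# use-when guidance on this page. The labels themselves are illustrative.
VARIANT_FOR_TASK = {
    "multimodal_small": "Cosmos 3 Nano",              # 16B, 256k context
    "multimodal_large": "Cosmos 3 Super",             # 64B, 256k context
    "text_to_image": "Cosmos 3 Super Text2Image",     # 64B, 4k context
    "image_to_video": "Cosmos 3 Super Image2Video",   # 64B, 4k context
    "robot_policy": "Cosmos 3 Nano Policy DROID",     # 16B, 4k context
    "audio_input": "Cosmos 3 Edge",                   # multimodal inputs incl. audio
}


def pick_variant(task: str) -> str:
    """Return the variant name for a task label; raise ValueError if unknown."""
    try:
        return VARIANT_FOR_TASK[task]
    except KeyError:
        known = ", ".join(sorted(VARIANT_FOR_TASK))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


print(pick_variant("image_to_video"))
# prints: Cosmos 3 Super Image2Video
```

Raising on an unknown label, rather than falling back to a default variant, keeps a mistyped workload from silently routing to the wrong model.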

Release Timeline

2026-05 (5 current models)
Cosmos 3 Nano: multimodal | 256k context | 16B parameters
Cosmos 3 Nano Policy DROID: robotics | 4k context | 16B parameters
Cosmos 3 Super: multimodal | 256k context | 64B parameters
Cosmos 3 Super Image2Video: video generation | 4k context | 64B parameters
Cosmos 3 Super Text2Image: image generation | 4k context | 64B parameters

Unknown release (1 current model)
Cosmos 3 Edge: multimodal | multimodal inputs | audio

Specifications (6 models)

Cosmos 3 model specifications comparison

Model | Released | Context | Parameters | Vision | Multimodal | Reasoning | Structured Outputs
Cosmos 3 Nano | 2026-05 | 256k | 16B | Yes | Yes | Yes | No
Cosmos 3 Super | 2026-05 | 256k | 64B | Yes | Yes | Yes | No
Cosmos 3 Super Text2Image | 2026-05 | 4k | 64B | No | Yes | No | No
Cosmos 3 Super Image2Video | 2026-05 | 4k | 64B | Yes | Yes | No | No
Cosmos 3 Nano Policy DROID | 2026-05 | 4k | 16B | Yes | Yes | No | Yes
Cosmos 3 Edge | Not listed | Not listed | Not listed | Yes | Yes | No | No
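The table can also be treated as data: encode each row as a record and filter by hard requirements. The following is a minimal Python sketch, assuming the values exactly as listed above. `Cosmos3Spec` and `shortlist` are hypothetical names. Context is stored in thousands of tokens (so "256k" becomes 256), which sidesteps whether k means 1,000 or 1,024. Rows with an unlisted context or parameter count are excluded whenever a numeric filter is active.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cosmos3Spec:
    """One row of the specifications table (hypothetical record type)."""
    name: str
    released: Optional[str]   # "YYYY-MM", or None if not listed
    context_k: Optional[int]  # context window in thousands of tokens, or None
    params_b: Optional[int]   # parameter count in billions, or None
    vision: bool
    multimodal: bool
    reasoning: bool
    structured_outputs: bool


# Values copied from the table above; None marks fields the table leaves blank.
SPECS: List[Cosmos3Spec] = [
    Cosmos3Spec("Cosmos 3 Nano", "2026-05", 256, 16, True, True, True, False),
    Cosmos3Spec("Cosmos 3 Super", "2026-05", 256, 64, True, True, True, False),
    Cosmos3Spec("Cosmos 3 Super Text2Image", "2026-05", 4, 64, False, True, False, False),
    Cosmos3Spec("Cosmos 3 Super Image2Video", "2026-05", 4, 64, True, True, False, False),
    Cosmos3Spec("Cosmos 3 Nano Policy DROID", "2026-05", 4, 16, True, True, False, True),
    Cosmos3Spec("Cosmos 3 Edge", None, None, None, True, True, False, False),
]


def shortlist(min_context_k: int = 0,
              max_params_b: Optional[int] = None,
              need_reasoning: bool = False,
              need_structured: bool = False) -> List[str]:
    """Return names of variants that satisfy every requested constraint."""
    names = []
    for s in SPECS:
        # An active numeric filter rejects rows whose value is unlisted (None).
        if min_context_k > 0 and (s.context_k is None or s.context_k < min_context_k):
            continue
        if max_params_b is not None and (s.params_b is None or s.params_b > max_params_b):
            continue
        if need_reasoning and not s.reasoning:
            continue
        if need_structured and not s.structured_outputs:
            continue
        names.append(s.name)
    return names


print(shortlist(min_context_k=128, need_reasoning=True))
# prints: ['Cosmos 3 Nano', 'Cosmos 3 Super']
```

For example, `shortlist(need_structured=True)` returns only Cosmos 3 Nano Policy DROID, and `shortlist(max_params_b=16)` returns the two 16B variants. Cosmos 3 Edge drops out of the second query because its size is not listed.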

Available From (1 provider)

Frequently Asked Questions

What is Cosmos 3 used for?
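Cosmos 3 is used for physical AI workloads in robotics, autonomous vehicles, and smart infrastructure. The family description and listed capabilities point to multimodal understanding and reasoning, image and video generation, synthetic data generation, robot policy training, and world simulation as the best fit.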
How does Cosmos 3 compare to NVIDIA Nemotron Nano 12B v2 VL?
Both families come from NVIDIA AI. Cosmos 3 is the stronger fit when you need multimodal capabilities, while NVIDIA Nemotron Nano 12B v2 VL is the closest related family to check if you need structured outputs. Cosmos 3 has 6 listed variants with up to 256k context, so compare specifications and pricing before choosing a production model.
Which Cosmos 3 model should I use?
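Complete provider pricing for Cosmos 3 is not listed here, so if price is the main constraint, check provider pricing first. For long-context reasoning over multimodal inputs, evaluate Cosmos 3 Nano (16B parameters, 256k context, reasoning, and multimodal inputs); Cosmos 3 Super lists the same capabilities at 64B parameters. For image generation, video generation, or robot policies, use Cosmos 3 Super Text2Image, Cosmos 3 Super Image2Video, or Cosmos 3 Nano Policy DROID, respectively.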

Models (6)