Using Cosmos 3 Super on NVIDIA NIM

Implementation guide · Cosmos 3 · NVIDIA AI

ProvisionedOpen Weights

Quick Start

1
Create an account at NVIDIA NIM and generate an API key.
2
Use the NVIDIA NIM SDK or REST API to call cosmos3-reasoner-super — see the documentation for request format.

API Portal Documentation Pricing Model Card

Code Examples

See NVIDIA NIM documentation for integration details.

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.

View all models on NVIDIA NIM →

Pricing on NVIDIA NIM

Type	Price (per 1M)
Image input	$1.00
Video input	$1.00
Audio input	$1.00

Capabilities

VisionMultimodalReasoningAudio

About Cosmos 3 Super

Cosmos 3 Super is NVIDIA's flagship 64B-parameter omnimodel for physical AI, designed for large-scale synthetic data generation and high-fidelity simulation on NVIDIA Hopper and Blackwell datacenter GPUs. Architecture: dual-tower Mixture-of-Transformers with a 32B autoregressive Reasoner and a 32B diffusion-based Generator. Supports 256K token reasoning context, 720p video generation at variable frame rates, and 10+ robot embodiment action domains. Ranked #1 among open models on Physics-IQ, PAI-Bench, R-Bench, RoboLab, RoboArena, VANTAGE-Bench, TAR, and Artificial Analysis image/video leaderboards (Computex 2026). Training data: 1.3B data points across 393 datasets (2024-2026). Inference performance (vLLM-Omni): ~55s for 50-step video on 8xH200. Available as open weights on Hugging Face and via Cosmos 3 Reasoner NIM (NIM_MODEL_SIZE=super). Robot action input/output is preserved in this description because the model schema does not have a dedicated action modality field.

Full model details →

Model Specs

Released2026-05-31

Parameters64B

Context256k

ArchitectureMixture of Transformers

More Models on NVIDIA NIM

Cosmos 3 Nano

All models on NVIDIA NIM →

Provider

NVIDIA NIM

NVIDIA

Santa Clara, California, United States