LLM Reference
NVIDIA NIM

Using Cosmos 3 Nano on NVIDIA NIM

Implementation guide · Cosmos 3 · NVIDIA AI

ProvisionedOpen Source

Quick Start

  1. 1
    Create an account at NVIDIA NIM and generate an API key.
  2. 2
    Use the NVIDIA NIM SDK or REST API to call cosmos3-reasoner-nano — see the documentation for request format.

Code Examples

See NVIDIA NIM documentation for integration details.

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.

Pricing on NVIDIA NIM

TypePrice (per 1M)
Image input$1.00
Video input$1.00
Audio input$1.00

Capabilities

VisionMultimodalReasoningAudio

About Cosmos 3 Nano

Cosmos 3 Nano is NVIDIA's 16B-parameter omnimodel optimized for efficient inference on workstation-grade hardware (NVIDIA RTX PRO 6000). Architecture: dual-tower Mixture-of-Transformers with an 8B autoregressive Reasoner and an 8B diffusion-based Generator. The Reasoner supports up to 256K tokens of context for vision-language reasoning; the Generator produces video up to 720p at variable frame rates (default 189 frames). Natively handles text, image, video, audio (48kHz stereo), and robot action trajectories across 10+ robot embodiments including Franka Panda, UR, Google robot, and UMI. BF16 precision only. Available as open weights on Hugging Face and via the Cosmos 3 Reasoner NIM (NIM_MODEL_SIZE=nano). Intended for real-time robotics inference and edge-adjacent deployment. Robot action input/output is preserved in this description because the model schema does not have a dedicated action modality field.

Model Specs

Released2026-05-31
Parameters16B
Context256k
ArchitectureMixture-of-Transformers

Provider

NVIDIA NIM
NVIDIA NIM

NVIDIA

Santa Clara, California, United States