LLM Reference
NVIDIA NIM

NVIDIA NIM Models — Pricing & Benchmarks

143 models available · NVIDIA

NVIDIA NIM hosts 143 AI models in this catalog. The lowest listed input price is Nemotron 3 Super-120B-A12B at $0.1/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Hosting moonshotai/kimi-k2.6? Browse Kimi K2.6 on NVIDIA NIM for GPU-hour pricing context or open the step-by-step NIM guide.

ModelInput (per 1M)Output (per 1M)Context
Nemotron 3 Super-120B-A12B$0.1$0.51.05m
Arctic4k
Baichuan 2 13B Chat4k
Bielik 11B v2.6 Instruct4k
Breeze 7B32k
ChatGLM3 6B8k
CodeGemma 1.1 7B8k
CodeGemma 7B Instruct8k
CodeLlama 70B16k
Codestral 22B32k
Codestral Mamba 7B256k
Colosseum 355B Instruct16k
Cosmos 3 Nano256k
Cosmos 3 Super256k
DBRX Instruct32k
DeepSeek Coder 6.7B4k
DeepSeek R1128k
DeepSeek R1 Distill Llama 8B128k
DeepSeek R1 Distill Qwen-14B128k
DeepSeek R1 Distill Qwen-32B128k
DeepSeek R1 Distill Qwen-7B128k
DeepSeek V364k
DeepSeek V3.164k
DeepSeek V3.1 Terminus164k
DeepSeek V3.2160k
DePlot
Dracarys Llama 3.1 70B Instruct8k
Falcon 3 7B Instruct128k
Fuyu-8B
Gemma 2 27B Instruct8k
Gemma 2 2B Instruct8k
Gemma 2 9B Instruct8k
Gemma 2 9B SahabatAI Instruct8k
Gemma 2B Instruct2k
Gemma 3 1B Instruct32k
Gemma 3 27B131k
Gemma 3n 2B (free)8k
Gemma 3n 4B (free)8k
Gemma 7B Instruct8k
GLM-4.7128k
GLM-5200k
gpt-oss-120b131k
gpt-oss-20b131k
Granite 3.3 8B Instruct128k
Granite 34B Code8k
Granite 8B Code8k
Granite Guardian 3.0 8B8k
Italia 10B Instruct16k
Jamba 1.5 Mini256k
Kimi K2 Instruct131k
Kimi K2 Instruct 0905131k
Kimi K2 Thinking256k
Kimi K2.5256k
Kimi K2.6262k
Kosmos 22k
Llama 2 70B Chat4k
Llama 3 70B Instruct8k
Llama 3 8B Instruct8k
Llama 3 Swallow 70B Instruct4k
Llama 3 Taiwan 70B Instruct8k
Llama 3.1 405B Instruct128k
Llama 3.1 70B Instruct128k
Llama 3.1 8B Instruct128k
Llama 3.1 NemoGuard 8B Content Safety4k
Llama 3.1 NemoGuard 8B Topic Control4k
Llama 3.1 Nemotron 70B Reward4k
Llama 3.1 Nemotron Nano 4B v1.14k
Llama 3.1 Nemotron Nano 8B v14k
Llama 3.1 Nemotron Nano VL 8B v14k
Llama 3.1 Swallow 70B Instruct4k
Llama 3.1 Swallow 8B Instruct4k
Llama 3.2 11B Vision Instruct128k
Llama 3.2 1B Instruct128k
Llama 3.2 3B Instruct128k
Llama 3.2 90B Vision Instruct128k
Llama 3.2 NV EmbedQA 1B v1512
Llama 3.2 NV EmbedQA 1B v24k
Llama 3.2 NV RerankQA 1B v24k
Llama 3.3 70B Instruct (free)66k
Llama 3.3 Nemotron Super 49B v1128k
Llama 4 Maverick 17B Instruct FP81m
Llama 4 Scout 17B-16E Instruct10m
Llama Guard 4 12B164k
LLaVA 1.6 Hermes Yi 34B200k
LLaVA 1.6 Mistral 7B32k
Magistral Small 2506128k
Marin 8B Instruct128k
MiniMax M2.5197k
Mistral 7B Instruct v0.232k
Mistral 7B Instruct v0.332k
Mistral 7B v0.18k
Mistral Large32k
Mistral Large 3 675B Instruct128k
Mistral Medium 3 Instruct128k
Mistral NeMo Instruct (2407)128k
Mistral Nemotron
Mistral Small 3.1 24B Instruct128k
Mistral Small 4256k
Mixtral 8x22B v0.164k
Mixtral 8x7B32k
Nemotron 3 Nano256k
Nemotron 4 340B4k
Nemotron Mini 4B Instruct4k
Nemotron Mini Hindi 4B Instruct4k
Nemotron-Nano-12B-v2-VL
Nemotron-Nano-9B-v2
NeVA 22B
NV-EmbedCode 7B v14k
NVIDIA Llama 3 ChatQA 70B8k
NVIDIA Llama 3 ChatQA 8B8k
PaliGemma 3B 896512
Phi 3.5 Mini Instruct128k
Phi 4 Multimodal Instruct128k
Phi-3 Medium 128K128k
Phi-3 Medium 4K4k
Phi-3 Mini 128K128k
Phi-3 Mini 4k4k
Phi-3 Small 128K128k
Phi-3 Small 8K8k
Phi-3 Vision128k
Phi-4 Mini128k
Phi-4 Mini Flash Reasoning128k
Qwen2-7B128k
Qwen2-7B-Instruct128k
Qwen2.5-7B-Instruct128k
Qwen2.5-Coder-32B-Instruct128k
Qwen2.5-Coder-7B-Instruct128k
Qwen3-Coder-480B-A35B-Instruct262k
RakutenAI 7B Chat4k
RakutenAI 7B Instruct4k
RecurrentGemma 2B4k
Sarvam-M Multilingual Hybrid128k
SEA-LION 7B4k
SeaLLM 7B V2.532k
Seed-OSS 36B Instruct4k
ShieldGemma 9B8k
SOLAR 10.7B4k
StarCoder2 15B8k
StarCoder2 7B8k
Step 3.7 Flash256k
Stockmark 2 100B Instruct128k
Teuken 7B Instruct4k
Yi Large32k

Where else to run this

Pricing Overview

Cheapest$0.10/1M
Most expensive$0.10/1M

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

Full provider profile →