LLM ReferenceLLM Reference
NVIDIA NIM

NVIDIA NIM Models — Pricing & Benchmarks

141 models available · NVIDIA

NVIDIA NIM hosts 141 AI models in this catalog. Per-token pricing is not listed for these NVIDIA NIM rows yet; compare context windows, benchmarks, and hosting options instead. LLM Reference lets you compare these models across all 63 providers without switching tabs.

Hosting moonshotai/kimi-k2.6? Browse Kimi K2.6 on NVIDIA NIM for GPU-hour pricing context or open the step-by-step NIM guide.

ModelInput (per 1M)Output (per 1M)Context
Arctic4K
Baichuan 2 13B Chat
Bielik 11B v2.6 Instruct4K
Breeze 7B
ChatGLM3 6B8K
CodeGemma 1.1 7B
CodeGemma 7B Instruct
CodeLlama 70B16K
Codestral 22B32K
Codestral Mamba 7B256K
Colosseum 355B Instruct16K
DBRX Instruct32K
DeepSeek Coder 6.7B4K
DeepSeek R1128K
DeepSeek R1 Distill Llama 8B128K
DeepSeek R1 Distill Qwen-14B128K
DeepSeek R1 Distill Qwen-32B128K
DeepSeek R1 Distill Qwen-7B128K
DeepSeek V364k
DeepSeek V3.164K
DeepSeek V3.1 Terminus164K
DeepSeek V3.2160K
DePlot
Dracarys Llama 3.1 70B Instruct8K
Falcon 3 7B Instruct128K
Fuyu-8B
Gemma 2 27B Instruct8K
Gemma 2 2B Instruct
Gemma 2 9B Instruct8K
Gemma 2 9B SahabatAI Instruct8K
Gemma 2B Instruct2K
Gemma 3 1B Instruct32K
Gemma 3 27B (free)131K
Gemma 3n 2B (free)8K
Gemma 3n 4B (free)8K
Gemma 7B Instruct8K
GLM-4.7
GLM-5200k
gpt-oss-120b131K
gpt-oss-20b131K
Granite 3.3 8B Instruct128K
Granite 34B Code8K
Granite 8B Code8K
Granite Guardian 3.0 8B8K
Italia 10B Instruct16K
Jamba 1.5 Mini256K
Kimi K2 Instruct
Kimi K2 Instruct 0905256K
Kimi K2 Thinking256K
Kimi K2.5256K
Kimi K2.6262K
Kosmos 2
Llama 2 70B Chat4K
Llama 3 70B Instruct8K
Llama 3 8B Instruct8K
Llama 3 Swallow 70B Instruct4K
Llama 3 Taiwan 70B Instruct8K
Llama 3.1 405B Instruct128K
Llama 3.1 70B Instruct128K
Llama 3.1 8B Instruct128K
Llama 3.1 NemoGuard 8B Content Safety4K
Llama 3.1 NemoGuard 8B Topic Control4K
Llama 3.1 Nemotron 70B Reward4K
Llama 3.1 Nemotron Nano 4B v1.14K
Llama 3.1 Nemotron Nano 8B v14K
Llama 3.1 Nemotron Nano VL 8B v14K
Llama 3.1 Swallow 70B Instruct4K
Llama 3.1 Swallow 8B Instruct4K
Llama 3.2 11B Vision Instruct128K
Llama 3.2 1B Instruct128K
Llama 3.2 3B Instruct128K
Llama 3.2 90B Vision Instruct128K
Llama 3.2 NV EmbedQA 1B v1512
Llama 3.2 NV EmbedQA 1B v24K
Llama 3.2 NV RerankQA 1B v24K
Llama 3.3 70B Instruct (free)66K
Llama 3.3 Nemotron Super 49B v1128K
Llama 4 Maverick 17B Instruct FP81M
Llama 4 Scout 17B-16E Instruct328K
Llama Guard 4 12B164K
LLaVA 1.6 Hermes Yi 34B200K
LLaVA 1.6 Mistral 7B32K
Magistral Small 2506128K
Marin 7B Instruct8K
Marin 8B Instruct128K
MiniMax M2.5197K
Mistral 7B Instruct v0.232K
Mistral 7B Instruct v0.332K
Mistral 7B v0.18K
Mistral Large32k
Mistral Large 3 675B Instruct128K
Mistral Medium 3 Instruct128K
Mistral NeMo Instruct (2407)128K
Mistral Nemotron
Mistral Small 3.1 24B Instruct128K
Mistral Small 4256K
Mixtral 8x22B v0.164K
Mixtral 8x7B32K
Nemotron 3 Nano256K
Nemotron 3 Super-120B-A12B1M
Nemotron 4 340B4K
Nemotron Mini 4B Instruct4K
Nemotron Mini Hindi 4B Instruct4K
Nemotron-Nano-12B-v2-VL
Nemotron-Nano-9B-v2
NeVA 22B
NV-EmbedCode 7B v14K
NVIDIA Llama 3 ChatQA 70B
NVIDIA Llama 3 ChatQA 8B
PaliGemma 3B 896512
Phi 3.5 Mini Instruct128K
Phi 4 Multimodal Instruct128K
Phi-3 Medium 128K128K
Phi-3 Medium 4K4K
Phi-3 Mini 128K128K
Phi-3 Mini 4k4K
Phi-3 Small 128K128K
Phi-3 Small 8K8K
Phi-3 Vision128K
Phi-4 Mini
Phi-4 Mini Flash Reasoning128K
Qwen2-7B128K
Qwen2-7B-Instruct128K
Qwen2.5-7B-Instruct128K
Qwen2.5-Coder-32B-Instruct
Qwen2.5-Coder-7B-Instruct
Qwen3-Coder-480B-A35B-Instruct256K
RakutenAI 7B Chat4K
RakutenAI 7B Instruct4K
RecurrentGemma 2B
Sarvam-M Multilingual Hybrid128K
SEA-LION 7B
SeaLLM 7B V2.5
Seed-OSS 36B Instruct4K
ShieldGemma 9B8K
SOLAR 10.7B
StarCoder2 15B8K
StarCoder2 7B8K
Stockmark 2 100B Instruct128K
Teuken 7B Instruct4K
Yi Large32K

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

Full provider profile →