Which NVIDIA NIM model is cheapest?

The cheapest NVIDIA NIM model in this catalog is Nemotron 3 Super-120B-A12B at $0.1/1M input tokens.

What is the context window for NVIDIA NIM models?

NVIDIA NIM models listed here range from 512 to 10m tokens of context.

How does NVIDIA NIM compare to Fireworks AI?

NVIDIA NIM lists 143 models here, while Fireworks AI lists 224. Compare pricing availability, context windows, and benchmark coverage before choosing a host.

NVIDIA NIM Models — Pricing & Benchmarks

143 models available · NVIDIA

NVIDIA NIM hosts 143 AI models in this catalog. The lowest listed input price is Nemotron 3 Super-120B-A12B at $0.1/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Hosting moonshotai/kimi-k2.6? Browse Kimi K2.6 on NVIDIA NIM for GPU-hour pricing context or open the step-by-step NIM guide.

Model	Input (per 1M)	Output (per 1M)	Context
Nemotron 3 Super-120B-A12B	$0.1	$0.5	1.05m
Arctic	—	—	4k
Baichuan 2 13B Chat	—	—	4k
Bielik 11B v2.6 Instruct	—	—	4k
Breeze 7B	—	—	32k
ChatGLM3 6B	—	—	8k
CodeGemma 1.1 7B	—	—	8k
CodeGemma 7B Instruct	—	—	8k
CodeLlama 70B	—	—	16k
Codestral 22B	—	—	32k
Codestral Mamba 7B	—	—	256k
Colosseum 355B Instruct	—	—	16k
Cosmos 3 Nano	—	—	256k
Cosmos 3 Super	—	—	256k
DBRX Instruct	—	—	32k
DeepSeek Coder 6.7B	—	—	4k
DeepSeek R1	—	—	128k
DeepSeek R1 Distill Llama 8B	—	—	128k
DeepSeek R1 Distill Qwen-14B	—	—	128k
DeepSeek R1 Distill Qwen-32B	—	—	128k
DeepSeek R1 Distill Qwen-7B	—	—	128k
DeepSeek V3	—	—	64k
DeepSeek V3.1	—	—	64k
DeepSeek V3.1 Terminus	—	—	164k
DeepSeek V3.2	—	—	160k
DePlot	—	—	—
Dracarys Llama 3.1 70B Instruct	—	—	8k
Falcon 3 7B Instruct	—	—	128k
Fuyu-8B	—	—	—
Gemma 2 27B Instruct	—	—	8k
Gemma 2 2B Instruct	—	—	8k
Gemma 2 9B Instruct	—	—	8k
Gemma 2 9B SahabatAI Instruct	—	—	8k
Gemma 2B Instruct	—	—	2k
Gemma 3 1B Instruct	—	—	32k
Gemma 3 27B	—	—	131k
Gemma 3n 2B (free)	—	—	8k
Gemma 3n 4B (free)	—	—	8k
Gemma 7B Instruct	—	—	8k
GLM-4.7	—	—	128k
GLM-5	—	—	200k
gpt-oss-120b	—	—	131k
gpt-oss-20b	—	—	131k
Granite 3.3 8B Instruct	—	—	128k
Granite 34B Code	—	—	8k
Granite 8B Code	—	—	8k
Granite Guardian 3.0 8B	—	—	8k
Italia 10B Instruct	—	—	16k
Jamba 1.5 Mini	—	—	256k
Kimi K2 Instruct	—	—	131k
Kimi K2 Instruct 0905	—	—	131k
Kimi K2 Thinking	—	—	256k
Kimi K2.5	—	—	256k
Kimi K2.6	—	—	262k
Kosmos 2	—	—	2k
Llama 2 70B Chat	—	—	4k
Llama 3 70B Instruct	—	—	8k
Llama 3 8B Instruct	—	—	8k
Llama 3 Swallow 70B Instruct	—	—	4k
Llama 3 Taiwan 70B Instruct	—	—	8k
Llama 3.1 405B Instruct	—	—	128k
Llama 3.1 70B Instruct	—	—	128k
Llama 3.1 8B Instruct	—	—	128k
Llama 3.1 NemoGuard 8B Content Safety	—	—	4k
Llama 3.1 NemoGuard 8B Topic Control	—	—	4k
Llama 3.1 Nemotron 70B Reward	—	—	4k
Llama 3.1 Nemotron Nano 4B v1.1	—	—	4k
Llama 3.1 Nemotron Nano 8B v1	—	—	4k
Llama 3.1 Nemotron Nano VL 8B v1	—	—	4k
Llama 3.1 Swallow 70B Instruct	—	—	4k
Llama 3.1 Swallow 8B Instruct	—	—	4k
Llama 3.2 11B Vision Instruct	—	—	128k
Llama 3.2 1B Instruct	—	—	128k
Llama 3.2 3B Instruct	—	—	128k
Llama 3.2 90B Vision Instruct	—	—	128k
Llama 3.2 NV EmbedQA 1B v1	—	—	512
Llama 3.2 NV EmbedQA 1B v2	—	—	4k
Llama 3.2 NV RerankQA 1B v2	—	—	4k
Llama 3.3 70B Instruct (free)	—	—	66k
Llama 3.3 Nemotron Super 49B v1	—	—	128k
Llama 4 Maverick 17B Instruct FP8	—	—	1m
Llama 4 Scout 17B-16E Instruct	—	—	10m
Llama Guard 4 12B	—	—	164k
LLaVA 1.6 Hermes Yi 34B	—	—	200k
LLaVA 1.6 Mistral 7B	—	—	32k
Magistral Small 2506	—	—	128k
Marin 8B Instruct	—	—	128k
MiniMax M2.5	—	—	197k
Mistral 7B Instruct v0.2	—	—	32k
Mistral 7B Instruct v0.3	—	—	32k
Mistral 7B v0.1	—	—	8k
Mistral Large	—	—	32k
Mistral Large 3 675B Instruct	—	—	128k
Mistral Medium 3 Instruct	—	—	128k
Mistral NeMo Instruct (2407)	—	—	128k
Mistral Nemotron	—	—	—
Mistral Small 3.1 24B Instruct	—	—	128k
Mistral Small 4	—	—	256k
Mixtral 8x22B v0.1	—	—	64k
Mixtral 8x7B	—	—	32k
Nemotron 3 Nano	—	—	256k
Nemotron 4 340B	—	—	4k
Nemotron Mini 4B Instruct	—	—	4k
Nemotron Mini Hindi 4B Instruct	—	—	4k
Nemotron-Nano-12B-v2-VL	—	—	—
Nemotron-Nano-9B-v2	—	—	—
NeVA 22B	—	—	—
NV-EmbedCode 7B v1	—	—	4k
NVIDIA Llama 3 ChatQA 70B	—	—	8k
NVIDIA Llama 3 ChatQA 8B	—	—	8k
PaliGemma 3B 896	—	—	512
Phi 3.5 Mini Instruct	—	—	128k
Phi 4 Multimodal Instruct	—	—	128k
Phi-3 Medium 128K	—	—	128k
Phi-3 Medium 4K	—	—	4k
Phi-3 Mini 128K	—	—	128k
Phi-3 Mini 4k	—	—	4k
Phi-3 Small 128K	—	—	128k
Phi-3 Small 8K	—	—	8k
Phi-3 Vision	—	—	128k
Phi-4 Mini	—	—	128k
Phi-4 Mini Flash Reasoning	—	—	128k
Qwen2-7B	—	—	128k
Qwen2-7B-Instruct	—	—	128k
Qwen2.5-7B-Instruct	—	—	128k
Qwen2.5-Coder-32B-Instruct	—	—	128k
Qwen2.5-Coder-7B-Instruct	—	—	128k
Qwen3-Coder-480B-A35B-Instruct	—	—	262k
RakutenAI 7B Chat	—	—	4k
RakutenAI 7B Instruct	—	—	4k
RecurrentGemma 2B	—	—	4k
Sarvam-M Multilingual Hybrid	—	—	128k
SEA-LION 7B	—	—	4k
SeaLLM 7B V2.5	—	—	32k
Seed-OSS 36B Instruct	—	—	4k
ShieldGemma 9B	—	—	8k
SOLAR 10.7B	—	—	4k
StarCoder2 15B	—	—	8k
StarCoder2 7B	—	—	8k
Step 3.7 Flash	—	—	256k
Stockmark 2 100B Instruct	—	—	128k
Teuken 7B Instruct	—	—	4k
Yi Large	—	—	32k

Where else to run this

Qwen2-7B-Instruct on NVIDIA NIM

Provider setup and pricing

ShieldGemma 9B on NVIDIA NIM

Provider setup and pricing

Dracarys Llama 3.1 70B Instruct on NVIDIA NIM

Provider setup and pricing

Fireworks AI model catalog

224 tracked models

Together AI model catalog

106 tracked models

Pricing Overview

Cheapest$0.10/1M

Most expensive$0.10/1M

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

Full provider profile →

Links

Dashboard Documentation Pricing