Which Fireworks AI model is cheapest?

The cheapest Fireworks AI model in this catalog is gpt-oss-20b at $0.07/1M input tokens.

What is the context window for Fireworks AI models?

Fireworks AI models listed here range from 2k to 10m tokens of context.
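The context labels in this catalog use short suffixes (2k, 131k, 1.05m, 10m), so comparing them numerically needs a small conversion step. The sketch below is one way to do that; the function name `parse_context` is hypothetical, and it assumes "k" means 1,000 and "m" means 1,000,000 tokens, which this page does not state explicitly.

```python
from typing import Optional


def parse_context(label: str) -> Optional[int]:
    """Convert a catalog context label such as '131k' or '1.05m' to a token count.

    Assumption: 'k' = 1,000 and 'm' = 1,000,000 (decimal, not 1,024-based).
    Returns None when the catalog shows a dash, meaning no context is listed.
    """
    label = label.strip().lower()
    if label in {"", "—", "-"}:
        return None
    multipliers = {"k": 1_000, "m": 1_000_000}
    suffix = label[-1]
    if suffix in multipliers:
        # round() guards against float artifacts in values like 1.05 * 1_000_000
        return round(float(label[:-1]) * multipliers[suffix])
    return int(label)


smallest = parse_context("2k")
largest = parse_context("10m")
print(smallest, largest)
```

With the labels converted to integers, rows can be sorted or filtered by context size; rows with no listed context come back as `None` and need to be handled separately.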

How does Fireworks AI compare to NVIDIA NIM?

Fireworks AI lists 224 models here, while NVIDIA NIM lists 143. Before choosing a host, compare which models have listed prices, their context windows, and their benchmark coverage.

Fireworks AI Models — Pricing & Benchmarks

224 models available

Fireworks AI hosts 224 AI models in this catalog. The lowest listed input price is gpt-oss-20b at $0.07/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context (tokens)
gpt-oss-20b	$0.07	$0.3	131k
CodeGemma 2B	$0.1	$0.1	8k
Cogito v1 Preview Llama 3B	$0.1	$0.1	128k
DeepSeek Coder 1.3B	$0.1	$0.1	4k
DeepSeek R1 Distill Qwen-1.5B	$0.1	$0.1	128k
Fireworks Zephyr-7B-beta	$0.1	$0.1	8k
Gemma 2B Instruct	$0.1	$0.1	2k
Gemma 3 1B Instruct	$0.1	$0.1	32k
Llama 3.2 1B	$0.1	$0.1	128k
Llama 3.2 1B Instruct	$0.1	$0.1	128k
Llama 3.2 3B	$0.1	$0.1	128k
Llama 3.2 3B Instruct	$0.1	$0.1	128k
Llama Guard 3 1B	$0.1	$0.1	128k
Phi-2	$0.1	$0.1	2k
Phi-3 Mini 128K	$0.1	$0.1	128k
Qwen2.5-0.5B-Instruct	$0.1	$0.1	128k
Qwen2.5-1.5B-Instruct	$0.1	$0.1	128k
Qwen2.5-Coder-1.5B-Instruct	$0.1	$0.1	32k
Qwen2.5-Coder-3B-Instruct	$0.1	$0.1	32k
Qwen3-0.6B	$0.1	$0.1	40k
Qwen3-1.7B	$0.1	$0.1	40k
Stable Code 3B	$0.1	$0.1	16k
Stable LM 2 Zephyr 1.6B	$0.1	$0.1	—
Stable LM Zephyr 3B	$0.1	$0.1	—
StarCoder2 3B	$0.1	$0.1	8k
Fireworks Llama-3-8B-Instruct	$0.15	$0.15	8k
Fireworks Solar-10.7B-Instruct-v1.0	$0.15	$0.15	4k
gpt-oss-120b	$0.15	$0.6	131k
Qwen3-VL-30B-A3B	$0.15	$0.6	128k
Chronos Hermes 13B V2	$0.2	$0.2	4k
CodeGemma 7B	$0.2	$0.2	8k
CodeGemma 7B Instruct	$0.2	$0.2	8k
CodeLlama 13B	$0.2	$0.2	100k
CodeLlama 13B Instruct	$0.2	$0.2	16k
CodeLlama 13B Python	$0.2	$0.2	100k
CodeLlama 7B	$0.2	$0.2	100k
CodeLlama 7B Instruct	$0.2	$0.2	16k
CodeLlama 7B Python	$0.2	$0.2	100k
CodeQwen1.5 7B	$0.2	$0.2	64k
Cogito v1 Preview Llama 8B	$0.2	$0.2	128k
Cogito v1 Preview Qwen-14B	$0.2	$0.2	128k
DeepSeek Coder 6.7B	$0.2	$0.2	4k
DeepSeek Coder 7B V1.5	$0.2	$0.2	16k
DeepSeek Coder V2 Lite Instruct	$0.2	$0.2	128k
DeepSeek R1 0528 Distill Qwen3-8B	$0.2	$0.2	160k
DeepSeek R1 0528 Qwen3-8B	$0.2	$0.2	160k
DeepSeek R1 Distill Llama 8B	$0.2	$0.2	128k
DeepSeek R1 Distill Qwen-14B	$0.2	$0.2	128k
DeepSeek R1 Distill Qwen-7B	$0.2	$0.2	128k
DeepSeek V2 Lite Chat	$0.2	$0.2	32k
ELYZA Japanese Llama 2 7B	$0.2	$0.2	—
Gemma 2 9B	$0.2	$0.2	8k
Gemma 2 9B Instruct	$0.2	$0.2	8k
Gemma 3 12B Instruct	$0.2	$0.2	128k
Gemma 3 4B Instruct	$0.2	$0.2	128k
Gemma 7B	$0.2	$0.2	8k
Gemma 7B Instruct	$0.2	$0.2	8k
GLM-4 9B	$0.2	$0.2	131k
GLM-Z1 9B	$0.2	$0.2	—
Hermes 2 Pro Mistral 7B	$0.2	$0.2	32k
Japanese Stable VLM	$0.2	$0.2	—
Japanese StableLM Gamma 7B	$0.2	$0.2	8k
Llama 2 13B	$0.2	$0.2	4k
Llama 2 13B Chat	$0.2	$0.2	4k
Llama 2 7B	$0.2	$0.2	4k
Llama 2 7B Chat	$0.2	$0.2	4k
Llama 3 8B	$0.2	$0.2	8k
Llama 3 8B Instruct	$0.2	$0.2	8k
Llama 3.1 8B Instruct	$0.2	$0.2	128k
Llama 3.2 11B Vision Instruct	$0.2	$0.2	128k
Llama Guard 2 8B	$0.2	$0.2	8k
Llama Guard 3 8B	$0.2	$0.2	8k
Llama Guard 7B	$0.2	$0.2	2k
Mistral 7B Instruct v0.1	$0.2	$0.2	8k
Mistral 7B Instruct v0.2	$0.2	$0.2	32k
Mistral 7B Instruct v0.3	$0.2	$0.2	32k
Mistral 7B OpenOrca	$0.2	$0.2	8k
Mistral 7B v0.1	$0.2	$0.2	8k
Mistral NeMo (2407)	$0.2	$0.2	128k
MythoMax L2 13B	$0.2	$0.2	4k
Nous Capybara 7B V1.9	$0.2	$0.2	—
Nous Hermes Llama 2 13B	$0.2	$0.2	—
Nous Hermes Llama 2 7B	$0.2	$0.2	—
OpenChat 3.5 (0106)	$0.2	$0.2	8k
OpenHermes 2 Mistral 7B	$0.2	$0.2	32k
OpenHermes 2.5 Mistral 7B	$0.2	$0.2	32k
Phi-3 Vision	$0.2	$0.2	128k
Pythia 12B	$0.2	$0.2	2k
Qwen-14B	$0.2	$0.2	32k
Qwen2-7B	$0.2	$0.2	128k
Qwen2.5-14B	$0.2	$0.2	128k
Qwen2.5-14B-Instruct	$0.2	$0.2	128k
Qwen2.5-7B	$0.2	$0.2	128k
Qwen2.5-7B-Instruct	$0.2	$0.2	128k
Qwen2.5-Coder-14B-Instruct	$0.2	$0.2	128k
Qwen2.5-Coder-7B-Instruct	$0.2	$0.2	128k
Qwen3-14B	$0.2	$0.2	40k
Qwen3-4B	$0.2	$0.2	40k
Qwen3-8B	$0.2	$0.2	128k
Snorkel Mistral PairRM	$0.2	$0.2	32k
SOLAR 10.7B	$0.2	$0.2	4k
StarCoder	$0.2	$0.2	8k
StarCoder2 15B	$0.2	$0.2	8k
StarCoder2 7B	$0.2	$0.2	8k
Toppy M 7B	$0.2	$0.2	4k
Yi 6B	$0.2	$0.2	200k
Yi 9B	$0.2	$0.2	4k
Zephyr 7B Beta	$0.2	$0.2	—
Fireworks CodeLlama-34b-Instruct	$0.3	$0.3	16k
MiniMax M2.7	$0.3	$1.20	205k
MiniMax-M1-240k	$0.3	$1.20	240k
MiniMax-M2-240k	$0.3	$1.20	240k
MiniMax-M2-240k-0905	$0.3	$1.20	240k
MiniMax-M2-80k-0905	$0.3	$1.20	80k
MiniMax-M2.5	$0.3	$1.20	—
Fireworks Yi-34B-Chat	$0.4	$0.4	4k
DeepSeek Coder V2 Lite	$0.5	$0.5	128k
Dolphin 2.6 Mixtral 8x7B	$0.5	$0.5	32k
Firefunction V1	$0.5	$0.5	8k
Mixtral 8x7B	$0.5	$0.5	32k
Nous Hermes 2 Mixtral 8x7B	$0.5	$0.5	32k
Phi 3.5 MoE Instruct	$0.5	$0.5	128k
Phi 4 Reasoning	$0.5	$0.5	128k
Phi 4 Reasoning Plus	$0.5	$0.5	128k
Qwen3-30B-A3B	$0.5	$0.5	128k
DeepSeek Prover V2	$0.56	$1.68	160k
DeepSeek R1	$0.56	$1.68	128k
DeepSeek R1 0528	$0.56	$1.68	130k
DeepSeek R1 Basic	$0.56	$1.68	160k
DeepSeek V2.5	$0.56	$1.68	128k
DeepSeek V3	$0.56	$1.68	64k
DeepSeek V3 0324	$0.56	$1.68	160k
DeepSeek V3.1	$0.56	$1.68	64k
DeepSeek V3.2	$0.56	$1.68	160k
GLM 4.7	$0.6	$2.20	200k
GLM-4.7	$0.6	$2.20	128k
Kimi K2 Instruct	$0.6	$2.50	131k
Kimi K2 Instruct 0905	$0.6	$2.50	131k
Kimi K2 Thinking	$0.6	$2.50	256k
Kimi K2.5	$0.6	$3.00	256k
Fireworks Qwen-72B-Chat	$0.8	$0.8	33k
CodeLlama 34B	$0.9	$0.9	100k
CodeLlama 34B Instruct	$0.9	$0.9	16k
CodeLlama 34B Python	$0.9	$0.9	100k
CodeLlama 70B	$0.9	$0.9	16k
CodeLlama 70B Instruct	$0.9	$0.9	16k
CodeLlama 70B Python	$0.9	$0.9	16k
Cogito v1 Preview Llama 70B	$0.9	$0.9	128k
Cogito v1 Preview Qwen-32B	$0.9	$0.9	128k
DeepSeek Coder 33B	$0.9	$0.9	16k
DeepSeek Coder 33B Instruct	$0.9	$0.9	16k
DeepSeek R1 Distill Llama 70B	$0.9	$0.9	128k
DeepSeek R1 Distill Qwen-32B	$0.9	$0.9	128k
Dolphin 2.9.2 Qwen2-72B	$0.9	$0.9	128k
FARE-20B	$0.9	$0.9	128k
Firefunction V2	$0.9	$0.9	32k
FireLLaVA 13B	$0.9	$0.9	4k
Gemma 2 27B Instruct	$0.9	$0.9	8k
Gemma 3 27B Instruct	$0.9	$0.9	128k
GLM Z1 32B	$0.9	$0.9	128k
GLM Z1 Rumination 32B	$0.9	$0.9	128k
GLM-4.5	$0.9	$0.9	128k
GLM-4.5-Air	$0.9	$0.9	128k
GLM-4.6	$0.9	$0.9	198k
GLM-4.7 Flash	$0.9	$0.9	198k
Japanese StableLM 70B	$0.9	$0.9	4k
KAT Coder	$0.9	$0.9	256k
KAT Dev 32B	$0.9	$0.9	—
KAT Dev 72B Exp	$0.9	$0.9	—
Llama 2 70B	$0.9	$0.9	4k
Llama 2 70B Chat	$0.9	$0.9	4k
Llama 3 70B Instruct	$0.9	$0.9	8k
Llama 3.1 70B Instruct	$0.9	$0.9	128k
Llama 3.2 90B Vision Instruct	$0.9	$0.9	128k
Llama 3.3 70B	$0.9	$0.9	8k
LLaVA 1.6 Hermes Yi 34B	$0.9	$0.9	200k
MiniMax M2	$0.9	$0.9	197k
MiniMax-M1-80k	$0.9	$0.9	80k
MiniMax-M2-80k	$0.9	$0.9	80k
Mistral Large	$0.9	$0.9	32k
Mistral NeMo Instruct (2407)	$0.9	$0.9	128k
Mistral Small 3.1 24B Instruct	$0.9	$0.9	128k
Nous Capybara 34B	$0.9	$0.9	200k
Nous Hermes 2 Yi 34B	$0.9	$0.9	200k
Nous Hermes Llama 2 70B	$0.9	$0.9	—
Phi 3.5 Mini Instruct	$0.9	$0.9	128k
Phi 4 Multimodal Instruct	$0.9	$0.9	128k
Phi-4 14B	$0.9	$0.9	16k
Phi-4 Mini	$0.9	$0.9	128k
Phind CodeLlama 34B Python V1	$0.9	$0.9	8k
Phind CodeLlama 34B V1	$0.9	$0.9	8k
Phind CodeLlama 34B V2	$0.9	$0.9	8k
Qwen-72B	$0.9	$0.9	32k
Qwen1.5-72B	$0.9	$0.9	32k
Qwen2-72B	$0.9	$0.9	128k
Qwen2-VL-72B-Instruct	$0.9	$0.9	32k
Qwen2.5-32B	$0.9	$0.9	128k
Qwen2.5-32B-Instruct	$0.9	$0.9	128k
Qwen2.5-72B	$0.9	$0.9	128k
Qwen2.5-72B-Instruct	$0.9	$0.9	128k
Qwen2.5-Coder-32B	$0.9	$0.9	128k
Qwen2.5-Coder-32B-Instruct	$0.9	$0.9	128k
Qwen3-32B	$0.9	$0.9	40k
Yi 34B	$0.9	$0.9	200k
Yi 34B 200K	$0.9	$0.9	200k
Yi Large	$0.9	$0.9	32k
Kimi K2.6	$0.95	$4.00	262k
GLM-5	$1.00	$3.20	200k
DBRX Instruct	$1.20	$1.20	32k
DeepSeek Coder V2	$1.20	$1.20	128k
DeepSeek Coder V2 Instruct	$1.20	$1.20	128k
ERNIE 4.5	$1.20	$1.20	8k
Mixtral 8x22B Instruct v0.1	$1.20	$1.20	64k
Mixtral 8x22B v0.1	$1.20	$1.20	64k
Qwen3-235B-A22B	$1.20	$1.20	128k
GLM-5.1	$1.40	$4.40	200k
Fireworks DBRX-Instruct	$1.50	$1.50	33k
DeepSeek V4 Pro	$1.74	$3.48	1m
Llama 3.1 405B Instruct	$3.00	$3.00	128k
Llama 4 Maverick 17B Instruct FP8	—	—	1m
Llama 4 Scout 17B-16E Instruct	—	—	10m
Mistral Small	—	—	32k
Nemotron 3 Super-120B-A12B	—	—	1.05m
Qwen3-Coder-480B-A35B-Instruct	—	—	262k
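Because prices are quoted per 1M tokens, the cost of a single workload is the token count times the listed price, divided by one million, summed over input and output. The sketch below illustrates that arithmetic using the gpt-oss-20b row ($0.07 input, $0.3 output). The function name `request_cost` and the token counts are hypothetical, and actual billing may differ from this simple model.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # catalog prices are per 1M tokens


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: str, output_price: str) -> Decimal:
    """Estimate cost in dollars from token counts and per-1M-token prices.

    Prices are passed as strings (e.g. "0.07") so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) * Decimal(input_price)
    output_cost = Decimal(output_tokens) * Decimal(output_price)
    return (input_cost + output_cost) / TOKENS_PER_PRICE_UNIT


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens
# at the gpt-oss-20b prices listed in the table above.
cost = request_cost(200_000, 50_000, "0.07", "0.3")
print(f"Estimated cost: ${cost}")
```

For this workload the input side contributes $0.014 and the output side $0.015, so output pricing matters even when the input price is the headline figure.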

Where else to run this

Llama Guard 7B on Fireworks AI: provider setup and pricing
Llama Guard 2 8B on Fireworks AI: provider setup and pricing
Llama Guard 3 8B on Fireworks AI: provider setup and pricing
Llama Guard 7B on Together AI: alternative host
Llama Guard 2 8B on OctoAI API (Deprecated): alternative host
Llama Guard 3 8B on Cloudflare Workers AI: alternative host

Pricing Overview

Cheapest: $0.07/1M input tokens

Most expensive: $3.00/1M input tokens
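These two figures appear to match the lowest and highest values in the table's input-price column, skipping rows where no price is listed. The sketch below shows one way to derive them from a handful of rows copied from the table; the helper name `parse_price` and the choice of sample rows are illustrative assumptions, not part of the catalog.

```python
from decimal import Decimal
from typing import Optional

# A few (model, input price) pairs copied from the table; "—" means no listed price.
rows = [
    ("gpt-oss-20b", "$0.07"),
    ("Kimi K2.6", "$0.95"),
    ("Llama 3.1 405B Instruct", "$3.00"),
    ("Mistral Small", "—"),
]


def parse_price(label: str) -> Optional[Decimal]:
    """Return the dollar amount from a label like '$0.07', or None if unlisted."""
    label = label.strip()
    if not label.startswith("$"):
        return None
    return Decimal(label[1:])


# Keep only rows that actually have a price before taking min and max.
priced = [(name, parse_price(price)) for name, price in rows]
priced = [(name, price) for name, price in priced if price is not None]

cheapest = min(priced, key=lambda row: row[1])
most_expensive = max(priced, key=lambda row: row[1])
print("Cheapest:", cheapest[0], cheapest[1])
print("Most expensive:", most_expensive[0], most_expensive[1])
```

Filtering out unlisted prices first matters: treating a dash as zero would wrongly make an unpriced model the cheapest.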

About Fireworks AI

The Fireworks AI Platform is a generative AI service for building, customizing, and deploying AI models at scale. It supports a range of open-source models, including Meta's Llama and Stable Diffusion, for tasks such as natural language processing and image generation. Its serverless architecture allows deployment without extensive infrastructure management and is billed on a pay-as-you-go basis.

Users can fine-tune models with parameter-efficient techniques to tailor them to specific business needs. The platform is optimized for high throughput and low latency and is described as able to handle trillions of inferences daily. It also provides tools for model maintenance and iteration, and is designed for straightforward integration and customization as usage scales.
