Which DeepInfra model is cheapest?

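The lowest listed input price is $0.02/1M tokens, shared by Llama 3 8B Instruct, Llama 3.1 8B Instruct, and Mistral NeMo Instruct (2407). Of the three, Mistral NeMo Instruct (2407) has the lowest output price ($0.04/1M tokens).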

What is the context window for DeepInfra models?

Listed context windows range from 4k tokens (for example, Llama 2 7B Chat) to 10m tokens (Llama 4 Scout 17B-16E Instruct). A few entries have no context length listed.
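The context column uses shorthand labels such as "8k" or "1.05m". The sketch below converts a label to an approximate token count and checks whether a prompt plus the requested output fits. It assumes "k" means 1,000 and "m" means 1,000,000. Some model cards count in 1,024s, so an "8k" window may actually be 8,192 tokens. The helper names are hypothetical.

```python
# Hypothetical helpers: turn catalog context labels ("8k", "128k", "1.05m",
# "10m") into approximate token counts.
# Assumption: decimal units ("k" = 1,000, "m" = 1,000,000); results are approximate.
from typing import Optional

_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_context(label: str) -> Optional[int]:
    """Return the context window in tokens, or None if no length is listed."""
    label = label.strip().lower()
    if not label or label == "—":  # the catalog marks missing values with "—"
        return None
    suffix = label[-1]
    if suffix in _MULTIPLIERS:
        # round() absorbs float noise such as 1.05 * 1_000_000
        return round(float(label[:-1]) * _MULTIPLIERS[suffix])
    return int(label)  # plain numeric label, e.g. "4096"


def fits(prompt_tokens: int, max_output_tokens: int, label: str) -> bool:
    """True if the prompt plus the requested output fit in the listed window."""
    window = parse_context(label)
    if window is None:
        raise ValueError(f"no context length listed: {label!r}")
    return prompt_tokens + max_output_tokens <= window


print(parse_context("4k"))       # 4000
print(parse_context("1.05m"))    # 1050000
print(parse_context("10m"))      # 10000000
print(fits(6_000, 1_000, "8k"))  # True
print(fits(6_000, 3_000, "8k"))  # False
```

Leaving a little headroom below the computed limit is prudent, because the label does not say whether it counts 1,000 or 1,024 tokens per "k".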

How does DeepInfra compare to Fireworks AI?

This catalog lists 60 models on DeepInfra and 224 on Fireworks AI. Before choosing a host, compare which models have published prices, their context windows, and their benchmark coverage.

DeepInfra Models — Pricing & Benchmarks

60 models available

This catalog lists 60 AI models hosted on DeepInfra. Input prices start at $0.02/1M tokens (Llama 3 8B Instruct, Llama 3.1 8B Instruct, and Mistral NeMo Instruct (2407)). LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input ($ per 1M tokens)	Output ($ per 1M tokens)	Context
Llama 3 8B Instruct	$0.02	$0.05	8k
Llama 3.1 8B Instruct	$0.02	$0.05	128k
Mistral NeMo Instruct (2407)	$0.02	$0.04	128k
Qwen2.5-7B-Instruct	$0.03	$0.03	128k
Qwen3-9B	$0.04	$0.20	256k
CodeGemma 1.1 7B	$0.05	$0.15	8k
DeepInfra Google Gemma 2B	$0.05	$0.15	8k
DeepInfra Google Gemma 7B	$0.05	$0.15	8k
DeepInfra Llama 3 8B Instruct	$0.05	$0.15	8k
DeepInfra Phi 3 Mini 4K Instruct	$0.05	$0.15	4k
Gemma 1.1 7B Instruct	$0.05	$0.15	8k
LLaVA 1.5 7B	$0.05	$0.15	4k
Mistral 7B Instruct v0.2	$0.05	$0.15	32k
Mistral 7B v0.1	$0.05	$0.15	8k
OpenChat 3.6 8B	$0.05	$0.15	8k
Qwen2-7B	$0.05	$0.15	128k
WizardLM-2 7B	$0.05	$0.15	—
Llama 2 7B Chat	$0.07	$0.07	4k
Llama 4 Scout 17B-16E Instruct	$0.08	$0.30	10m
DeepInfra Stable LM 2 12B	$0.10	$0.30	4k
Nemotron 3 Super-120B-A12B	$0.10	$0.50	1.05m
Qwen2.5-14B-Instruct	$0.10	$0.10	128k
Llama 2 13B Chat	$0.13	$0.13	4k
Phi-3 Medium 4K	$0.14	$0.41	4k
Dolphin 2.6 Mixtral 8x7B	$0.15	$0.45	32k
Llama 4 Maverick 17B Instruct FP8	$0.15	$0.60	1m
Mixtral 8x7B Instruct v0.1	$0.15	$0.45	33k
Qwen2-57B-A14B	$0.16	$0.16	—
CodeLlama 34B	$0.20	$0.45	100k
DeepInfra StarCoder2 15B	$0.20	$0.60	16k
Phind CodeLlama 34B V2	$0.20	$0.45	8k
Qwen2.5-Coder-32B	$0.20	$0.20	128k
StarCoder2 15B	$0.20	$0.60	8k
Yi 34B	$0.25	$0.38	200k
Qwen3.5-27B	$0.26	$2.60	262k
DeepSeek V3	$0.32	$0.89	64k
Qwen2.5-72B-Instruct	$0.36	$0.40	128k
Llama 3.1 70B Instruct	$0.40	$0.40	128k
airoboros L2 70B 2.2.1	$0.45	$0.65	—
CodeLlama 70B	$0.45	$0.65	16k
DeepInfra CodeLlama 70B Instruct	$0.45	$0.65	100k
DeepInfra Llama 3 70B Instruct	$0.45	$0.65	8k
DeepInfra Phi 3 Small 128K Instruct	$0.45	$0.65	128k
DeepInfra Qwen1.5-72B-Chat	$0.45	$0.65	33k
Llama 3 70B Instruct	$0.45	$0.65	8k
Qwen2-72B	$0.45	$0.65	128k
DeepSeek R1 0528	$0.50	$2.15	130k
Mixtral 8x7B	$0.54	$0.54	32k
DBRX Instruct	$0.60	$1.20	32k
DeepInfra DBRX Instruct	$0.60	$1.20	33k
Llama 2 70B Chat	$0.64	$0.64	4k
Mixtral 8x22B v0.1	$0.65	$0.65	64k
WizardLM-2 8x22B	$0.65	$0.65	—
Zephyr ORPO 141B	$0.65	$0.65	—
DeepSeek R1 Distill Llama 70B	$0.70	$0.80	128k
Nemotron 4 340B	$4.20	$4.20	4k
Command R	—	—	128k
Command R+	—	—	128k
DeepSeek R1	—	—	128k
Mistral Small	—	—	32k
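The cheapest and most expensive prices quoted for this table can be recomputed from its rows. The sketch below copies a handful of rows in the same tab-separated layout and skips entries marked "—" (no published price). It then breaks the three-way tie at $0.02 input by choosing the lower output price. The function names are illustrative and are not part of any LLM Reference or DeepInfra API.

```python
# Sketch: recompute cheapest / most expensive input prices from
# tab-separated catalog rows. Only a few rows from the table are copied here.
from decimal import Decimal

ROWS = """\
Llama 3 8B Instruct\t$0.02\t$0.05\t8k
Llama 3.1 8B Instruct\t$0.02\t$0.05\t128k
Mistral NeMo Instruct (2407)\t$0.02\t$0.04\t128k
Qwen2.5-7B-Instruct\t$0.03\t$0.03\t128k
Nemotron 4 340B\t$4.20\t$4.20\t4k
Command R\t—\t—\t128k
"""


def parse_price(cell: str):
    """Return the price per 1M tokens as a Decimal, or None if unlisted."""
    cell = cell.strip()
    return None if cell == "—" else Decimal(cell.lstrip("$"))


def priced_models(text: str):
    """Yield (model, input_price, output_price) for rows with both prices."""
    for line in text.splitlines():
        model, inp, out, _context = line.split("\t")
        inp_price, out_price = parse_price(inp), parse_price(out)
        if inp_price is not None and out_price is not None:
            yield model, inp_price, out_price


models = list(priced_models(ROWS))
lowest = min(price for _, price, _ in models)
highest = max(price for _, price, _ in models)
cheapest = [name for name, price, _ in models if price == lowest]
# Among models tied on input price, prefer the lower output price.
best = min((m for m in models if m[1] == lowest), key=lambda m: m[2])

print(lowest)         # 0.02
print(len(cheapest))  # 3
print(highest)        # 4.20
print(best[0])        # Mistral NeMo Instruct (2407)
```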

Where else to run this

Llama 2 7B Chat on DeepInfra

Provider setup and pricing

Llama 2 13B Chat on DeepInfra

Provider setup and pricing

Llama 2 70B Chat on DeepInfra

Provider setup and pricing

Llama 2 7B Chat on Alibaba Cloud PAI-EAS

Alternative host

Llama 2 13B Chat on Alibaba Cloud PAI-EAS

Alternative host

Llama 2 70B Chat on Databricks Foundation Model Serving

Alternative host

Pricing Overview

Cheapest: $0.02/1M input tokens

Most expensive: $4.20/1M input tokens (Nemotron 4 340B)

About DeepInfra

DeepInfra offers serverless AI inference through a simple API and supports hundreds of models across text generation, embeddings, and other tasks (this catalog lists 60 of them). Pricing is pay-per-token, with no upfront commitments.
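Because prices are quoted per 1M tokens, the cost of a request is (input tokens * input price + output tokens * output price) / 1,000,000. The sketch below applies that formula using Llama 3.1 8B Instruct's listed rates ($0.02 input, $0.05 output) and made-up token counts. It assumes the bill consists only of the listed per-token rates. The `request_cost` helper is hypothetical.

```python
# Sketch: per-request cost under pay-per-token pricing quoted per 1M tokens.
# Assumption: only the listed per-token rates apply; token counts are made up.
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: str, output_price: str) -> Decimal:
    """Cost in dollars; prices are strings like "$0.02" per 1M tokens."""
    inp = Decimal(input_price.lstrip("$"))
    out = Decimal(output_price.lstrip("$"))
    return (input_tokens * inp + output_tokens * out) / PER_MILLION


# Llama 3.1 8B Instruct: $0.02 input, $0.05 output per 1M tokens.
cost = request_cost(2_000, 500, "$0.02", "$0.05")
print(cost)  # 0.000065

# 10,000 such requests, rounded to cents.
print((cost * 10_000).quantize(Decimal("0.01")))  # 0.65
```

`Decimal` keeps fractional-cent amounts exact. Binary floats would accumulate small rounding errors when you sum many requests.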
