LLM Reference

Models on DeepInfra

58 models available

Model | Input (per 1M) | Output (per 1M) | Context
Qwen3 9B | $0.04 | $0.2 | 256K
CodeGemma 1.1 7B | $0.05 | $0.15 | -
DeepInfra Google Gemma 2B | $0.05 | $0.15 | 8192
DeepInfra Google Gemma 7B | $0.05 | $0.15 | 8192
DeepInfra Llama 3 8B Instruct | $0.05 | $0.15 | 8192
DeepInfra Mistral 7B Instruct | $0.05 | $0.15 | 32768
DeepInfra Phi 3 Mini 4K Instruct | $0.05 | $0.15 | 4096
Gemma 1.1 7B Instruct | $0.05 | $0.15 | 8K
Llama 3 8B Instruct | $0.05 | $0.15 | 8K
LLaVA 1.5 7B | $0.05 | $0.15 | -
Mistral 7B v0.1 | $0.05 | $0.15 | 8K
OpenChat 3.6 8B | $0.05 | $0.15 | 8K
Qwen2 7B | $0.05 | $0.15 | 128K
WizardLM-2 7B | $0.05 | $0.15 | -
Llama 2 7B Chat | $0.07 | $0.07 | 4K
DeepInfra Stable LM 2 12B | $0.1 | $0.3 | 4096
NVIDIA Nemotron 3 Super 120B | $0.1 | $0.5 | 262K
Llama 2 13B Chat | $0.13 | $0.13 | 4K
Phi-3 Medium 4K | $0.14 | $0.41 | 4K
DeepInfra Mixtral 8x7B Instruct | $0.15 | $0.45 | 32768
Dolphin 2.6 Mixtral 8x7B | $0.15 | $0.45 | -
CodeLlama 34B | $0.20 | $0.45 | 100K
DeepInfra StarCoder2 15B | $0.2 | $0.6 | 16384
Phind CodeLlama 34B V2 | $0.20 | $0.45 | 8K
StarCoder2 15B | $0.20 | $0.60 | 8K
Yi 34B | $0.25 | $0.38 | 200K
Qwen3 27B | $0.26 | $2.6 | 262K
airoboros L2 70B 2.2.1 | $0.45 | $0.65 | -
CodeLlama 70B | $0.45 | $0.65 | 16K
DeepInfra CodeLlama 70B Instruct | $0.45 | $0.65 | 100K
DeepInfra Llama 3 70B Instruct | $0.45 | $0.65 | 8192
DeepInfra Phi 3 Small 128K Instruct | $0.45 | $0.65 | 128K
DeepInfra Qwen 1.5 72B Chat | $0.45 | $0.65 | 32768
Llama 3 70B Instruct | $0.45 | $0.65 | 8K
Qwen2 72B | $0.45 | $0.65 | 128K
DBRX Instruct | $0.60 | $1.20 | 32K
DeepInfra DBRX Instruct | $0.6 | $1.2 | 32768
Llama 2 70B Chat | $0.64 | $0.64 | 4K
Mixtral 8x22B v0.1 | $0.65 | $0.65 | 64K
WizardLM-2 8x22B | $0.65 | $0.65 | -
Zephyr ORPO 141B | $0.65 | $0.65 | -
Mistral NeMo Instruct (2407) | $2 | $4 | 128K
Qwen2.5 7B Instruct | $3 | $3 | 128K
Nemotron 4 340B | $4.20 | $4.20 | 4K
Qwen2.5 14B Instruct | $10 | $10 | 128K
Qwen2 57B-A14B | $16 | $16 | -
Qwen2.5 Coder 32B | $20 | $20 | -
Qwen2.5 72B Instruct | $23 | $23 | 128K
DeepSeek V3 | $32 | $89 | 64K
Llama 3.1 70B Instruct | $40 | $40 | 128K
Mixtral 8x7B | $54 | $54 | 32K
DeepSeek R1 Distill Llama 70B | $70 | $80 | 128K
Command R | - | - | 128K
Command R+ | - | - | 128K
DeepSeek R1 | - | - | 128K
Llama 4 Maverick 17B Instruct FP8 | - | - | 1M
Llama 4 Scout 17B-16E Instruct | - | - | 328K
Mistral Small | - | - | 32K
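
Because prices are quoted per 1M tokens, the cost of a single request is a simple proportion of its token counts. The following is a minimal sketch, assuming the listed prices are USD per one million tokens. The function name `estimate_cost` and the token counts are hypothetical; the rates are the ones listed for Llama 3 8B Instruct ($0.05 input, $0.15 output).

```python
from decimal import Decimal

# Prices in the listing are quoted per 1M tokens.
TOKENS_PER_UNIT = Decimal(1_000_000)

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: str, output_price: str) -> Decimal:
    """Return the estimated USD cost of one request.

    input_price and output_price are USD per 1M tokens, passed as
    strings so that Decimal represents them exactly.
    """
    cost_in = Decimal(input_tokens) * Decimal(input_price) / TOKENS_PER_UNIT
    cost_out = Decimal(output_tokens) * Decimal(output_price) / TOKENS_PER_UNIT
    return cost_in + cost_out

# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost(2_000, 500, "0.05", "0.15")
print(f"${cost:.6f}")
```

Input and output tokens are billed at different rates for most models in the listing, so the two counts are kept separate rather than summed before pricing.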

Pricing Overview

Cheapest: $0.04/1M
Most expensive: $70.00/1M
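
The two figures match the lowest and highest input prices in the listing. The following is a minimal sketch of how such a summary could be derived, assuming it is computed over input prices only and that models without a listed price are skipped. The dictionary holds a handful of rows copied from the listing, not the full set of 58.

```python
# A few (model -> input price in USD per 1M) entries copied from the listing.
# None marks a model for which no price is listed.
input_prices = {
    "Qwen3 9B": 0.04,
    "Llama 3 8B Instruct": 0.05,
    "DBRX Instruct": 0.60,
    "DeepSeek V3": 32.0,
    "DeepSeek R1 Distill Llama 70B": 70.0,
    "Command R": None,
}

# Ignore models with no listed price before comparing.
priced = {name: price for name, price in input_prices.items() if price is not None}

cheapest = min(priced, key=priced.get)
most_expensive = max(priced, key=priced.get)

print(f"Cheapest: {cheapest} at ${priced[cheapest]:.2f}/1M")
print(f"Most expensive: {most_expensive} at ${priced[most_expensive]:.2f}/1M")
```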

About DeepInfra

DeepInfra offers serverless AI inference through a simple API, supporting hundreds of models across text generation, embeddings, and more. Pricing is pay-per-token, with no upfront commitments.
