Which Replicate API model is cheapest?

The cheapest Replicate API model in this catalog is Granite 3.3 8B Instruct at $0.03/1M input tokens.

What is the context window for Replicate API models?

Replicate API models listed here range from 512 to 1.05m tokens of context.
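
The catalog's Context column uses shorthand such as 512, 128k, and 1.05m, plus a dash for models with no listed value. The sketch below shows one way to normalize these strings to token counts so they can be compared. It assumes that "k" means 1,000 and "m" means 1,000,000, which the page does not state. The function name `parse_context` is hypothetical.

```python
from decimal import Decimal
from typing import Optional

# Assumed multipliers: the catalog does not say whether "k" is 1,000 or 1,024.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000}

DASH = "\u2014"  # the character the table uses for "not listed"


def parse_context(value: str) -> Optional[int]:
    """Convert a context string such as '128k' or '1.05m' to a token count.

    Returns None when the catalog lists no context window.
    """
    text = value.strip().lower()
    if text in ("", DASH):
        return None
    suffix = text[-1]
    if suffix in MULTIPLIERS:
        # Decimal avoids float rounding on values like 1.05
        return int(Decimal(text[:-1]) * MULTIPLIERS[suffix])
    return int(text)


if __name__ == "__main__":
    sample = ["512", "2k", DASH, "128k", "1.05m"]
    sizes = [n for n in map(parse_context, sample) if n is not None]
    print(min(sizes), max(sizes))
```

Rows without a listed context window are skipped rather than treated as zero, so they do not distort the minimum.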

How does Replicate API compare to Hugging Face Inference Endpoints?

Replicate API lists 117 models here, while Hugging Face Inference Endpoints lists 9. Compare pricing availability, context windows, and benchmark coverage before choosing a host.

Replicate API Models — Pricing & Benchmarks

117 models available · Replicate

Replicate API hosts 117 AI models in this catalog. The lowest listed input price is $0.03/1M input tokens, for Granite 3.3 8B Instruct. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input (USD per 1M tokens)	Output (USD per 1M tokens)	Context (tokens)
Granite 3.3 8B Instruct	$0.03	$0.25	128k
CodeLlama 7B	$0.05	$0.25	100k
CodeLlama 7B Python	$0.05	$0.25	100k
DeepSeek Math 7B	$0.05	$0.25	—
DeepSeek VL 7B	$0.05	$0.25	—
Gemma 2B Instruct	$0.05	$0.25	2k
Gemma 7B Instruct	$0.05	$0.25	8k
GLM-4V 9B	$0.05	$0.25	131k
GPT-5 Nano	$0.05	$0.40	400k
Hermes 2 Theta Llama 3 8B	$0.05	$0.25	8k
Llama 2 7B Chat	$0.05	$0.25	4k
Llama 3 8B	$0.05	$0.25	8k
Llama 3 8B Instruct	$0.05	$0.25	8k
Llama Guard 2 8B	$0.05	$0.25	8k
LLaVA 1.6 Mistral 7B	$0.05	$0.25	32k
LLaVA 1.6 Vicuna 7B	$0.05	$0.25	4k
Mistral 7B v0.1	$0.05	$0.25	8k
Phi-2	$0.05	$0.25	2k
Phi-3 Mini 128K	$0.05	$0.25	128k
Phi-3 Mini 4k	$0.05	$0.25	4k
Qwen-7B	$0.05	$0.25	8k
Qwen-VL	$0.05	$0.25	32k
Qwen1.5-0.5B	$0.05	$0.25	32k
Qwen1.5-1.8B	$0.05	$0.25	32k
Qwen1.5-4B	$0.05	$0.25	32k
Qwen1.5-7B	$0.05	$0.25	32k
Stable LM 3B	$0.05	$0.25	—
Stable LM 7B	$0.05	$0.25	—
Yi 6B	$0.05	$0.25	200k
Zephyr 7B Alpha	$0.05	$0.25	—
Zephyr 7B Beta	$0.05	$0.25	—
gpt-oss-20b	$0.09	$0.36	131k
CodeLlama 13B	$0.10	$0.50	100k
CodeLlama 13B Python	$0.10	$0.50	100k
Dolly 2.0 12B	$0.10	$0.50	—
Gemma 2 9B Instruct	$0.10	$0.10	8k
GPT-4.1 Nano	$0.10	$0.40	1.05m
Llama 2 13B Chat	$0.10	$0.50	4k
LLaVA 1.6 Vicuna 13B	$0.10	$0.50	4k
Nous Hermes 2 SOLAR 10.7B	$0.10	$0.50	4k
Nous Hermes Llama 2 13B	$0.10	$0.50	—
Qwen-14B	$0.10	$0.50	32k
Qwen1.5-14B	$0.10	$0.50	32k
Vicuna 13B	$0.10	$0.50	2k
GPT-4o Mini (07-18)	$0.15	$0.60	128k
gpt-oss-120b	$0.18	$0.72	131k
CodeLlama 34B	$0.20	$1.00	100k
CodeLlama 34B Python	$0.20	$1.00	100k
Llama Guard 4 12B	$0.20	$0.20	164k
Mixtral 8x7B	$0.20	$1.00	32k
Nous Hermes 2 Yi 34B	$0.20	$1.00	200k
Qwen1.5-32B	$0.20	$1.00	32k
WizardCoder 33B	$0.20	$1.00	16k
WizardCoder Python 34B	$0.20	$1.00	100k
Yi 34B	$0.20	$1.00	200k
Yi 34B 200K	$0.20	$1.00	200k
Yi VL 34B	$0.20	$1.00	131k
Claude 3 Haiku	$0.25	$1.25	200k
GPT-5 Mini	$0.25	$2.00	400k
Llama 3.1 8B Instruct	$0.25	$0.25	128k
Mistral 7B Instruct v0.2	$0.25	$0.25	32k
Gemini 2.5 Flash	$0.30	$2.50	1m
Llama Guard 3 8B	$0.30	$0.30	8k
Gemma 2 27B Instruct	$0.40	$0.40	8k
GPT-4.1 Mini	$0.40	$1.60	1.05m
Mistral NeMo Instruct (2407)	$0.45	$0.45	128k
Gemini 3 Flash	$0.50	$3.00	1m
GPT-3.5 Turbo	$0.50	$1.50	16k
Kimi K2.5	$0.60	$3.00	256k
Qwen2.5-32B-Instruct	$0.60	$0.60	128k
Arctic	$0.65	$2.75	4k
CodeLlama 70B	$0.65	$2.75	16k
CodeLlama 70B Python	$0.65	$2.75	16k
Falcon 40B	$0.65	$2.75	—
Llama 2 70B Chat	$0.65	$2.75	4k
Llama 3 70B	$0.65	$2.75	8k
Llama 3 70B Instruct	$0.65	$2.75	8k
Qwen1.5-72B	$0.65	$2.75	32k
DeepSeek V3.1	$0.672	$2.02	64k
Claude 3.5 Haiku	$1.00	$5.00	200k
Claude Haiku 4.5	$1.00	$5.00	200k
o4-mini	$1.00	$4.00	200k
o1-mini (09-12)	$1.10	$4.40	128k
Llama 3.1-70B	$1.20	$1.20	128k
GPT-5	$1.25	$10.00	400k
Qwen2.5-72B-Instruct	$1.30	$1.30	128k
DeepSeek V3	$1.45	$1.45	64k
GPT-5.2	$1.75	$14.00	400k
Gemini 3 Pro	$2.00	$12.00	1m
Gemini 3.1 Pro Preview	$2.00	$12.00	1m
Mixtral 8x22B Instruct v0.3	$2.00	$2.00	64k
Mixtral 8x22B Instruct v0.1	$2.10	$2.10	64k
GPT-4o	$2.50	$10.00	128k
GPT-4o (05-13)	$2.50	$10.00	128k
Claude 3.5 Sonnet	$3.00	$15.00	200k
Claude 3.7 Sonnet	$3.00	$15.00	200k
Claude 4 Sonnet	$3.00	$15.00	200k
Claude Sonnet 4.5	$3.00	$15.00	200k
DeepSeek R1	$3.75	$10.00	128k
Llama 3.1-405B	$3.75	$3.75	128k
GPT-4 Turbo	$5.00	$15.00	128k
Grok 4	$7.20	$36.00	256k
Claude 3 Opus	$7.50	$37.50	200k
o1 (12-17)	$15.00	$60.00	128k
Flan-T5 XL	—	—	512
GPT-JT 6B V1	—	—	—
LLaMA 7B	—	—	2k
LLaVA 13B	—	—	4k
Mamba 1.4B	—	—	2k
Mamba 130M	—	—	2k
Mamba 2.8B	—	—	2k
Mamba 370M	—	—	2k
Mamba 790M	—	—	2k
OLMo 7B	—	—	—
Open-Assistant SFT-1 12B	—	—	2k
Replit Code	—	—	4k
StableLM Tuned Alpha 7B	—	—	4k
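
Because the prices above are quoted per 1M tokens, the cost of a single request follows from simple arithmetic: tokens times price, divided by one million, summed over input and output. The sketch below illustrates this with three rows copied from the table. It assumes billing is linear in token count at the listed rates, which this page does not confirm. The names `PRICES` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal

# (input price, output price) in USD per 1M tokens, copied from the table.
PRICES = {
    "Granite 3.3 8B Instruct": (Decimal("0.03"), Decimal("0.25")),
    "GPT-5 Nano": (Decimal("0.05"), Decimal("0.40")),
    "o1 (12-17)": (Decimal("15.00"), Decimal("60.00")),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare the same workload (10,000 input and 2,000 output tokens) across models.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 10_000, 2_000)}")
```

Output tokens are priced higher than input tokens for most rows, so workloads that generate long responses can cost noticeably more than the input price alone suggests.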

Where else to run this

Llama Guard 2 8B on Replicate API

Provider setup and pricing

Llama Guard 3 8B on Replicate API

Provider setup and pricing

Llama 2 7B Chat on Replicate API

Provider setup and pricing

Llama Guard 2 8B on Fireworks AI

Alternative host

Llama Guard 3 8B on Cloudflare Workers AI

Alternative host

Llama 2 7B Chat on Alibaba Cloud PAI-EAS

Alternative host

Pricing Overview

Cheapest: $0.03/1M input tokens

Most expensive: $15.00/1M input tokens

About Replicate API

Replicate is a cloud platform for deploying and integrating machine learning models. It provides a large library of open-source models that can be run with minimal code, covering tasks such as text generation, image creation, and video production. An API is generated automatically for each model, so users can deploy custom models on a large GPU cluster. The platform also supports Cog, a tool that packages models into production-ready containers and simplifies managing and scaling AI applications.

Resources scale automatically with demand to maintain performance during peak usage, and users pay only for the time their code is actively running. Models can be shared publicly or kept private. This focus on accessibility makes Replicate a practical option for developers who want to integrate AI into their projects without the complexity usually associated with machine learning.
