LLM Reference
Replicate API

Replicate API Models — Pricing & Benchmarks

117 models available · Replicate

Replicate API hosts 117 AI models in this catalog. The lowest listed input price is Granite 3.3 8B Instruct at $0.03/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

ModelInput (per 1M)Output (per 1M)Context
Granite 3.3 8B Instruct$0.03$0.25128k
CodeLlama 7B$0.05$0.25100k
CodeLlama 7B Python$0.05$0.25100k
DeepSeek Math 7B$0.05$0.25
DeepSeek VL 7B$0.05$0.25
Gemma 2B Instruct$0.05$0.252k
Gemma 7B Instruct$0.05$0.258k
GLM-4V 9B$0.05$0.25131k
GPT-5 Nano$0.05$0.4400k
Hermes 2 Theta Llama 3 8B$0.05$0.258k
Llama 2 7B Chat$0.05$0.254k
Llama 3 8B$0.05$0.258k
Llama 3 8B Instruct$0.05$0.258k
Llama Guard 2 8B$0.05$0.258k
LLaVA 1.6 Mistral 7B$0.05$0.2532k
LLaVA 1.6 Vicuna 7B$0.05$0.254k
Mistral 7B v0.1$0.05$0.258k
Phi-2$0.05$0.252k
Phi-3 Mini 128K$0.05$0.25128k
Phi-3 Mini 4k$0.05$0.254k
Qwen-7B$0.05$0.258k
Qwen-VL$0.05$0.2532k
Qwen1.5-0.5B$0.05$0.2532k
Qwen1.5-1.8B$0.05$0.2532k
Qwen1.5-4B$0.05$0.2532k
Qwen1.5-7B$0.05$0.2532k
Stable LM 3B$0.05$0.25
Stable LM 7B$0.05$0.25
Yi 6B$0.05$0.25200k
Zephyr 7B Alpha$0.05$0.25
Zephyr 7B Beta$0.05$0.25
gpt-oss-20b$0.09$0.36131k
CodeLlama 13B$0.1$0.5100k
CodeLlama 13B Python$0.1$0.5100k
Dolly 2.0 12B$0.1$0.5
Gemma 2 9B Instruct$0.1$0.18k
GPT-4.1 Nano$0.1$0.41.05m
Llama 2 13B Chat$0.1$0.54k
LLaVA 1.6 Vicuna 13B$0.1$0.54k
Nous Hermes 2 SOLAR 10.7B$0.1$0.54k
Nous Hermes Llama 2 13B$0.1$0.5
Qwen-14B$0.1$0.532k
Qwen1.5-14B$0.1$0.532k
Vicuna 13B$0.1$0.52k
GPT-4o Mini (07-18)$0.15$0.6128k
gpt-oss-120b$0.18$0.72131k
CodeLlama 34B$0.2$1.00100k
CodeLlama 34B Python$0.2$1.00100k
Llama Guard 4 12B$0.2$0.2164k
Mixtral 8x7B$0.2$1.0032k
Nous Hermes 2 Yi 34B$0.2$1.00200k
Qwen1.5-32B$0.2$1.0032k
WizardCoder 33B$0.2$1.0016k
WizardCoder Python 34B$0.2$1.00100k
Yi 34B$0.2$1.00200k
Yi 34B 200K$0.2$1.00200k
Yi VL 34B$0.2$1.00131k
Claude 3 Haiku$0.25$1.25200k
GPT-5 Mini$0.25$2.00400k
Llama 3.1 8B Instruct$0.25$0.25128k
Mistral 7B Instruct v0.2$0.25$0.2532k
Gemini 2.5 Flash$0.3$2.501m
Llama Guard 3 8B$0.3$0.38k
Gemma 2 27B Instruct$0.4$0.48k
GPT-4.1 Mini$0.4$1.601.05m
Mistral NeMo Instruct (2407)$0.45$0.45128k
Gemini 3 Flash$0.5$3.001m
GPT-3.5 Turbo$0.5$1.5016k
Kimi K2.5$0.6$3.00256k
Qwen2.5-32B-Instruct$0.6$0.6128k
Arctic$0.65$2.754k
CodeLlama 70B$0.65$2.7516k
CodeLlama 70B Python$0.65$2.7516k
Falcon 40B$0.65$2.75
Llama 2 70B Chat$0.65$2.754k
Llama 3 70B$0.65$2.758k
Llama 3 70B Instruct$0.65$2.758k
Qwen1.5-72B$0.65$2.7532k
DeepSeek V3.1$0.672$2.0264k
Claude 3.5 Haiku$1.00$5.00200k
Claude Haiku 4.5$1.00$5.00200k
o4-mini$1.00$4.00200k
o1-mini (09-12)$1.10$4.40128k
Llama 3.1-70B$1.20$1.20128k
GPT-5$1.25$10.00400k
Qwen2.5-72B-Instruct$1.30$1.30128k
DeepSeek V3$1.45$1.4564k
GPT-5.2$1.75$14.00400k
Gemini 3 Pro$2.00$12.001m
Gemini 3.1 Pro Preview$2.00$12.001m
Mixtral 8x22B Instruct v0.3$2.00$2.0064k
Mixtral 8x22B Instruct v0.1$2.10$2.1064k
GPT-4o$2.50$10.00128k
GPT-4o (05-13)$2.50$10.00128k
Claude 3.5 Sonnet$3.00$15.00200k
Claude 3.7 Sonnet$3.00$15.00200k
Claude 4 Sonnet$3.00$15.00200k
Claude Sonnet 4.5$3.00$15.00200k
DeepSeek R1$3.75$10.00128k
Llama 3.1-405B$3.75$3.75128k
GPT-4 Turbo$5.00$15.00128k
Grok 4$7.20$36.00256k
Claude 3 Opus$7.50$37.50200k
o1 (12-17)$15.00$60.00128k
Flan-T5 XL512
GPT-JT 6B V1
LLaMA 7B2k
LLaVA 13B4k
Mamba 1.4B2k
Mamba 130M2k
Mamba 2.8B2k
Mamba 370M2k
Mamba 790M2k
OLMo 7B
Open-Assistant SFT-1 12B2k
Replit Code4k
StableLM Tuned Alpha 7B4k

Where else to run this

Pricing Overview

Cheapest$0.03/1M
Most expensive$15.00/1M

About Replicate API

Replicate offers a cloud-based AI platform that simplifies the deployment and integration of machine learning models. The platform provides an extensive library of open-source models that users can run with minimal coding, enabling easy access to advanced AI functionalities such as text generation, image creation, and video production. With automatic API generation, users can effortlessly deploy custom models on a large GPU cluster. The platform also supports the "Cog" tool, which packages models into production-ready containers, streamlining the management and scaling of AI applications. The platform's scalability is a key feature, automatically adjusting resources based on demand to ensure optimal performance during peak usage times. Users benefit from a cost-effective pricing model, paying only for the active time their code runs. Replicate fosters collaboration by allowing users to share their models publicly or keep them private, promoting innovation and knowledge sharing within the developer community. The platform's focus on accessibility and ease of use makes it an ideal solution for developers looking to integrate AI into their projects without the complexities typically associated with machine learning.

Full provider profile →