LLM Reference
Replicate API

Models on Replicate API

114 models available · Replicate

Model | Input (per 1M tokens) | Output (per 1M tokens) | Context (tokens)
Granite 3.3 8B Instruct | $0.03 | $0.25 | 128K
CodeLlama 7B | $0.05 | $0.25 | 100K
CodeLlama 7B Python | $0.05 | $0.25 | 100K
DeepSeek Math 7B | $0.05 | $0.25 | -
DeepSeek VL 7B | $0.05 | $0.25 | -
Gemma 2B Instruct | $0.05 | $0.25 | 2K
Gemma 7B Instruct | $0.05 | $0.25 | 8K
GLM-4V 9B | $0.05 | $0.25 | -
Hermes 2 Theta Llama 3 8B | $0.05 | $0.25 | -
Llama 2 7B Chat | $0.05 | $0.25 | 4K
Llama 3 8B | $0.05 | $0.25 | 8K
Llama 3 8B Instruct | $0.05 | $0.25 | 8K
Llama Guard 2 8B | $0.05 | $0.25 | 8K
LLaVA 1.6 Mistral 7B | $0.05 | $0.25 | 32K
LLaVA 1.6 Vicuna 7B | $0.05 | $0.25 | -
Mistral 7B v0.1 | $0.05 | $0.25 | 8K
Phi-2 | $0.05 | $0.25 | -
Phi-3 Mini 128K | $0.05 | $0.25 | 128K
Phi-3 Mini 4k | $0.05 | $0.25 | 4K
Qwen VL | $0.05 | $0.25 | 32K
Qwen-7B | $0.05 | $0.25 | 8K
Qwen1.5-0.5B | $0.05 | $0.25 | -
Qwen1.5-1.8B | $0.05 | $0.25 | -
Qwen1.5-4B | $0.05 | $0.25 | -
Qwen1.5-7B | $0.05 | $0.25 | -
Stable LM 3B | $0.05 | $0.25 | -
Stable LM 7B | $0.05 | $0.25 | -
Yi 6B | $0.05 | $0.25 | 200K
Zephyr 7B Alpha | $0.05 | $0.25 | -
Zephyr 7B Beta | $0.05 | $0.25 | -
gpt-oss-20b | $0.09 | $0.36 | 131072
CodeLlama 13B | $0.10 | $0.50 | 100K
CodeLlama 13B Python | $0.10 | $0.50 | 100K
Dolly 2.0 12B | $0.10 | $0.50 | -
Gemma 2 9B Instruct | $0.1 | $0.1 | 8K
GPT-4.1 Nano | $0.1 | $0.4 | 1M
Llama 2 13B Chat | $0.10 | $0.50 | 4K
LLaVA 1.6 Vicuna 13B | $0.10 | $0.50 | -
Nous Hermes 2 SOLAR 10.7B | $0.10 | $0.50 | -
Nous Hermes Llama 2 13B | $0.10 | $0.50 | -
Qwen-14B | $0.10 | $0.50 | 32K
Qwen1.5-14B | $0.10 | $0.50 | -
Vicuna 13B | $0.10 | $0.50 | 2K
GPT-4o Mini (07-18) | $0.15 | $0.6 | 128K
gpt-oss-120b | $0.18 | $0.72 | 131072
CodeLlama 34B | $0.20 | $1.00 | 100K
CodeLlama 34B Python | $0.20 | $1.00 | 100K
Llama Guard 4 12B | $0.2 | $0.2 | 164K
Mixtral 8x7B | $0.20 | $1.00 | 32K
Nous Hermes 2 Yi 34B | $0.20 | $1.00 | 200K
Qwen1.5-32B | $0.20 | $1.00 | -
WizardCoder 33B | $0.20 | $1.00 | -
WizardCoder Python 34B | $0.20 | $1.00 | -
Yi 34B | $0.20 | $1.00 | 200K
Yi 34B 200K | $0.20 | $1.00 | 200K
Yi VL 34B | $0.20 | $1.00 | -
Claude 3 Haiku | $0.25 | $1.25 | 200K
Llama 3.1 8B Instruct | $0.25 | $0.25 | 128K
Mistral 7B Instruct v0.2 | $0.25 | $0.25 | 32K
Gemini 2.5 Flash | $0.3 | $2.5 | 1M
Llama Guard 3 8B | $0.3 | $0.3 | 8K
Gemma 2 27B Instruct | $0.4 | $0.4 | 8K
GPT-4.1 Mini | $0.4 | $1.6 | 1M
Mistral NeMo Instruct (2407) | $0.45 | $0.45 | 128K
Gemini 3 Flash | $0.5 | $3 | 1M
GPT-3.5 Turbo | $0.5 | $1.5 | 16K
Kimi K2.5 | $0.60 | $3.00 | 256K
Qwen2.5 32B Instruct | $0.6 | $0.6 | 128K
Arctic | $0.65 | $2.75 | 4K
CodeLlama 70B | $0.65 | $2.75 | 16K
CodeLlama 70B Python | $0.65 | $2.75 | 16K
Falcon 40B | $0.65 | $2.75 | -
Llama 2 70B Chat | $0.65 | $2.75 | 4K
Llama 3 70B | $0.65 | $2.75 | 8K
Llama 3 70B Instruct | $0.65 | $2.75 | 8K
Qwen1.5-72B | $0.65 | $2.75 | -
DeepSeek V3.1 | $0.672 | $2.016 | 64K
Claude 3.5 Haiku | $1 | $5 | -
Claude Haiku 4.5 | $1 | $5 | 200K
o4-mini | $1 | $4 | -
o1-mini (09-12) | $1.1 | $4.4 | 128K
Llama 3.1-70B | $1.2 | $1.2 | 128K
Qwen2.5 72B Instruct | $1.3 | $1.3 | 128K
DeepSeek V3 | $1.45 | $1.45 | 64K
Gemini 3 Pro | $2 | $12 | 1M
Gemini 3.1 Pro | $2 | $12 | 1M
GPT-4.1 | $2 | $8 | 1M
Mixtral 8x22B Instruct v0.3 | $2 | $2 | 64K
Mixtral 8x22B Instruct v0.1 | $2.1 | $2.1 | 64K
GPT-4o (05-13) | $2.5 | $10 | 128K
Claude 3.5 Sonnet | $3 | $15 | 200K
Claude 4 Sonnet | $3 | $15 | 200K
Claude Sonnet 4.5 | $3 | $15 | 200K
Llama 3.1-405B | $3.75 | $3.75 | 128K
Claude Opus 4.6 | $5 | $25 | 1M
GPT-4 Turbo | $5 | $15 | 128K
Grok 4 | $7.2 | $36 | -
Claude 3 Opus | $7.5 | $37.5 | 200K
o1 (12-17) | $15 | $60 | 128K
Claude 3.7 Sonnet | $300 | $1500 | 200K
DeepSeek R1 | $375 | $1000 | 128K
Flan-T5 XL | - | - | -
GPT-JT 6B V1 | - | - | -
LLaMA 7B | - | - | 2K
LLaVA 13B | - | - | 4K
Mamba 1.4B | - | - | -
Mamba 130M | - | - | -
Mamba 2.8B | - | - | -
Mamba 370M | - | - | -
Mamba 790M | - | - | -
OLMo 7B | - | - | -
Open-Assistant SFT-1 12B | - | - | -
Replit Code | - | - | -
StableLM Tuned Alpha 7B | - | - | 4K

Pricing Overview

Cheapest: $0.03 per 1M tokens
Most expensive: $375.00 per 1M tokens
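
As a rough worked example of how per-1M-token prices translate into the cost of a single request, the sketch below multiplies token counts by the listed input and output prices. The names `Pricing`, `PRICES`, and `estimate_cost` are hypothetical and introduced only for illustration; the prices are copied from three rows of the table above, and the token counts in the usage example are made up.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # the table quotes prices per 1M tokens


@dataclass(frozen=True)
class Pricing:
    """Per-1M-token prices in USD (hypothetical helper type)."""
    input_per_1m: float
    output_per_1m: float


# Three rows copied from the table above; extend as needed.
PRICES = {
    "Granite 3.3 8B Instruct": Pricing(0.03, 0.25),
    "Llama 3 8B Instruct": Pricing(0.05, 0.25),
    "GPT-4.1 Nano": Pricing(0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES[model]
    total = input_tokens * price.input_per_1m + output_tokens * price.output_per_1m
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 completion tokens (illustrative numbers).
    cost = estimate_cost("Llama 3 8B Instruct", 2_000, 500)
    print(f"{cost:.6f}")  # prints 0.000225
```

Because input and output are priced separately, the cheapest model for a given workload depends on its ratio of prompt tokens to generated tokens, not only on the headline input price.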

About Replicate API

Replicate is a cloud platform for deploying and integrating machine learning models. It provides a large library of open-source models that can be run with very little code, covering tasks such as text generation, image creation, and video production. An API is generated automatically for each model, so custom models can be deployed on Replicate's GPU cluster. Custom models are packaged with Cog, Replicate's open-source tool for building production-ready containers, which simplifies managing and scaling AI applications.

The platform scales resources automatically with demand to maintain performance during peak usage, and users pay only for the time their code is actively running. Models can be shared publicly or kept private. This makes Replicate a practical choice for developers who want to integrate AI into their projects without managing machine learning infrastructure themselves.
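
To make the "run a model through an API" idea concrete, here is a minimal sketch that builds an HTTP request for starting a prediction. It assumes Replicate's REST endpoint of the form `POST /v1/models/{owner}/{name}/predictions` with bearer-token authentication; the model slug `meta/meta-llama-3-8b-instruct`, the `prompt` input field, the `REPLICATE_API_TOKEN` environment variable, and the helper name `build_prediction_request` are assumptions for illustration, so check the model's own page for its actual input schema.

```python
import json
import os
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a prediction.

    `model` is an "owner/name" slug; the input schema is assumed to
    accept a single `prompt` field.
    """
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model}/predictions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    token = os.environ.get("REPLICATE_API_TOKEN")
    request = build_prediction_request(
        "meta/meta-llama-3-8b-instruct",  # assumed model slug
        "Say hello in one sentence.",
        token or "<your-token>",
    )
    if token:
        # Only contact the API when a real token is configured.
        with urllib.request.urlopen(request) as response:
            prediction = json.load(response)
        print(prediction.get("status"))
    else:
        print(request.full_url)
```

Separating request construction from sending keeps the sketch runnable without credentials; in practice an official client library would usually replace the hand-built request.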
