Cloudflare Workers AI Models — Pricing & Benchmarks

40 models available · Cloudflare

Cloudflare Workers AI hosts 40 AI models in this catalog. The lowest listed input price is Llama 3.2 1B Instruct at $0.027/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model                           Input (per 1M)  Output (per 1M)  Context
Llama 3.2 1B Instruct           $0.027          $0.201           128k
Llama 3.2 11B Vision Instruct   $0.049          $0.676           128k
Llama 3.2 3B Instruct           $0.051          $0.335           128k
Qwen3-30B-A3B                   $0.051          $0.335           128k
GLM-4.7 Flash                   $0.06           $0.4             198k
Gemma 4 26B A4B IT              $0.1            $0.3             256k
gpt-oss-20b                     $0.2            $0.3             131k
Llama 4 Scout 17B-16E Instruct  $0.27           $0.85            10m
Llama 3.3 70B Instruct (free)   $0.293          $2.25            66k
Gemma 3 12B                     $0.345          $0.556           33k
gpt-oss-120b                    $0.35           $0.75            131k
Mistral Small 3.1 24B Instruct  $0.351          $0.555           128k
Llama Guard 3 8B                $0.484          $0.03            8k
DeepSeek R1 Distill Qwen-32B    $0.497          $4.88            128k
Qwen2.5-Coder-32B-Instruct      $0.66           $1.00            128k
QwQ 32B                         $0.66           $1.00            128k
Kimi K2.6                       $0.95           $4.00            262k
BGE Base EN v1.5                -               -                512
BGE Large EN v1.5               -               -                512
BGE M3                          -               -                8k
BGE Reranker Base               -               -                512
BGE Small EN v1.5               -               -                512
FLUX.1 [schnell]                -               -                -
Gemma 2B Instruct               -               -                2k
Gemma 7B Instruct               -               -                8k
Granite 4.0 H Micro             -               -                131k
Hermes 2 Pro Mistral 7B         -               -                32k
Kimi K2.5                       -               -                256k
Llama 2 7B Chat                 -               -                4k
Llama 3 8B Instruct             -               -                8k
Llama 3.1 70B Instruct          -               -                128k
Llama 3.1 8B Instruct           -               -                128k
LLaVA 1.5 7B                    -               -                4k
Mistral 7B Instruct v0.2        -               -                32k
Mistral 7B v0.1                 -               -                8k
Nemotron 3 Super-120B-A12B      -               -                1.05m
Phi-2                           -               -                2k
Qwen3 Embedding 0.6B            -               -                33k
SEA-LION V4 27B Instruct        -               -                128k
SQLCoder 7B 2                   -               -                32k
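
Because prices are quoted per 1M tokens, the cost of a single request is the token count times the listed price, divided by one million, summed over input and output. The sketch below works through that arithmetic for three rows copied from the listing above. The names `PRICES` and `estimate_cost` and the token counts are illustrative assumptions, not part of any Cloudflare API, and actual billing may differ from this simple estimate.

```python
# Prices in USD per 1M tokens as (input, output), copied from the listing above.
PRICES = {
    "Llama 3.2 1B Instruct": (0.027, 0.201),
    "Qwen2.5-Coder-32B-Instruct": (0.66, 1.00),
    "Kimi K2.6": (0.95, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1M-token list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f}")
```

Output tokens dominate the total for most rows in the listing, so comparing models on input price alone can be misleading for generation-heavy workloads.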

Pricing Overview

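Cheapest input price: $0.03/1M tokens (Llama 3.2 1B Instruct, $0.027 before rounding)
Most expensive input price: $0.95/1M tokens (Kimi K2.6)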

About Cloudflare Workers AI

Cloudflare Workers AI is a serverless GPU inference platform that lets developers run machine learning models on Cloudflare's global edge network. It supports AI tasks including text generation, image classification, automatic speech recognition, and real-time language translation. The platform offers pay-per-use pricing and a curated library of open-source models from Hugging Face, so models can be deployed without managing infrastructure. Key features include low-latency edge computing, streaming responses for large language models, context length customization, and the AI Gateway for monitoring, caching, and cost optimization.
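
As a rough illustration of running a hosted model, the sketch below builds (but does not send) an HTTP request against the Workers AI REST endpoint using only the Python standard library. The URL pattern, the `prompt` request body, and the model identifier `@cf/meta/llama-3.2-1b-instruct` reflect my understanding of Cloudflare's API and should be checked against the official documentation. The helper name `build_run_request` and the placeholder credentials are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the Cloudflare REST API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_run_request(account_id: str, api_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that asks Workers AI to run `model` on `prompt`."""
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; replace with real values before sending.
req = build_run_request(
    "YOUR_ACCOUNT_ID",
    "YOUR_API_TOKEN",
    "@cf/meta/llama-3.2-1b-instruct",  # assumed model identifier
    "Summarize what an edge network is in one sentence.",
)
# To actually send it: urllib.request.urlopen(req)
```

Separating request construction from sending keeps the example safe to run without credentials and makes the URL and headers easy to inspect.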
