Which Cloudflare Workers AI model is cheapest?

The cheapest Cloudflare Workers AI model in this catalog is Llama 3.2 1B Instruct at $0.027/1M input tokens.

What is the context window for Cloudflare Workers AI models?

Cloudflare Workers AI models listed here range from 512 to 10m tokens of context.

How does Cloudflare Workers AI compare to Fireworks AI?

Cloudflare Workers AI lists 40 models here, while Fireworks AI lists 224. Compare pricing availability, context windows, and benchmark coverage before choosing a host.

Cloudflare Workers AI Models — Pricing & Benchmarks

40 models available · Cloudflare

Cloudflare Workers AI hosts 40 AI models in this catalog. The lowest listed input price is Llama 3.2 1B Instruct at $0.027/1M input tokens. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input (per 1M)	Output (per 1M)	Context
Llama 3.2 1B Instruct	$0.027	$0.201	128k
Llama 3.2 11B Vision Instruct	$0.049	$0.676	128k
Llama 3.2 3B Instruct	$0.051	$0.335	128k
Qwen3-30B-A3B	$0.051	$0.335	128k
GLM-4.7 Flash	$0.06	$0.4	198k
Gemma 4 26B A4B IT	$0.1	$0.3	256k
gpt-oss-20b	$0.2	$0.3	131k
Llama 4 Scout 17B-16E Instruct	$0.27	$0.85	10m
Llama 3.3 70B Instruct (free)	$0.293	$2.25	66k
Gemma 3 12B	$0.345	$0.556	33k
gpt-oss-120b	$0.35	$0.75	131k
Mistral Small 3.1 24B Instruct	$0.351	$0.555	128k
Llama Guard 3 8B	$0.484	$0.03	8k
DeepSeek R1 Distill Qwen-32B	$0.497	$4.88	128k
Qwen2.5-Coder-32B-Instruct	$0.66	$1.00	128k
QwQ 32B	$0.66	$1.00	128k
Kimi K2.6	$0.95	$4.00	262k
BGE Base EN v1.5	—	—	512
BGE Large EN v1.5	—	—	512
BGE M3	—	—	8k
BGE Reranker Base	—	—	512
BGE Small EN v1.5	—	—	512
FLUX.1 [schnell]	—	—	—
Gemma 2B Instruct	—	—	2k
Gemma 7B Instruct	—	—	8k
Granite 4.0 H Micro	—	—	131k
Hermes 2 Pro Mistral 7B	—	—	32k
Kimi K2.5	—	—	256k
Llama 2 7B Chat	—	—	4k
Llama 3 8B Instruct	—	—	8k
Llama 3.1 70B Instruct	—	—	128k
Llama 3.1 8B Instruct	—	—	128k
LLaVA 1.5 7B	—	—	4k
Mistral 7B Instruct v0.2	—	—	32k
Mistral 7B v0.1	—	—	8k
Nemotron 3 Super-120B-A12B	—	—	1.05m
Phi-2	—	—	2k
Qwen3 Embedding 0.6B	—	—	33k
SEA-LION V4 27B Instruct	—	—	128k
SQLCoder 7B 2	—	—	32k

Where else to run this

Llama Guard 3 8B on Cloudflare Workers AI

Provider setup and pricing

Llama 2 7B Chat on Cloudflare Workers AI

Provider setup and pricing

Kimi K2.6 on Cloudflare Workers AI

Provider setup and pricing

Llama Guard 3 8B on Microsoft Foundry

Alternative host

Llama 2 7B Chat on Alibaba Cloud PAI-EAS

Alternative host

Kimi K2.6 on NVIDIA NIM

Alternative host

Pricing Overview

Cheapest$0.03/1M

Most expensive$0.95/1M

About Cloudflare Workers AI

Cloudflare Workers AI is a serverless GPU inference platform enabling developers to run machine learning models on Cloudflare's global edge network. It supports diverse AI tasks including text generation, image classification, automatic speech recognition, and real-time language translation. The platform provides pay-per-use pricing and access to a curated library of open-source models from Hugging Face, enabling rapid deployment without complex infrastructure management. Key features include low-latency edge computing, streaming responses for large language models, context length customization, and the AI Gateway for monitoring, caching, and cost optimization.

Full provider profile →

Links

Dashboard Documentation Pricing