Cloudflare Workers AI Models — Pricing & Benchmarks
20 models available · Cloudflare
Cloudflare Workers AI hosts 20 AI models in this catalog. Per-token pricing is not listed for these Cloudflare Workers AI rows yet; compare context windows, benchmarks, and hosting options instead. LLM Reference lets you compare these models across all 63 providers without switching tabs.
| Model | Input (per 1M) | Output (per 1M) | Context | |
|---|---|---|---|---|
| DeepSeek Coder 6.7B | — | — | 4K | |
| DeepSeek Math 7B | — | — | — | |
| Falcon 7B | — | — | — | |
| Gemma 2B Instruct | — | — | 2K | |
| Gemma 7B Instruct | — | — | 8K | |
| Hermes 2 Pro Mistral 7B | — | — | 32K | |
| Llama 2 13B Chat | — | — | 4K | |
| Llama 2 7B Chat | — | — | 4K | |
| Llama 3 8B Instruct | — | — | 8K | |
| Llama Guard 7B | — | — | 2K | |
| Mistral 7B v0.1 | — | — | 8K | |
| OpenChat 3.5 (0106) | — | — | 8K | |
| OpenHermes 2.5 Mistral 7B | — | — | 32K | |
| Phi-2 | — | — | — | |
| Qwen1.5-0.5B | — | — | — | |
| Qwen1.5-1.8B | — | — | — | |
| Qwen1.5-14B | — | — | — | |
| Qwen1.5-7B | — | — | — | |
| SQLCoder 7B 2 | — | — | — | |
| Starling LM 7B Beta | — | — | — |
About Cloudflare Workers AI
Cloudflare Workers AI is a serverless GPU inference platform enabling developers to run machine learning models on Cloudflare's global edge network. It supports diverse AI tasks including text generation, image classification, automatic speech recognition, and real-time language translation. The platform provides pay-per-use pricing and access to a curated library of open-source models from Hugging Face, enabling rapid deployment without complex infrastructure management. Key features include low-latency edge computing, streaming responses for large language models, context length customization, and the AI Gateway for monitoring, caching, and cost optimization.
Full provider profile →