LLM ReferenceLLM Reference
Cloudflare Workers AI

Cloudflare Workers AI Models — Pricing & Benchmarks

20 models available · Cloudflare

Cloudflare Workers AI hosts 20 AI models in this catalog. Per-token pricing is not listed for these Cloudflare Workers AI rows yet; compare context windows, benchmarks, and hosting options instead. LLM Reference lets you compare these models across all 63 providers without switching tabs.

ModelInput (per 1M)Output (per 1M)Context
DeepSeek Coder 6.7B4K
DeepSeek Math 7B
Falcon 7B
Gemma 2B Instruct2K
Gemma 7B Instruct8K
Hermes 2 Pro Mistral 7B32K
Llama 2 13B Chat4K
Llama 2 7B Chat4K
Llama 3 8B Instruct8K
Llama Guard 7B2K
Mistral 7B v0.18K
OpenChat 3.5 (0106)8K
OpenHermes 2.5 Mistral 7B32K
Phi-2
Qwen1.5-0.5B
Qwen1.5-1.8B
Qwen1.5-14B
Qwen1.5-7B
SQLCoder 7B 2
Starling LM 7B Beta

About Cloudflare Workers AI

Cloudflare Workers AI is a serverless GPU inference platform enabling developers to run machine learning models on Cloudflare's global edge network. It supports diverse AI tasks including text generation, image classification, automatic speech recognition, and real-time language translation. The platform provides pay-per-use pricing and access to a curated library of open-source models from Hugging Face, enabling rapid deployment without complex infrastructure management. Key features include low-latency edge computing, streaming responses for large language models, context length customization, and the AI Gateway for monitoring, caching, and cost optimization.

Full provider profile →