LLM API Pricing Explained: How to Compare Cost Per Token

LLM API pricing can be confusing at first. Providers charge in tokens, not words or requests — and the numbers look very small ($0.000003 per token) until you're running thousands of queries per day. This guide explains how LLM pricing works and how to compare APIs so you pick the one that fits your budget.

What is a token?

A token is the basic unit that LLMs use to process text — roughly 3–4 characters on average, or about 0.75 words in English. Common words like "the", "is", and "in" are usually a single token. Longer or unusual words may be multiple tokens.

Quick estimates:

1,000 tokens ≈ 750 words ≈ 1.5 pages of text

10,000 tokens ≈ 7,500 words ≈ a short story

1M tokens ≈ 750,000 words ≈ several full-length novels

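For budgeting English text, these word-count conversions are usually close enough. Code and non-English text often need more tokens for the same amount of content, so use the provider's tokenizer when you need exact counts.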
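The rules of thumb above can be turned into a quick estimator. This is a minimal Python sketch, assuming English prose at about 0.75 words per token and about 4 characters per token. The function names are hypothetical; for exact counts you would use the provider's own tokenizer instead.

```python
import math

# Rule of thumb for English prose (assumption; varies by tokenizer and language).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4


def words_to_tokens(words: int) -> int:
    """Approximate tokens needed for `words` English words, rounded up."""
    return math.ceil(words / WORDS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Approximate English words covered by `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Rough token estimate from character count alone (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(words_to_tokens(750))
print(tokens_to_words(10_000))
print(tokens_to_words(1_000_000))
```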

Input tokens vs output tokens

Most providers charge differently for input (what you send) and output (what the model generates):

Input tokens — your prompt, the conversation history, and any documents you include in the context
Output tokens — the model's response

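Output tokens typically cost 2–5× more than input tokens, although some hosts of open-weight models (such as the Llama 3.1 405B listing below) charge the same rate for both. The gap matters if you generate long responses (code, essays, summaries), because output cost can then dominate the bill.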

Example: GPT-4o at $2.50/M input and $10.00/M output

If you send 1,000 tokens of input and get 500 tokens of output:

Input cost: 1,000 × $0.0000025 = $0.0025
Output cost: 500 × $0.00001 = $0.005
Total per call: $0.0075

At 10,000 calls/day, that's $75/day, or about $2,250/month over 30 days.
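Here is the same per-call arithmetic as a small Python sketch, using the GPT-4o prices from this example and assuming a 30-day month. `Decimal` is used only so that small dollar amounts like $0.0075 stay exact; `cost_per_call` is a hypothetical helper, not part of any provider SDK.

```python
from decimal import Decimal

# Prices from the GPT-4o example, in dollars per 1M tokens.
INPUT_PRICE_PER_M = Decimal("2.50")
OUTPUT_PRICE_PER_M = Decimal("10.00")
PER_MILLION = Decimal(1_000_000)


def cost_per_call(input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of one call, with input and output tokens billed at separate rates."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / PER_MILLION
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / PER_MILLION
    return input_cost + output_cost


per_call = cost_per_call(1_000, 500)
per_day = per_call * 10_000   # 10,000 calls/day
per_month = per_day * 30      # assumes a 30-day month

print(per_call, per_day, per_month)
```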

How to compare prices: the price-per-1M convention

Because per-token prices are tiny numbers, the industry convention is to quote price per million (1M) tokens:

Provider	Model	Input ($/1M)	Output ($/1M)
OpenAI	GPT-4o	$2.50	$10.00
Anthropic	Claude Sonnet	$3.00	$15.00
Google	Gemini 2.5 Flash	$0.30	$1.00
DeepSeek	DeepSeek V3	$0.27	$1.10
Meta (via OpenRouter)	Llama 3.1 405B	$0.80	$0.80

Prices change frequently. See live pricing at LLMReference /best/cheap.
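To compare the table's models on your own workload rather than on headline input price, you can rank them by cost per call. This is a minimal sketch that assumes the prices in the table above (a snapshot that will go stale); `rank_by_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Snapshot of the table above: model -> (input $/1M, output $/1M). Refresh before use.
PRICES = {
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "Claude Sonnet": (Decimal("3.00"), Decimal("15.00")),
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("1.00")),
    "DeepSeek V3": (Decimal("0.27"), Decimal("1.10")),
    "Llama 3.1 405B": (Decimal("0.80"), Decimal("0.80")),
}


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, Decimal]]:
    """Return (model, dollars per call) pairs sorted from cheapest to most expensive."""
    costs = []
    for model, (in_price, out_price) in PRICES.items():
        cost = (input_tokens * in_price + output_tokens * out_price) / Decimal(1_000_000)
        costs.append((model, cost))
    return sorted(costs, key=lambda pair: pair[1])


for model, cost in rank_by_cost(1_000, 500):
    print(f"{model:<18} ${cost:.6f}")
```

With an output-heavy mix such as 100 input and 2,000 output tokens per call, Llama 3.1 405B's flat $0.80 rate moves to first place, which is why it pays to rank on your own token mix rather than on input price alone.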

Batch pricing: cheaper for non-urgent work

Many providers offer batch APIs that process requests asynchronously (typically within 24 hours) at roughly half the standard price. If you're running large one-time jobs — dataset labeling, document classification, bulk summarization — batch mode can cut your costs in half.

Provider	Batch discount
Anthropic (Message Batches)	50% off
OpenAI (Batch API)	50% off
Google (Batch requests)	Up to 50% off

Batch mode is not suitable for real-time interactive applications (chatbots, copilots, live APIs). Use it for offline pipelines.
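Applying a batch discount to an existing estimate is a single multiplication. This is a minimal sketch, assuming the discount is a flat percentage off the standard price (50% for the providers above, up to 50% for Google). The $187.50/month workload and the 25% comparison discount are hypothetical inputs.

```python
from decimal import Decimal


def batch_cost(standard_cost: Decimal, discount: Decimal = Decimal("0.50")) -> Decimal:
    """Apply a flat batch discount (0.50 means 50% off) to a standard-price estimate."""
    if not Decimal(0) <= discount <= Decimal(1):
        raise ValueError("discount must be between 0 and 1")
    return standard_cost * (Decimal(1) - discount)


monthly_standard = Decimal("187.50")  # hypothetical workload at standard prices
print(batch_cost(monthly_standard))                   # 50% batch discount
print(batch_cost(monthly_standard, Decimal("0.25")))  # hypothetical smaller discount
```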

Hidden cost factors

Context window inflation: Sending the same long system prompt on every call? Those tokens are billed every time. A 2,000-token system prompt at 10,000 calls/day adds 20M input tokens per day to your bill. Use prompt caching where available.
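A short sketch of that overhead arithmetic, assuming the GPT-4o input price of $2.50/M from the table; `system_prompt_overhead` is a hypothetical helper name.

```python
from decimal import Decimal


def system_prompt_overhead(prompt_tokens: int, calls_per_day: int,
                           input_price_per_m: Decimal) -> tuple[int, Decimal]:
    """Tokens and dollars per day spent re-sending the same system prompt on every call."""
    tokens_per_day = prompt_tokens * calls_per_day
    dollars_per_day = tokens_per_day * input_price_per_m / Decimal(1_000_000)
    return tokens_per_day, dollars_per_day


tokens, dollars = system_prompt_overhead(2_000, 10_000, Decimal("2.50"))
print(tokens, dollars)
```

At that rate the repeated prompt alone costs $50/day, or about $1,500 over a 30-day month, before any user input is counted.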

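Prompt caching: Several providers (including Anthropic, Google, and OpenAI) offer prompt-cache discounts: if your input starts with the same prefix across many calls, the cached portion is billed at a fraction of the normal input price. Some providers, such as Anthropic, charge a premium for the initial cache write, so the savings depend on how often the prefix is reused. This is significant for RAG applications with large static context.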
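A rough way to model caching is to price cache writes and cache reads as multiples of the normal input rate. This is a minimal sketch under stated assumptions: the defaults of 1.25× for writes and 0.10× for reads are modeled on Anthropic's published multipliers for its default cache lifetime and should be checked against your provider's current pricing, and the prefix is assumed to be written once and read on every later call. The function names are hypothetical.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def input_cost_with_cache(prefix_tokens: int, other_tokens: int, calls: int,
                          input_price_per_m: Decimal,
                          write_multiplier: Decimal = Decimal("1.25"),
                          read_multiplier: Decimal = Decimal("0.10"),
                          cache_writes: int = 1) -> Decimal:
    """Input cost when a shared prefix is cached.

    Assumes the prefix is written to the cache `cache_writes` times and read
    from it on every other call; non-prefix tokens are billed at the normal
    rate. Multipliers are assumptions, not any provider's guaranteed prices.
    """
    if cache_writes > calls:
        raise ValueError("cache_writes cannot exceed calls")
    base = input_price_per_m / PER_MILLION  # dollars per input token
    writes = cache_writes * prefix_tokens * base * write_multiplier
    reads = (calls - cache_writes) * prefix_tokens * base * read_multiplier
    uncached = calls * other_tokens * base
    return writes + reads + uncached


def input_cost_without_cache(prefix_tokens: int, other_tokens: int, calls: int,
                             input_price_per_m: Decimal) -> Decimal:
    """Input cost when every call pays full price for the whole prompt."""
    return calls * (prefix_tokens + other_tokens) * input_price_per_m / PER_MILLION


# 2,000-token system prompt, 10,000 calls/day, $3.00/M input (Claude Sonnet row above)
print(input_cost_without_cache(2_000, 0, 10_000, Decimal("3.00")))
print(input_cost_with_cache(2_000, 0, 10_000, Decimal("3.00")))
```

Real caches expire after a period without use, so bursty traffic means more cache writes; raising `cache_writes` is a simple way to model that.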

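Retry and error costs: Calls that fail after the tokens were processed (for example, client-side timeouts, or responses you discard and regenerate because they failed validation) can still be billed, and each retry pays for the full input again. These costs are easy to overlook in cost models.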

Rate limit costs: Higher rate limits may require moving to a higher usage tier (often unlocked by cumulative spend or prepaid credit) or paying for reserved capacity with fixed fees on top of usage costs.

How to choose by price

"I want the cheapest possible model that still works for my use case"
→ Start at /best/cheap — sorted strictly by lowest input price with a quality watermark beside each pick.

"I want the best capability-per-dollar"
→ Start at /best/api — sorted by benchmark quality first, then lowest price.

"I need a free tier to prototype before committing"
→ Start at /best/free — zero-cost hosted tiers first, then self-hostable open-weight options.

"I need to compare two specific models on price"
→ Use /compare — pick any two models and see a side-by-side pricing breakdown.

A quick cost estimate template

Use this formula to estimate monthly API spend before committing:

Monthly cost = (calls/day × 30) × (avg_input_tokens × input_price_per_M / 1,000,000) + (calls/day × 30) × (avg_output_tokens × output_price_per_M / 1,000,000)

Example: 5,000 calls/day, 1,500 avg input tokens, 800 avg output tokens, $0.30/M input, $1.00/M output

Input: 5,000 × 30 × 1,500 × 0.30 / 1,000,000 = $67.50/month
Output: 5,000 × 30 × 800 × 1.00 / 1,000,000 = $120/month
Total: ~$187.50/month

At $1/M input and $5/M output (mid-tier model), the same workload would cost about $825/month ($225 input + $600 output), roughly 4.4× as much. That gap is worth modeling before you build.
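The template formula translates directly into a function. This minimal sketch reproduces both estimates above; `monthly_cost` is a hypothetical name, and the default 30-day month matches the template.

```python
from decimal import Decimal


def monthly_cost(calls_per_day: int, avg_input_tokens: int, avg_output_tokens: int,
                 input_price_per_m: Decimal, output_price_per_m: Decimal,
                 days: int = 30) -> dict[str, Decimal]:
    """Monthly API spend, split into input, output, and total (template formula)."""
    calls = calls_per_day * days
    per_m = Decimal(1_000_000)
    input_cost = calls * avg_input_tokens * input_price_per_m / per_m
    output_cost = calls * avg_output_tokens * output_price_per_m / per_m
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


budget = monthly_cost(5_000, 1_500, 800, Decimal("0.30"), Decimal("1.00"))
mid_tier = monthly_cost(5_000, 1_500, 800, Decimal("1"), Decimal("5"))
print(budget)
print(mid_tier)
```

Returning the input and output parts separately makes it easy to see which side of the bill to optimize first; in both cases here, output dominates.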

Summary: what to remember

Prices are per million tokens, not per request.
Output tokens cost more than input tokens — typically 2–5×.
Batch mode cuts costs roughly in half (up to 50% off) for non-real-time workloads.
Prompt caching can dramatically reduce costs for RAG and long-system-prompt applications.
The cheapest model isn't always the best value — see the quality watermark on /best/cheap.

→ See all LLM prices, updated daily: LLM API pricing comparison
