LLM API Pricing Explained: How to Compare Cost Per Token
LLM API pricing can be confusing at first. Providers charge in tokens, not words or requests — and the numbers look very small ($0.000003 per token) until you're running thousands of queries per day. This guide explains how LLM pricing works and how to compare APIs so you pick the one that fits your budget.
What is a token?
A token is the basic unit that LLMs use to process text — roughly 3–4 characters on average, or about 0.75 words in English. Common words like "the", "is", and "in" are usually a single token. Longer or unusual words may be multiple tokens.
Quick estimates:
- 1,000 tokens ≈ 750 words ≈ 1.5 pages of text
- 10,000 tokens ≈ 7,500 words ≈ a short story
- 1M tokens ≈ 750,000 words ≈ about 6 average novels
For budgeting purposes, the rough word-count conversion above is usually close enough.
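As a rough sketch of the conversion above, the following Python helper turns a word count into an approximate token count. The function name and the `WORDS_PER_TOKEN` constant are illustrative, taken from the 0.75 words-per-token rule of thumb in this section; real tokenizers vary by model and language, so treat the result as a budgeting estimate only.

```python
import math

# Rule of thumb from this guide: 1 token is about 0.75 English words.
# This is an approximation for budgeting, not a real tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_words(word_count: int) -> int:
    """Return an approximate token count for English text of `word_count` words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(750))    # 1000
print(estimate_tokens_from_words(7_500))  # 10000
```

Rounding up keeps the estimate slightly conservative, which is usually what you want in a budget.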
Input tokens vs output tokens
Most providers charge differently for input (what you send) and output (what the model generates):
- Input tokens — your prompt, the conversation history, and any documents you include in the context
- Output tokens — the model's response
Output tokens typically cost 2–5× more than input tokens. This matters if you're generating long responses (code, essays, summaries), because the output cost dominates the bill.
Example: GPT-4o at $2.50/M input and $10.00/M output
If you send 1,000 tokens of input and get 500 tokens of output:
- Input cost: 1,000 × $0.0000025 = $0.0025
- Output cost: 500 × $0.00001 = $0.005
- Total per call: $0.0075
At 10,000 calls/day, that's $75/day, or about $2,250 over a 30-day month.
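The arithmetic in this example can be sketched as a small Python function. The name `call_cost` and its parameters are hypothetical; the prices are the GPT-4o figures quoted above and may no longer be current.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one API call, given prices quoted per 1M tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# 1,000 input tokens and 500 output tokens at $2.50/M input, $10.00/M output.
per_call = call_cost(1_000, 500, 2.50, 10.00)
print(f"${per_call:.4f} per call")          # $0.0075 per call
print(f"${per_call * 10_000:.2f} per day")  # $75.00 per day
```

Note that the 500 output tokens cost twice as much as the 1,000 input tokens, which is why output length tends to dominate the bill.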
How to compare prices: the price-per-1M convention
Because per-token prices are tiny numbers, the industry convention is to quote price per million (1M) tokens:
| Provider | Model | Input ($/1M) | Output ($/1M) |
|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 |
| Anthropic | Claude Sonnet | $3.00 | $15.00 |
| Google | Gemini 2.5 Flash | $0.30 | $1.00 |
| DeepSeek | DeepSeek V3 | $0.27 | $1.10 |
| Meta (via OpenRouter) | Llama 3.1 405B | $0.80 | $0.80 |
Prices change frequently. See live pricing at LLMReference /best/cheap.
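Because input and output are priced separately, the cheapest model depends on your input/output mix. The sketch below ranks the models in the table by the cost of one call of a given shape. The `PRICES` list is copied from the table above (a snapshot, not live data), and `rank_by_call_cost` is a hypothetical helper name.

```python
# (provider, model, input $/1M, output $/1M), copied from the table above.
# These are snapshot values from this article and will drift over time.
PRICES = [
    ("OpenAI", "GPT-4o", 2.50, 10.00),
    ("Anthropic", "Claude Sonnet", 3.00, 15.00),
    ("Google", "Gemini 2.5 Flash", 0.30, 1.00),
    ("DeepSeek", "DeepSeek V3", 0.27, 1.10),
    ("Meta (via OpenRouter)", "Llama 3.1 405B", 0.80, 0.80),
]


def rank_by_call_cost(prices, input_tokens, output_tokens):
    """Return (cost, provider, model) tuples sorted from cheapest to priciest."""
    ranked = []
    for provider, model, in_price, out_price in prices:
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        ranked.append((cost, provider, model))
    return sorted(ranked)


# One call with 1,000 input tokens and 500 output tokens.
for cost, provider, model in rank_by_call_cost(PRICES, 1_000, 500):
    print(f"{model:<18} ${cost:.5f}")
```

With these numbers, Gemini 2.5 Flash comes out cheapest for the 1,000-in / 500-out call, while an input-heavy call (say 10,000 in, 100 out) puts DeepSeek V3 first, so it is worth ranking against your own traffic shape rather than a single column.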
Batch pricing: cheaper for non-urgent work
Many providers offer batch APIs that process requests asynchronously (typically within 24 hours) at roughly half the standard price. They suit large one-time jobs such as dataset labeling, document classification, and bulk summarization.
| Provider | Batch discount |
|---|---|
| Anthropic (Message Batches) | 50% off |
| OpenAI (Batch API) | 50% off |
| Google (Batch requests) | Up to 50% off |
Batch mode is not suitable for real-time interactive applications (chatbots, copilots, live APIs). Use it for offline pipelines.
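If only part of your workload can tolerate batch latency, the saving applies to that part alone. A minimal sketch, assuming a flat 50% batch discount as in the table above; `blended_cost` and its parameters are illustrative names, and actual discounts depend on the provider.

```python
def blended_cost(standard_cost: float, batch_share: float,
                 batch_discount: float = 0.50) -> float:
    """Cost after moving `batch_share` (0.0 to 1.0) of a workload to batch mode.

    `standard_cost` is what the whole workload would cost at standard prices.
    `batch_discount` is the assumed discount on the batched portion.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    batch_part = standard_cost * batch_share * (1.0 - batch_discount)
    realtime_part = standard_cost * (1.0 - batch_share)
    return batch_part + realtime_part


# Half of a $1,000/month workload moved to batch mode.
print(f"${blended_cost(1_000.0, 0.5):.2f}")  # $750.00
```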
Hidden cost factors
Context window inflation: Sending the same long system prompt on every call? Those tokens are billed every time. A 2,000-token system prompt at 10,000 calls/day adds 20M billed input tokens per day. Use caching where available.
Prompt caching: Some providers (Anthropic, Google) offer prompt cache discounts — if your input starts with the same prefix across many calls, the cached portion is billed at a fraction of the normal input price. This is significant for RAG applications with large static context.
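To get a feel for the size of these two effects, the sketch below prices a repeated prompt prefix with and without caching. Everything here is an assumption for illustration: the helper name `daily_prefix_cost`, the $3.00/M input price (the Claude Sonnet figure from the table above), and especially the 0.10 cached-price ratio, which varies by provider. It also ignores cache-write surcharges and cache expiry.

```python
def daily_prefix_cost(prefix_tokens: int, calls_per_day: int,
                      input_price_per_m: float,
                      cached_price_ratio: float = 1.0) -> float:
    """Daily cost in dollars of a prompt prefix repeated on every call.

    `cached_price_ratio` is the assumed fraction of the normal input price
    billed for cached tokens (1.0 means no caching discount).
    """
    daily_tokens = prefix_tokens * calls_per_day
    return daily_tokens * input_price_per_m * cached_price_ratio / 1_000_000


# 2,000-token system prompt, 10,000 calls/day, $3.00 per 1M input tokens.
uncached = daily_prefix_cost(2_000, 10_000, 3.00)
# Hypothetical: cached tokens billed at 10% of the normal input price.
cached = daily_prefix_cost(2_000, 10_000, 3.00, cached_price_ratio=0.10)

print(f"uncached: ${uncached:.2f}/day")  # uncached: $60.00/day
print(f"cached:   ${cached:.2f}/day")    # cached:   $6.00/day
```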
Retry and error costs: Some failed API calls still incur input token charges because the tokens were processed before the failure. These charges are easy to overlook in cost models.
Rate limit costs: Higher rate limits often require higher-tier subscriptions with fixed monthly fees on top of usage costs.
How to choose by price
"I want the cheapest possible model that still works for my use case"
→ Start at /best/cheap — sorted strictly by lowest input price with a quality watermark beside each pick.
"I want the best capability-per-dollar"
→ Start at /best/api — sorted by benchmark quality first, then lowest price.
"I need a free tier to prototype before committing"
→ Start at /best/free — zero-cost hosted tiers first, then self-hostable open-weight options.
"I need to compare two specific models on price"
→ Use /compare — pick any two models and see a side-by-side pricing breakdown.
A quick cost estimate template
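Use this formula to estimate monthly API spend before committing:
Monthly cost = calls/day × 30 × (avg input tokens × input $/1M + avg output tokens × output $/1M) / 1,000,000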
Example: 5,000 calls/day, 1,500 avg input tokens, 800 avg output tokens, $0.30/M input, $1.00/M output
- Input: 5,000 × 30 × 1,500 × 0.30 / 1,000,000 = $67.50/month
- Output: 5,000 × 30 × 800 × 1.00 / 1,000,000 = $120/month
- Total: $187.50/month
At $1/M input and $5/M output (mid-tier model), the same workload would cost about $825/month ($225 input + $600 output). That is more than four times as much, so it is worth modeling before you build.
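The worked example above can be sketched as a reusable Python function. The name `monthly_cost` and its parameters are hypothetical, and the 30-day month matches the arithmetic in the example; adjust `days` if you budget differently.

```python
def monthly_cost(calls_per_day: int, avg_input_tokens: int, avg_output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float,
                 days: int = 30):
    """Return (input_cost, output_cost, total) in dollars for one month.

    Prices are quoted per 1M tokens, as in the comparison table.
    """
    calls = calls_per_day * days
    input_cost = calls * avg_input_tokens * input_price_per_m / 1_000_000
    output_cost = calls * avg_output_tokens * output_price_per_m / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# 5,000 calls/day, 1,500 avg input tokens, 800 avg output tokens,
# $0.30/M input, $1.00/M output.
inp, out, total = monthly_cost(5_000, 1_500, 800, 0.30, 1.00)
print(f"input ${inp:.2f}, output ${out:.2f}, total ${total:.2f}")
# input $67.50, output $120.00, total $187.50
```

Returning the input and output parts separately makes it easy to see which side of the bill a prompt or response-length change would affect.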
Summary: what to remember
- Prices are per million tokens, not per request.
- Output tokens cost more than input tokens — typically 2–5×.
- Batch mode cuts costs by up to about half for non-real-time workloads.
- Prompt caching can dramatically reduce costs for RAG and long-system-prompt applications.
- The cheapest model isn't always the best value — see the quality watermark on /best/cheap.
→ See all LLM prices, updated daily: LLM API pricing comparison