LLM ReferenceLLM Reference

Cheapest LLM APIs You Can Call Right Now (2026)

Last refreshed 2026-05-18. Next refresh: weekly.

Cheapest LLM APIs you can call right now, ranked by strict lowest tracked input price with an MMLU or GPQA quality watermark beside each row.

Use this page when token price is the first constraint and quality still matters. The rows below exclude zero-dollar tiers and surface a quality watermark beside tracked input prices.

Need no-cost options? Compare the free-model leaderboard separately from paid API pricing.

Top three picks

Opinionated short stack for this category — scroll for the full leaderboard, pricing, and compare links.

How we rank

Cheapest LLM APIs stay a strict price board, with a quality watermark so low-cost rows do not hide weak benchmark coverage.

  1. EligibilityChat-completion style APIs with positive priced tiers (we exclude zero-dollar rows here because `/best/free` covers that intent).
  2. Primary rankingAscending input $/1M tokens using consolidated provider data.
  3. Quality watermarkRows show the first available MMLU or GPQA Diamond score as a capability check, but the score does not change cheap-page order.
  4. Variant collapseWe keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  5. Coverage caveatUltra-cheap models may lack frontier benchmarks — pair this list with `/best/coding` or another task board for quality guardrails.
#ModelInput $/1MOutput $/1M
1Llama 3.1 8B Instruct

Quality watermark:

$0.02$0.05
2Mistral NeMo Instruct (2407)

Quality watermark: MMLU 81.5%

$0.02$0.04
3Aleph Alpha Luminous Base

Quality watermark:

$0.02$0.06
4Gemma 3n 4B (free)

Quality watermark:

$0.02$0.04
5Together AI - Gemma 3n-e4B
Tools

Quality watermark:

$0.02$0.04
6Llama 3.2 1B Instruct

Quality watermark: MMLU 49.3%

$0.03$0.10
7gpt-oss-20b
Tools

Quality watermark: GPQA Diamond 68.8%

$0.03$0.14
8Llama 3 8B Instruct

Quality watermark: MMLU 76.9%

$0.03$0.04
9Qwen2.5-7B-Instruct

Quality watermark: MMLU 81.2%

$0.03$0.03
10Granite 3.3 8B Instruct
Tools

Quality watermark:

$0.03$0.25
11LFM2-24B-A2B
Tools

Quality watermark:

$0.03$0.12
12ERNIE Lite Pro

Quality watermark:

$0.03$0.06
13Nova Micro

Quality watermark:

$0.04$0.14
14Gemini 1.5 Flash on Google Vertex AI
Vision

Quality watermark:

$0.04$0.10
15Gemini 1.5 Flash 8B

Quality watermark:

$0.04$0.15
16gpt-oss-120b
Tools

Quality watermark: GPQA Diamond 78.2%

$0.04$0.18
17Aleph Alpha Luminous Extended

Quality watermark:

$0.04$0.12
18Qwen3-9B

Quality watermark:

$0.04$0.20
19Nemotron-Nano-9B-v2

Quality watermark:

$0.04$0.16
20ERNIE Speed Pro

Quality watermark:

$0.04$0.09

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

  • Efficient 4B parameter model from Google, available on Together AI. Gemma 3 nano-edge model optimized for low-latency inference.

    $0.020

    Input $/1M

  • Llama 3.2 1B Instruct available on AWS Bedrock

    $0.027

    Input $/1M

  • OpenAI open-weight model with 20 billion parameters. Lightweight, efficient text-only model with reasoning and function calling. Free for self-hosting. Released August 5, 2025.

    $0.030

    Input $/1M