LLM Reference

Cheapest LLM APIs You Can Call Right Now (2026)

Last refreshed 2026-06-30. Next refresh: weekly.

The cheapest LLM APIs you can call today, ranked by input price with a quality score beside each so you see the trade-off.

Use this page when token price is the first constraint and quality still matters. The rows below exclude zero-dollar tiers and surface a quality watermark beside tracked input prices.

Need no-cost options? Compare the free-model leaderboard separately from paid API pricing.

Verdict

Use Ling-2.6-Flash for low-cost API calls today.

Mistral NeMo Instruct (2407) is the runner-up at $0.020 on Input $/1M, against $0.010 for Ling-2.6-Flash.

Researched 44d ago

How we rank

Cheapest LLM APIs stay a strict price board, with a quality watermark so low-cost rows do not hide weak benchmark coverage.

  1. Eligibility: Chat-completion style APIs with positive priced tiers (we exclude zero-dollar rows here because `/best/free` covers that intent).
  2. Primary ranking: Ascending input $/1M tokens using consolidated provider data.
  3. Quality watermark: Rows show the first available MMLU or GPQA Diamond score as a capability check, but the score does not change cheap-page order.
  4. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  5. Coverage caveat: Ultra-cheap models may lack frontier benchmarks; pair this list with `/best/coding` or another task board for quality guardrails.
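
The ranking steps above can be sketched roughly as follows. This is a minimal illustration, not LLMReference's actual pipeline: the `Row` fields, `canonical_key`, `watermark`, and `rank_cheapest` are hypothetical names, and the demo rows are made-up data. One simplifying assumption is labeled in the code: the page only describes the tie-break order for siblings whose headline scores tie, so the sketch applies that order to every sibling in a family.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    # All field names are hypothetical; only `familySlug` and `release`
    # are named in the methodology text.
    name: str
    family_slug: str
    param_tier: str
    input_price: float            # USD per 1M input tokens
    is_ga: bool                   # False for preview or limited access
    release: date
    mmlu: Optional[float] = None
    gpqa_diamond: Optional[float] = None


def watermark(row: Row) -> Optional[str]:
    """First available score, MMLU before GPQA Diamond. Display only."""
    if row.mmlu is not None:
        return f"MMLU {row.mmlu}%"
    if row.gpqa_diamond is not None:
        return f"GPQA Diamond {row.gpqa_diamond}%"
    return None


def canonical_key(row: Row):
    # Lowest input price, then GA over preview, then newest release.
    return (row.input_price, 0 if row.is_ga else 1, -row.release.toordinal())


def rank_cheapest(rows):
    # Step 1, eligibility: positive priced tiers only.
    priced = [r for r in rows if r.input_price > 0]
    # Step 4, variant collapse: one row per (family, parameter tier).
    # ASSUMPTION: the tie-break order is applied to all siblings,
    # without checking whether their headline scores actually tie.
    best = {}
    for r in priced:
        key = (r.family_slug, r.param_tier)
        if key not in best or canonical_key(r) < canonical_key(best[key]):
            best[key] = r
    # Step 2, primary ranking: ascending input price. The watermark
    # score is never part of the sort key (step 3).
    return sorted(best.values(), key=lambda r: r.input_price)


# Made-up rows for illustration only.
demo = [
    Row("fam-a preview", "fam-a", "8b", 0.02, False, date(2025, 1, 1)),
    Row("fam-a GA", "fam-a", "8b", 0.02, True, date(2024, 6, 1), mmlu=76.9),
    Row("fam-b free tier", "fam-b", "4b", 0.0, True, date(2025, 1, 1)),
    Row("fam-c", "fam-c", "1b", 0.01, True, date(2024, 1, 1)),
]
ranked = rank_cheapest(demo)
for position, row in enumerate(ranked, start=1):
    print(position, row.name, row.input_price, watermark(row))
```

In the demo, the zero-dollar row is dropped, the GA sibling replaces the preview sibling at the same price, and the row without any benchmark score still ranks first because only input price decides the order.
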
| #  | Model                               | Capabilities  | Quality watermark  | Input $/1M | Output $/1M |
|----|-------------------------------------|---------------|--------------------|------------|-------------|
| 1  | Ling-2.6-Flash                      | Tools         | (none)             | $0.01      | $0.03       |
| 2  | Llama 3 8B Instruct                 |               | MMLU 76.9%         | $0.02      | $0.04       |
| 3  | Llama 3.1 8B Instruct               |               | (none)             | $0.02      | $0.05       |
| 4  | Mistral NeMo Instruct (2407)        |               | MMLU 81.5%         | $0.02      | $0.04       |
| 5  | Aleph Alpha Luminous Base           |               | (none)             | $0.02      | $0.06       |
| 6  | Gemma 3n 4B (free)                  |               | (none)             | $0.02      | $0.04       |
| 7  | Together AI - Gemma 3n-e4B          | Tools         | (none)             | $0.02      | $0.04       |
| 8  | Llama 3.2 1B Instruct               |               | MMLU 49.3%         | $0.03      | $0.10       |
| 9  | Qwen2.5-7B-Instruct                 |               | MMLU 81.2%         | $0.03      | $0.03       |
| 10 | Llama 3.2 3B Instruct               |               | (none)             | $0.03      | $0.05       |
| 11 | Granite 3.3 8B Instruct             | Tools         | (none)             | $0.03      | $0.25       |
| 12 | LFM2-24B-A2B                        | Tools         | (none)             | $0.03      | $0.12       |
| 13 | ERNIE Lite Pro                      |               | (none)             | $0.03      | $0.06       |
| 14 | KAT Coder Pro V1                    | Tools         | (none)             | $0.03      | $1.20       |
| 15 | Nova Micro                          |               | (none)             | $0.04      | $0.14       |
| 16 | Gemini 1.5 Flash on Google Vertex AI | Vision       | (none)             | $0.04      | $0.10       |
| 17 | Qwen3-8B                            |               | GPQA Diamond 58.9% | $0.04      | $0.14       |
| 18 | Amazon Nova Micro                   |               | (none)             | $0.04      | $0.14       |
| 19 | AutoGLM Phone 9B Multilingual       | Vision, Tools | (none)             | $0.04      | $0.14       |
| 20 | Gemini 1.5 Flash 8B                 |               | (none)             | $0.04      | $0.15       |
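
Because the board sorts on input price alone, rows with the same input price can differ widely in total cost once output tokens are counted. The sketch below works through that arithmetic using prices copied from the table above. The `request_cost` helper and the workload (10M input tokens, 2M output tokens) are assumptions for illustration, and the table prices appear to be rounded for display, so treat the results as estimates.

```python
def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """USD cost of a workload, given prices in USD per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# (input $/1M, output $/1M) as shown in the table above.
prices = {
    "Ling-2.6-Flash": (0.01, 0.03),
    "Qwen2.5-7B-Instruct": (0.03, 0.03),
    "KAT Coder Pro V1": (0.03, 1.20),
}

# Hypothetical workload: 10M input tokens and 2M output tokens.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

costs = {
    name: request_cost(INPUT_TOKENS, OUTPUT_TOKENS, inp, out)
    for name, (inp, out) in prices.items()
}
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f}")
```

For this workload the estimate is about $0.16 for Ling-2.6-Flash, $0.36 for Qwen2.5-7B-Instruct, and $2.70 for KAT Coder Pro V1, even though the last two share the same $0.03 input price. Output-heavy workloads should weigh the Output $/1M column as well as rank.
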

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

  • Google: Gemma 3n 4B (free), available via OpenRouter. The stored description lists no per-token pricing (both input and output values are null).

    Input $/1M: $0.020

  • Efficient 4B parameter model from Google, available on Together AI. Gemma 3 nano-edge model optimized for low-latency inference.

    Input $/1M: $0.020

  • Llama 3.2 1B Instruct is Meta's Llama 3.2 model. It offers a 128K-token context window with weights openly available for self-hosting and scores 25.6 on GPQA.

    Input $/1M: $0.027

Frequently asked questions

Which LLM is best for low-cost API calls?

Ling-2.6-Flash is the current LLMReference top pick for low-cost API calls. The verdict uses the stored category signal Input $/1M: $0.010. Output pricing starts at $0.03 per 1M tokens. Review the linked model and provider pages before production use because availability and pricing can change.

How does Ling-2.6-Flash compare to Mistral NeMo Instruct (2407) for low-cost API calls?

Ling-2.6-Flash leads Mistral NeMo Instruct (2407) in the visible shortlist on Input $/1M: $0.010 versus $0.020. On the pricing cards, output pricing starts at $0.03 per 1M tokens for Ling-2.6-Flash and at $0.04 per 1M tokens for Mistral NeMo Instruct (2407).

How does LLMReference rank LLMs for low-cost API calls?

LLMReference ranks LLMs for low-cost API calls from stored model, benchmark, freshness, and pricing data. The current methodology summary is: Cheapest LLM APIs stay a strict price board, with a quality watermark so low-cost rows do not hide weak benchmark coverage.

How often is this list updated?

The LLM rankings on this page are updated daily as new benchmark scores, provider availability, and pricing data are tracked. The "Last refreshed" date at the top of the page shows the most recent refresh.

How do you decide which models appear in the top 3?

The podium picks are driven by the primary benchmark signal for this category (shown in the Methodology section), filtered to non-deprecated models with confirmed API availability. In ties, we prefer the more recently released model.

Are preview or beta models included?

Preview models appear in the "Watch list" section but are not in the main ranked podium unless the category explicitly allows it (e.g., /best/coding and /best/agents, where preview models often lead benchmarks).

Can I compare two specific models head-to-head?

Yes — use the Compare tool at llmreference.com/compare for a side-by-side breakdown of context window, pricing, benchmarks, and provider availability.

Is the pricing data real-time?

Pricing is tracked from provider documentation and updated regularly. It reflects the best available public data, not live API quotes — always verify before billing.