Ling-2.6-Flash
- Input $/1M: $0.010
- Output (from): $0.030 / 1M
Last refreshed 2026-06-30. Next refresh: weekly.
The cheapest LLM APIs you can call today, ranked by input price with a quality score beside each so you see the trade-off.
Use this page when token price is the first constraint and quality still matters. The rows below exclude zero-dollar tiers and surface a quality watermark beside tracked input prices.
Need no-cost options? Compare the free-model leaderboard separately from paid API pricing.
Verdict
Ling-2.6-Flash leads on Input $/1M at $0.010. Mistral NeMo Instruct (2407) is the runner-up at $0.020.
Cheapest LLM APIs stay a strict price board, with a quality watermark so low-cost rows do not hide weak benchmark coverage.
| # | Model | Capabilities | Quality watermark | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Ling-2.6-Flash | Tools | — | $0.01 | $0.03 |
| 2 | Llama 3 8B Instruct | | MMLU 76.9% | $0.02 | $0.04 |
| 3 | Llama 3.1 8B Instruct | | — | $0.02 | $0.05 |
| 4 | Mistral NeMo Instruct (2407) | | MMLU 81.5% | $0.02 | $0.04 |
| 5 | Aleph Alpha Luminous Base | | — | $0.02 | $0.06 |
| 6 | Gemma 3n 4B (free) | | — | $0.02 | $0.04 |
| 7 | Together AI - Gemma 3n-e4B | Tools | — | $0.02 | $0.04 |
| 8 | Llama 3.2 1B Instruct | | MMLU 49.3% | $0.03 | $0.10 |
| 9 | Qwen2.5-7B-Instruct | | MMLU 81.2% | $0.03 | $0.03 |
| 10 | Llama 3.2 3B Instruct | | — | $0.03 | $0.05 |
| 11 | Granite 3.3 8B Instruct | Tools | — | $0.03 | $0.25 |
| 12 | LFM2-24B-A2B | Tools | — | $0.03 | $0.12 |
| 13 | ERNIE Lite Pro | | — | $0.03 | $0.06 |
| 14 | KAT Coder Pro V1 | Tools | — | $0.03 | $1.20 |
| 15 | Nova Micro | | — | $0.04 | $0.14 |
| 16 | Gemini 1.5 Flash on Google Vertex AI | Vision | — | $0.04 | $0.10 |
| 17 | Qwen3-8B | | GPQA Diamond 58.9% | $0.04 | $0.14 |
| 18 | Amazon Nova Micro | | — | $0.04 | $0.14 |
| 19 | AutoGLM Phone 9B Multilingual | Vision, Tools | — | $0.04 | $0.14 |
| 20 | Gemini 1.5 Flash 8B | | — | $0.04 | $0.15 |
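Because the board is sorted by input price alone, a row with a low input price can still be expensive once output tokens are counted. The sketch below estimates a blended cost from the per-1M prices of three rows in the table above. The workload size (50M input and 10M output tokens) and the helper name `workload_cost` are assumptions chosen for illustration, not LLMReference figures.

```python
# Prices in USD per 1M tokens (input, output), copied from three table rows.
PRICES = {
    "Ling-2.6-Flash": (0.01, 0.03),
    "Qwen2.5-7B-Instruct": (0.03, 0.03),
    "KAT Coder Pro V1": (0.03, 1.20),
}


def workload_cost(input_price, output_price, input_tokens, output_tokens):
    """Return the USD cost of a workload given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, assumed for this example only.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

costs = {
    name: workload_cost(in_price, out_price, INPUT_TOKENS, OUTPUT_TOKENS)
    for name, (in_price, out_price) in PRICES.items()
}

# Print cheapest first for this particular input/output mix.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f}")
```

Under this assumed mix, two rows that share a $0.03 input price end up far apart in total cost, so it is worth plugging in your own token counts before picking a row on input price.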
Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
- Google: Gemma 3n 4B (free), available via OpenRouter. The stored description gives no input or output price (both fields are null); the ranking records Input $/1M: $0.020.
- Efficient 4B-parameter model from Google, available on Together AI. Gemma 3 nano-edge model optimized for low-latency inference. Input $/1M: $0.020.
- Llama 3.2 1B Instruct is a Meta Llama 3.2 model. It offers a 128K-token context window, openly available weights for self-hosting, and a GPQA score of 25.6. Input $/1M: $0.027.
Side-by-side comparison of the top picks by price, benchmark, and API access.
Ling-2.6-Flash is the current LLMReference top pick for low-cost API calls. The verdict uses the stored category signal Input $/1M: $0.010. Output pricing starts at $0.03 per 1M tokens. Review the linked model and provider pages before production use because availability and pricing can change.
Ling-2.6-Flash leads Mistral NeMo Instruct (2407) in the visible shortlist on Input $/1M: $0.010 versus $0.020. Per the pricing cards, output pricing starts at $0.03 per 1M tokens for Ling-2.6-Flash and $0.04 per 1M tokens for Mistral NeMo Instruct (2407).
LLMReference ranks LLMs for low-cost API calls from stored model, benchmark, freshness, and pricing data. The current methodology summary is: Cheapest LLM APIs stay a strict price board, with a quality watermark so low-cost rows do not hide weak benchmark coverage.
The LLM rankings on this page are updated daily as new benchmark scores, provider availability, and pricing data are tracked. The "Last refreshed" date at the top of the page shows the most recent refresh.
The podium picks are driven by the primary benchmark signal for this category (shown in the Methodology section), filtered to non-deprecated models with confirmed API availability. In ties, we prefer the more recently released model.
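The selection rule above can be sketched as a filter followed by a sort. This is a minimal illustration, not LLMReference's actual implementation. It assumes the primary signal for this category is input price (lower is better), that zero-dollar tiers are excluded as stated earlier on the page, and it uses a hypothetical `Model` record with placeholder names and release dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Model:
    # Hypothetical record shape; field names are assumptions for this sketch.
    name: str
    input_price: float  # USD per 1M input tokens
    released: date
    deprecated: bool = False
    api_available: bool = True


def rank(models):
    """Keep eligible paid models, cheapest first; newer release wins a tie."""
    eligible = [
        m for m in models
        if not m.deprecated and m.api_available and m.input_price > 0
    ]
    # Negating the release ordinal sorts newer models ahead of older ones
    # when input prices are equal.
    return sorted(eligible, key=lambda m: (m.input_price, -m.released.toordinal()))


# Placeholder data: names, prices, and dates are invented for the example.
models = [
    Model("model-a", 0.02, date(2024, 1, 1)),
    Model("model-b", 0.02, date(2025, 1, 1)),
    Model("model-c", 0.01, date(2023, 6, 1), deprecated=True),
    Model("model-d", 0.0, date(2025, 3, 1)),
    Model("model-e", 0.03, date(2024, 5, 1), api_available=False),
    Model("model-f", 0.04, date(2024, 9, 1)),
]

ranked = [m.name for m in rank(models)]
print(ranked)  # ['model-b', 'model-a', 'model-f']
```

Here `model-b` outranks `model-a` at the same price because it was released later, while the deprecated, zero-dollar, and API-unavailable entries never reach the sort.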
Preview models appear in the "Watch list" section but are not in the main ranked podium unless the category explicitly allows it (e.g., /best/coding and /best/agents, where preview models often lead benchmarks).
To compare models directly, use the Compare tool at llmreference.com/compare for a side-by-side breakdown of context window, pricing, benchmarks, and provider availability.
Pricing is tracked from provider documentation and updated regularly. It reflects the best available public data, not live API quotes, so always verify before billing.