Llama 3.1 8B Instruct
- Input: $0.020 / 1M
- Output (from): $0.050 / 1M
Last refreshed 2026-05-18. Next refresh: weekly.
Cheapest LLM APIs you can call right now, ranked by strict lowest tracked input price with an MMLU or GPQA quality watermark beside each row.
Use this page when token price is the first constraint and quality still matters. The rows below exclude zero-dollar tiers and surface a quality watermark beside tracked input prices.
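The ordering rule described above (drop zero-dollar tiers, then sort by input price) can be sketched as follows. This is a minimal illustration, not the page's actual pipeline: the `Row` type, the `rank_by_input_price` helper, and the `hypothetical-free-tier` entry are assumptions introduced for the example, and the other prices are the rounded values shown in the table. Ties keep their original order here, which is also an assumption, since the page does not state a tie-break.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    input_per_1m: float            # USD per 1M input tokens
    output_per_1m: float           # USD per 1M output tokens
    watermark: Optional[str] = None  # e.g. "MMLU 81.5%", or None if untracked


def rank_by_input_price(rows: List[Row]) -> List[Row]:
    """Exclude zero-dollar tiers, then sort ascending by input price.

    sorted() is stable, so rows with equal input prices keep their
    original relative order (an assumed tie-break).
    """
    paid = [r for r in rows if r.input_per_1m > 0]
    return sorted(paid, key=lambda r: r.input_per_1m)


rows = [
    Row("gpt-oss-120b", 0.04, 0.18, "GPQA Diamond 78.2%"),
    Row("hypothetical-free-tier", 0.0, 0.0),  # made-up row, filtered out
    Row("Llama 3.2 1B Instruct", 0.03, 0.10, "MMLU 49.3%"),
    Row("Mistral NeMo Instruct (2407)", 0.02, 0.04, "MMLU 81.5%"),
]

ranked = rank_by_input_price(rows)
for position, row in enumerate(ranked, start=1):
    mark = row.watermark or "no watermark"
    print(f"{position}. {row.model}: ${row.input_per_1m:.2f} in ({mark})")
```

The watermark travels with each row but never affects the sort key, which matches the idea of a strict price board with quality shown alongside.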
Need no-cost options? Compare the free-model leaderboard separately from paid API pricing.
This leaderboard stays a strict price ranking; the quality watermark keeps low-cost rows from hiding weak benchmark coverage.
| # | Model | Tags | Quality watermark | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Llama 3.1 8B Instruct | | n/a | $0.02 | $0.05 |
| 2 | Mistral NeMo Instruct (2407) | | MMLU 81.5% | $0.02 | $0.04 |
| 3 | Aleph Alpha Luminous Base | | n/a | $0.02 | $0.06 |
| 4 | Gemma 3n 4B (free) | | n/a | $0.02 | $0.04 |
| 5 | Together AI - Gemma 3n-e4B | Tools | n/a | $0.02 | $0.04 |
| 6 | Llama 3.2 1B Instruct | | MMLU 49.3% | $0.03 | $0.10 |
| 7 | gpt-oss-20b | Tools | GPQA Diamond 68.8% | $0.03 | $0.14 |
| 8 | Llama 3 8B Instruct | | MMLU 76.9% | $0.03 | $0.04 |
| 9 | Qwen2.5-7B-Instruct | | MMLU 81.2% | $0.03 | $0.03 |
| 10 | Granite 3.3 8B Instruct | Tools | n/a | $0.03 | $0.25 |
| 11 | LFM2-24B-A2B | Tools | n/a | $0.03 | $0.12 |
| 12 | ERNIE Lite Pro | | n/a | $0.03 | $0.06 |
| 13 | Nova Micro | | n/a | $0.04 | $0.14 |
| 14 | Gemini 1.5 Flash on Google Vertex AI | Vision | n/a | $0.04 | $0.10 |
| 15 | Gemini 1.5 Flash 8B | | n/a | $0.04 | $0.15 |
| 16 | gpt-oss-120b | Tools | GPQA Diamond 78.2% | $0.04 | $0.18 |
| 17 | Aleph Alpha Luminous Extended | | n/a | $0.04 | $0.12 |
| 18 | Qwen3-9B | | n/a | $0.04 | $0.20 |
| 19 | Nemotron-Nano-9B-v2 | | n/a | $0.04 | $0.16 |
| 20 | ERNIE Speed Pro | | n/a | $0.04 | $0.09 |
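Because the board ranks on input price only, two rows with the same input price can still differ a lot in total cost once output tokens are counted. The sketch below estimates the cost of a workload from the per-1M prices in the table above. The `estimate_cost` helper and the token counts are hypothetical, chosen only for illustration, and the prices are the rounded table values, so treat the results as approximate.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_1m: float, output_per_1m: float) -> float:
    """Return the estimated USD cost for a workload.

    Prices are quoted per 1,000,000 tokens, so each token count is
    scaled by 1e6 before multiplying by its price.
    """
    return (input_tokens / 1_000_000) * input_per_1m \
        + (output_tokens / 1_000_000) * output_per_1m


# Hypothetical workload: 2M input tokens and 500k output tokens.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

# Two rows from the table with the same rounded input price ($0.03).
qwen_cost = estimate_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.03, 0.03)
granite_cost = estimate_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.03, 0.25)

print(f"Qwen2.5-7B-Instruct:     ${qwen_cost:.3f}")
print(f"Granite 3.3 8B Instruct: ${granite_cost:.3f}")
```

For output-heavy workloads it may be worth re-sorting the candidates by this blended estimate rather than by input price alone.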
Next seats in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
- Efficient 4B parameter model from Google, available on Together AI. Gemma 3 nano-edge model optimized for low-latency inference. Input: $0.020 / 1M.
- Llama 3.2 1B Instruct, available on AWS Bedrock. Input: $0.027 / 1M.
- OpenAI open-weight model with 20 billion parameters. Lightweight, efficient text-only model with reasoning and function calling. Free for self-hosting. Released August 5, 2025. Input: $0.030 / 1M.