LLM Reference

Best Small Language Models Under 10B Parameters (2026)

Last refreshed 2026-05-16. Next refresh: weekly.

Efficient small language models for edge deployment, cost-sensitive workloads, and on-device inference. Every model listed has 10B parameters or fewer and is ranked by benchmark score.

Top three picks

An opinionated short list for this category. Scroll down for the full leaderboard, pricing, and compare links.

How we rank

Small models (≤10B active parameters) rank on MMLU-Pro, then GPQA Diamond, MMLU, and HellaSwag.

  1. Eligibility: Non-deprecated models with ≤10B parameters (billions-only parser).
  2. Primary ranking: MMLU-Pro, then GPQA Diamond, then MMLU, then HellaSwag, then newer release.
  3. Podium freshness: Shortlist cards require `lastResearched` within 60 days and a tracked public output price. Stale or unpriced SKUs stay in the table with a "Verify pricing" badge once research is past 45 days.
  4. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  5. Pricing: SLMs often win on unit economics, so compare the provider ladder before picking.
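
The eligibility and primary-ranking rules above can be sketched as a filter plus a multi-key sort. This is a minimal illustration, not the site's actual implementation: the `Model` class, its field names, the `is_eligible` and `rank` helpers, and the choice to sort missing scores last are all assumptions. The MMLU-Pro scores come from the leaderboard on this page; the parameter counts, release dates, and the 70B entry are placeholders for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Model:
    # Hypothetical record shape; field names are not from the source.
    name: str
    params_b: float                      # parameter count in billions
    release: date
    deprecated: bool = False
    mmlu_pro: Optional[float] = None
    gpqa_diamond: Optional[float] = None
    mmlu: Optional[float] = None
    hellaswag: Optional[float] = None


def is_eligible(m: Model, max_params_b: float = 10.0) -> bool:
    """Rule 1: non-deprecated and at most 10B parameters."""
    return not m.deprecated and m.params_b <= max_params_b


def _score_key(value: Optional[float]):
    # Higher scores first; a missing score sorts after any real score
    # (an assumption, the page does not say how gaps are ordered).
    return (value is None, -(value or 0.0))


def rank_key(m: Model):
    """Rule 2: MMLU-Pro, GPQA Diamond, MMLU, HellaSwag, then newer release."""
    return (
        _score_key(m.mmlu_pro),
        _score_key(m.gpqa_diamond),
        _score_key(m.mmlu),
        _score_key(m.hellaswag),
        -m.release.toordinal(),
    )


def rank(models):
    return sorted((m for m in models if is_eligible(m)), key=rank_key)


placeholder = date(2024, 1, 1)  # placeholder release date, not real data
models = [
    Model("Qwen3-8B", 8.0, placeholder),                        # no MMLU-Pro listed
    Model("Llama 3.1 8B Instruct", 8.0, placeholder, mmlu_pro=44.25),
    Model("Gemma 2 9B", 9.0, placeholder, mmlu_pro=52.08),
    Model("Phi-4 Mini", 3.8, placeholder, mmlu_pro=52.8),       # 3.8B is an assumption
    Model("Hypothetical 70B", 70.0, placeholder, mmlu_pro=60.0),  # filtered out by size
]
ranked = [m.name for m in rank(models)]
print(ranked)
```

Sorting on a tuple keeps each tie-breaker independent: a later key only matters when every earlier key compares equal, which mirrors the "then ... then ..." wording of the rule.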
| # | Model | MMLU-Pro | Input $/1M | Output $/1M |
|---|-------|----------|------------|-------------|
| 1 | Phi-4 Mini | 52.8% | $0.05 | $0.15 |
| 2 | Gemma 2 9B | 52.08% | $0.06 | $0.18 |
| 3 | Phi-3 Mini 4k | 45.66% | $0.05 | $0.25 |
| 4 | Llama 3.1 8B Instruct | 44.25% | $0.02 | $0.05 |
| 5 | Llama 3 8B Instruct | 40.5% | $0.03 | $0.04 |
| 6 | Llama 3.2 3B Instruct | 34.7% | $0.05 | $0.10 |
| 7 | Llama 3.2 1B Instruct | 20% | $0.03 | $0.10 |
| 8 | MiniMax M2.7 (Reasoning, Tools) | n/a | $0.30 | $1.20 |
| 9 | Qwen3-8B | n/a | $0.05 | $0.20 |
| 10 | Qwen2-7B | n/a | $0.05 | $0.15 |
| 11 | Gemma 7B Instruct | n/a | $0.05 | $0.07 |
| 12 | OpenChat 3.5 (0106) | n/a | $0.07 | $0.07 |
| 13 | Starling LM 7B Beta | n/a | n/a | n/a |
| 14 | Zephyr 7B Beta | n/a | $0.05 | $0.20 |
| 15 | Qwen2.5-7B-Instruct | n/a | $0.03 | $0.03 |
| 16 | Aya 23 8B | n/a | n/a | n/a |
| 17 | Qwen2-1.5B | n/a | $0.07 | $0.07 |
| 18 | Qwen1.5-7B | n/a | $0.05 | $0.20 |
| 19 | Phi-2 | n/a | $0.05 | $0.07 |
| 20 | Gemma 2B | n/a | n/a | n/a |

Honorable mentions

The next seats in this ranking. The lines below come from each model's stored description in LLMReference seed data, so spot-check the model page before relying on a capability claim.

  • Llama 3.2 3B Instruct (MMLU-Pro: 34.7%): available on AWS Bedrock.

  • Llama 3.2 1B Instruct (MMLU-Pro: 20%): available on AWS Bedrock.

  • MiniMax M2.7 (MMLU-Pro: n/a): MiniMax's self-improving frontier model, released March 18, 2026. It introduces native multi-agent collaboration, complex skill orchestration, and early recursive self-improvement capabilities. The model uses 10B active parameters, supports a 204,800-token context window, and was released alongside MiniMax-M2.7-highspeed, a 66% faster latency-optimized variant. Public provider listings price standard M2.7 at $0.30 per 1M input tokens and $1.20 per 1M output tokens.