Gemma 2 9B
- MMLU-Pro: 52.08%
- Output (from): $0.180 / 1M tokens
Last refreshed 2026-05-16. Next refresh: weekly.
Efficient small language models for edge deployment, cost-sensitive workloads, and on-device inference: models with 10B active parameters or fewer and strong benchmark scores.
Opinionated short stack for this category. The full leaderboard with pricing follows.
Small models (≤10B active parameters) are ranked by MMLU-Pro first, then by GPQA Diamond, MMLU, and HellaSwag, in that order.
| # | Model | MMLU-Pro | Input $/1M | Output $/1M |
|---|---|---|---|---|
| 1 | Phi-4 Mini | 52.8% | $0.05 | $0.15 |
| 2 | Gemma 2 9B | 52.08% | $0.06 | $0.18 |
| 3 | Phi-3 Mini 4k | 45.66% | $0.05 | $0.25 |
| 4 | Llama 3.1 8B Instruct | 44.25% | $0.02 | $0.05 |
| 5 | Llama 3 8B Instruct | 40.5% | $0.03 | $0.04 |
| 6 | Llama 3.2 3B Instruct | 34.7% | $0.05 | $0.10 |
| 7 | Llama 3.2 1B Instruct | 20% | $0.03 | $0.10 |
| 8 | MiniMax M2.7 (Reasoning, Tools) | n/a | $0.30 | $1.20 |
| 9 | Qwen3-8B | n/a | $0.05 | $0.20 |
| 10 | Qwen2-7B | n/a | $0.05 | $0.15 |
| 11 | Gemma 7B Instruct | n/a | $0.05 | $0.07 |
| 12 | OpenChat 3.5 (0106) | n/a | $0.07 | $0.07 |
| 13 | Starling LM 7B Beta | n/a | n/a | n/a |
| 14 | Zephyr 7B Beta | n/a | $0.05 | $0.20 |
| 15 | Qwen2.5-7B-Instruct | n/a | $0.03 | $0.03 |
| 16 | Aya 23 8B | n/a | n/a | n/a |
| 17 | Qwen2-1.5B | n/a | $0.07 | $0.07 |
| 18 | Qwen1.5-7B | n/a | $0.05 | $0.20 |
| 19 | Phi-2 | n/a | $0.05 | $0.07 |
| 20 | Gemma 2B | n/a | n/a | n/a |
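The ordering rule stated above the table (MMLU-Pro first, then GPQA Diamond, MMLU, and HellaSwag) can be read as a multi-key sort. The sketch below is one plausible reading, not the site's actual implementation. It assumes that a model with no score on a benchmark sorts after every model that has one, and that ties fall through to the next benchmark. The names `BENCHMARK_ORDER` and `rank_key` and the dictionary keys are hypothetical. Only MMLU-Pro values from the table are used; scores for the other benchmarks are not listed on this page and are left out.

```python
from typing import Optional

# Priority order taken from the ranking sentence; the key names are made up for this sketch.
BENCHMARK_ORDER = ["mmlu_pro", "gpqa_diamond", "mmlu", "hellaswag"]


def rank_key(scores: dict[str, Optional[float]]) -> tuple:
    """Build a sort key: per benchmark, scored models first, then higher score first."""
    key = []
    for name in BENCHMARK_ORDER:
        value = scores.get(name)
        # (False, -score) sorts before (True, 0.0), so missing scores go last.
        key.append((value is None, -(value if value is not None else 0.0)))
    return tuple(key)


# A few rows from the table, deliberately out of order.
models = {
    "Qwen3-8B": {},  # no MMLU-Pro score listed
    "Llama 3.1 8B Instruct": {"mmlu_pro": 44.25},
    "Gemma 2 9B": {"mmlu_pro": 52.08},
    "Phi-4 Mini": {"mmlu_pro": 52.8},
}

ranked = sorted(models, key=lambda name: rank_key(models[name]))
for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Under these assumptions the models with an MMLU-Pro score keep the relative order shown in the table, and models without one are ordered by whichever later benchmark they do have.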
Next seats in this ranking. The lines below come from each model's stored description in the LLMReference seed data; spot-check the model page before relying on a capability claim.
Llama 3.2 3B Instruct available on AWS Bedrock
- MMLU-Pro: 34.7%
Llama 3.2 1B Instruct available on AWS Bedrock
- MMLU-Pro: 20%
MiniMax M2.7 is MiniMax's self-improving frontier model, released March 18, 2026. It introduces native multi-agent collaboration, complex skill orchestration, and early recursive self-improvement capabilities. The model uses 10B active parameters, supports a 204,800-token context window, and was released alongside MiniMax-M2.7-highspeed, a 66% faster latency-optimized variant. Public provider listings price standard M2.7 at $0.30 per 1M input tokens and $1.20 per 1M output tokens.
- MMLU-Pro: n/a
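Per-1M-token prices like those listed on this page translate into a workload cost with simple arithmetic. The sketch below shows that calculation. The function name `workload_cost_usd` and the token volumes (2,000,000 input and 500,000 output tokens) are hypothetical values chosen for illustration. The prices are the ones listed above for MiniMax M2.7 and Llama 3.1 8B Instruct.

```python
def workload_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in USD for a token volume, given prices quoted per 1M tokens."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Hypothetical workload, not a figure from the page.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

# Prices as listed on this page ($ per 1M tokens).
minimax_cost = workload_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.30, 1.20)
llama_cost = workload_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.02, 0.05)

print(f"MiniMax M2.7: ${minimax_cost:.3f}")
print(f"Llama 3.1 8B Instruct: ${llama_cost:.3f}")
```

Because output tokens are usually priced higher than input tokens, the input-to-output ratio of a workload can matter as much as the headline price when comparing models.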