Best Small Language Models Under 10B Parameters (2026)
Efficient small language models for edge deployment, cost-sensitive workloads, and on-device inference. Every model listed has fewer than 10B parameters and strong benchmark scores.
| # | Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| 1 | Nemotron 3 Nano Tools | — | — |
| 2 | Together AI - Gemma 3n-e4B Tools | $0.02 | $0.04 |
| 3 | Granite 4.0 1B Speech | — | — |
| 4 | Nemotron 3 8B | $0.37 | $1.10 |
| 5 | Transcribe (03-2026) | — | — |
| 6 | Together AI - Qwen 3.5 9B Tools | $0.10 | $0.15 |
| 7 | Marin 8B Instruct | — | — |
| 8 | FireMoE 3B Chat v2 | — | — |
| 9 | Jet-Nemotron 2B | — | — |
| 10 | Jet-Nemotron 4B | — | — |
| 11 | Nemotron-Nano-9B-v2 | — | — |
| 12 | Together AI - Llama 3 8B Lite Tools | $0.10 | $0.10 |
| 13 | Sao10K L3 Lunaris 8B | — | — |
| 14 | NV-EmbedCode 7B v1 | — | — |
| 15 | FireQwen2.5 7B Instruct | — | — |
| 16 | GLM-4 Code 9B | — | — |
| 17 | MiniCPM-4 8B | — | — |
| 18 | Llama 3.1 Nemotron Nano 4B v1.1 | — | — |
| 19 | GLM-4 Air 4B | — | — |
| 20 | Granite 3.3 8B Instruct Tools | $0.03 | $0.25 |
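
To compare the priced models for a particular workload, the per-million rates in the table can be turned into a rough cost estimate. The sketch below is illustrative only: it assumes the listed prices are in USD per 1M tokens, and the `PRICES` dictionary, the `estimate_cost` helper, and the example token counts are hypothetical names and values introduced here, not part of any provider's API.

```python
# Prices copied from the table above as (input, output).
# Assumption: the unit is USD per 1M tokens.
PRICES = {
    "Together AI - Gemma 3n-e4B Tools": (0.02, 0.04),
    "Nemotron 3 8B": (0.37, 1.10),
    "Together AI - Qwen 3.5 9B Tools": (0.10, 0.15),
    "Together AI - Llama 3 8B Lite Tools": (0.10, 0.10),
    "Granite 3.3 8B Instruct Tools": (0.03, 0.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload for one model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 10M output tokens.
    workload = (50_000_000, 10_000_000)
    # Print the priced models from cheapest to most expensive for this workload.
    for name in sorted(PRICES, key=lambda m: estimate_cost(m, *workload)):
        print(f"{name}: ${estimate_cost(name, *workload):.2f}")
```

Because input and output rates differ, the cheapest model depends on the ratio of input to output tokens, so it is worth plugging in your own token counts rather than comparing a single column.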