Mixtral 8x7B
Last refreshed 2026-05-16. Next refresh: weekly.
Mixtral 8x7B is worth evaluating for coding and classification when its provider route and context window match the workload.
Decision context: Coding task fit, 18 tracked provider routes, and research from 2026-04-19.
Use it for
- Teams evaluating coding and classification
- Workloads that can use a 32K context window
- Buyers comparing 18 tracked provider routes
Do not use it for
- Vision or document-understanding workloads
- Strict JSON or tool-calling flows
Cheapest output
$0.200
SiliconFlow per 1M tokens
Provider routes
18
Tracked API hosts
Quality / dollar
Grade A
Ranked by benchmark score divided by cheapest output price
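The ranking rule above can be sketched as a simple ratio. This is a minimal sketch, not the site's actual scoring code: the function name `quality_per_dollar` is hypothetical, the choice of HumanEval as the coding benchmark is an assumption, and the thresholds that turn a ratio into a letter grade are not given on this page, so they are left out.

```python
def quality_per_dollar(benchmark_score: float, cheapest_output_price: float) -> float:
    """Benchmark score divided by the cheapest output price (USD per 1M tokens)."""
    if cheapest_output_price <= 0:
        raise ValueError("price must be positive")
    return benchmark_score / cheapest_output_price


# Values as listed on this page: HumanEval score (assumed to be the coding
# benchmark used) and the cheapest output price, from SiliconFlow.
ratio = quality_per_dollar(80.5, 0.200)
print(round(ratio, 1))
```

A ratio like this is only comparable between models scored on the same benchmark and priced in the same unit.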
Freshness
2026-04-19
Researched 29d ago
Top use-case fit
Coding
Q/$ A. 1 relevant benchmark in the decision map.
Classification
Q/$ B. 2 relevant benchmarks in the decision map.
Provider price ladder

| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| SiliconFlow | $0.200 | $0.200 | Serverless |
| Microsoft Foundry | $0.270 | $0.270 | Provisioned |
| Lepton AI API | $0.300 | $0.300 | Serverless |
| Mistral AI Studio | $0.150 | $0.450 | Serverless |
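
Because Mistral AI Studio prices input and output differently, the cheapest route depends on the input-to-output ratio of the workload. The sketch below estimates cost per route from the four rows in the table; the function names and the two example workloads are illustrative assumptions, and the other tracked routes are not included.

```python
# (input USD per 1M tokens, output USD per 1M tokens), copied from the table.
PRICES = {
    "SiliconFlow": (0.200, 0.200),
    "Microsoft Foundry": (0.270, 0.270),
    "Lepton AI API": (0.300, 0.300),
    "Mistral AI Studio": (0.150, 0.450),
}


def workload_cost(input_tokens, output_tokens, prices):
    """Return the USD cost of a workload for one (input, output) price pair."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_routes(input_tokens, output_tokens):
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: workload_cost(input_tokens, output_tokens, p)
        for name, p in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])


# Hypothetical input-heavy classification job: 10M tokens in, 0.5M tokens out.
classification = rank_routes(10_000_000, 500_000)
# Hypothetical balanced coding job: 1M tokens in, 1M tokens out.
coding = rank_routes(1_000_000, 1_000_000)

print(classification[0][0], coding[0][0])
```

With these list prices, Mistral AI Studio undercuts SiliconFlow only when input tokens exceed five times the output tokens, since 0.15i + 0.45o < 0.20i + 0.20o reduces to i > 5o.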
Benchmark peer bars for Coding
Migration checks
No linked migration route is available for this model yet.
About
Mixtral 8x7B, developed by Mistral AI, is a sparse Mixture of Experts (MoE) model. Each layer has eight expert feed-forward blocks, and a router activates only two of them per token. Because the experts share the attention layers, the model totals 46.7 billion parameters rather than 8 × 7 = 56 billion, and only about 12.9 billion are used for any given token. Mistral AI reports roughly 6x faster inference than Llama 2 70B, along with results that surpass Llama 2 70B and compete with GPT-3.5 on most of the benchmarks it published. The model supports multiple languages, handles context up to 32,000 tokens, and is strong in code generation. Its weights are released under the permissive Apache 2.0 license and can be deployed with common optimization tools. Mistral AI continues to work on performance optimizations and fine-tuning.
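The two-of-eight expert selection described above can be illustrated with a toy router. This is a minimal sketch under stated assumptions, not Mistral AI's implementation: the function name `top2_moe` is hypothetical, the experts are stand-in scalar functions rather than feed-forward networks, and the router scores are made up.

```python
import math


def top2_moe(x, gate_logits, experts):
    """Route one token through the two highest-scoring experts.

    x: token representation (a scalar in this toy example)
    gate_logits: one router score per expert
    experts: list of callables, same length as gate_logits
    """
    # Indices of the two largest router scores (ties keep the lower index first).
    top2 = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:2]
    # Softmax over only the selected scores, so the two weights sum to 1.
    peak = max(gate_logits[i] for i in top2)
    exps = [math.exp(gate_logits[i] - peak) for i in top2]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Only the two selected experts are evaluated; the other six are skipped.
    output = sum(w * experts[i](x) for w, i in zip(weights, top2))
    return output, top2


# Eight stand-in experts: expert k multiplies its input by (k + 1).
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
# Made-up router scores; experts 1 and 4 tie for the highest score.
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

output, chosen = top2_moe(2.0, logits, experts)
print(output, chosen)
```

Skipping six of the eight experts per token is why the compute per token tracks the active parameter count rather than the total.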
Mixtral 8x7B has a 32K-token context window.
On Mistral AI Studio, Mixtral 8x7B costs $0.15 per 1M input tokens and $0.45 per 1M output tokens.
Capabilities
No model capability flags are currently sourced.
Benchmark Scores (4)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| Google-Proof Q&A | 54.8 | diamond | Open LLM Leaderboard |
| HellaSwag | 90.9 | 10-shot | Open LLM Leaderboard |
| HumanEval | 80.5 | pass@1 | Open LLM Leaderboard |
| Massive Multitask Language Understanding | 80.2 | 5-shot | Open LLM Leaderboard |