WizardLM-2 8x22B
WizardLM-2 8x22B is worth evaluating for coding, classification, and JSON / tool use when its provider route and context window match the workload.
Use it for
- Teams evaluating coding, classification, and JSON / tool use
- Buyers comparing 4 tracked provider routes
Do not use it for
- Vision or document-understanding workloads
Model details
| Attribute | Value |
|---|---|
| Family | WizardLM-2 |
| Released | 2024-04-16 |
| Parameters | 8x22B |
| Architecture | Mixture of Experts |
| Specialization | General |
| Training | Fine-tuned |
Cheapest tracked route: Lepton AI API
About
WizardLM-2 8x22B, developed by the WizardLM team at Microsoft AI, is a large language model (LLM) with 141 billion parameters and a Mixture of Experts (MoE) architecture. It performs well on complex tasks such as chat, multilingual conversation, reasoning, and agent-based interaction. It was trained with an AI-powered synthetic data pipeline that uses techniques such as Evol-Instruct and AI Align AI, and it surpasses many open-source alternatives. Despite its benchmark results, further research is needed to address potential biases and improve reliability, including follow-up toxicity testing.
WizardLM-2 8x22B is an instruction-tuned mixture-of-experts model released April 16, 2024 by the WizardLM research team at Microsoft. It is built on the Mixtral 8x22B MoE backbone from Mistral AI, which has approximately 141 billion total parameters, of which around 39 billion are active per token during inference. The context window is 65,536 tokens. The model is released under the Apache 2.0 license.
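The parameter and context figures above can be turned into a quick sizing check. This is a minimal sketch that reuses the numbers from this page; the top-2-of-8 expert routing comes from the Mixtral 8x22B design, and the 2-FLOPs-per-active-parameter rule is a common back-of-the-envelope approximation rather than a measured figure. The helper names are hypothetical. The name suggests 8 x 22B = 176B, but the experts share the attention and embedding layers, which is why the total is closer to 141B.

```python
# Minimal sizing sketch using figures quoted on this page (approximate).
TOTAL_PARAMS = 141e9     # all 8 experts plus the shared attention/embedding layers
ACTIVE_PARAMS = 39e9     # weights touched per token (Mixtral-style top-2 routing)
CONTEXT_WINDOW = 65_536  # tokens shared by the prompt and the completion


def active_fraction(total: float, active: float) -> float:
    """Share of all weights that take part in a single token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rule of thumb: about 2 FLOPs (one multiply, one add) per active parameter."""
    return 2 * active


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested completion fit in the context window."""
    return prompt_tokens + max_output_tokens <= window


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # 27.7%
    print(f"~{approx_forward_flops_per_token(ACTIVE_PARAMS):.1e} FLOPs/token")  # ~7.8e+10
    print(fits_context(60_000, 4_096))  # True
    print(fits_context(62_000, 4_096))  # False
```

Per-token compute scales with the 39B active parameters, but serving still has to hold all 141B weights in memory, which is the main cost driver for self-hosting.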
The training methodology combines Evol-Instruct, a technique that generates increasingly complex synthetic instruction-response pairs through iterative AI-driven rewriting, with an alignment framework called AI Align AI. The result is a model tuned for complex multi-step instruction following, multilingual conversation, logical reasoning, and agentic task execution. At release, the WizardLM team claimed that WizardLM-2 8x22B outperformed the March 2023 GPT-4 snapshot (gpt-4-0314) on MT-Bench and exceeded all then-existing open-source models on that benchmark.
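The core Evol-Instruct loop can be sketched as repeatedly asking an LLM to rewrite each instruction into a harder or broader variant and discarding rewrites that fail a quality check. This is a simplified illustration, not Microsoft's pipeline: `rewrite` and `is_valid` are hypothetical stand-ins for the LLM call and the filtering step, the operation strings only paraphrase the kinds of evolutions described in the original WizardLM paper, and response generation and AI Align AI are omitted.

```python
import random
from typing import Callable, List, Optional

# Paraphrased evolution operations: the first three deepen an instruction
# ("in-depth"), the last one branches to a new topic ("in-breadth").
EVOLUTION_OPS = [
    "add one more constraint or requirement",
    "replace general concepts with more specific ones",
    "require more explicit reasoning steps",
    "write a new, rarer instruction in the same domain",
]


def evolve_pool(
    seeds: List[str],
    rewrite: Callable[[str, str], str],    # hypothetical LLM call: (instruction, op) -> new instruction
    is_valid: Callable[[str, str], bool],  # hypothetical filter: (parent, candidate) -> keep?
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Grow an instruction pool by evolving each frontier instruction once per round."""
    rng = rng if rng is not None else random.Random(0)
    pool = list(seeds)      # every accepted instruction, seeds included
    frontier = list(seeds)  # instructions to evolve in the current round
    for _ in range(rounds):
        next_frontier = []
        for parent in frontier:
            op = rng.choice(EVOLUTION_OPS)
            candidate = rewrite(parent, op)
            if is_valid(parent, candidate):
                pool.append(candidate)
                next_frontier.append(candidate)
            else:
                # Failed evolution: discard the candidate and retry the parent next round.
                next_frontier.append(parent)
        frontier = next_frontier
    return pool


if __name__ == "__main__":
    # Deterministic stand-ins so the loop runs without a model.
    pool = evolve_pool(
        ["Sort a list of integers."],
        rewrite=lambda text, op: f"{text} Also, {op}.",
        is_valid=lambda parent, candidate: len(candidate) > len(parent),
        rounds=2,
    )
    for item in pool:
        print(item)
```

In a real pipeline the filter would typically be another model call or heuristic check, and each accepted instruction would then be answered to form an instruction-response pair.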
WizardLM-2 8x22B is the largest and most capable member of the WizardLM-2 family, which also includes 70B and 7B variants. It is available through DeepInfra, OctoAI, OpenRouter, Novita AI, and Lepton AI. The model suits applications that need detailed instruction following, multi-step reasoning chains, and multilingual coverage from an open-weight MoE model.
On the cheapest tracked route (Lepton AI API), WizardLM-2 8x22B costs $0.50 per 1M input tokens and $0.50 per 1M output tokens.
Top use-case fit: coding, classification, and JSON / tool use
Coding
Quality per dollar: B · 1 relevant benchmark in the decision map.
Classification
Quality per dollar: C · 1 relevant benchmark in the decision map.
JSON / Tool use
Included by capability and metadata signals in the decision map.
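Because JSON / tool-use fit is inferred from metadata rather than from a benchmark, it is worth validating structured replies before acting on them. This is a minimal, provider-agnostic sketch that assumes the model is prompted to return a single JSON object; `parse_json_reply` and the key names are hypothetical, and only the Python standard library is used.

```python
import json
from typing import Any, Dict, Iterable

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence


def parse_json_reply(raw: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse a reply that should be one JSON object and check that required keys exist.

    Raises ValueError so the caller can retry or fall back instead of acting on bad output.
    """
    text = raw.strip()
    # Models may wrap JSON in a Markdown code fence (FENCE + "json"); strip it if present.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj


if __name__ == "__main__":
    reply = '{"label": "billing", "confidence": 0.87}'  # hypothetical classification reply
    print(parse_json_reply(reply, ["label", "confidence"])["label"])  # billing
```

Raising instead of returning a partial object keeps the retry decision with the caller, which matters when a tool call would have side effects.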
Provider price ladder
Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Lepton AI API | $0.500 | $0.500 | Serverless |
| Novita AI | $0.620 | $0.620 | Serverless |
| OpenRouter | $0.620 | $0.620 | Serverless |
| DeepInfra | $0.650 | $0.650 | Serverless |
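To compare routes for a specific workload, multiply token volumes by the per-1M rates in the ladder. This is a minimal sketch with the table's prices hard-coded (they will drift, so treat them as a snapshot); `workload_cost` and `rank_providers` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# USD per 1M tokens as (input, output), copied from the price ladder above.
# Provider prices change; re-check them before relying on these numbers.
PRICES: Dict[str, Tuple[float, float]] = {
    "Lepton AI API": (0.50, 0.50),
    "Novita AI": (0.62, 0.62),
    "OpenRouter": (0.62, 0.62),
    "DeepInfra": (0.65, 0.65),
}


def workload_cost(input_tokens: int, output_tokens: int, price: Tuple[float, float]) -> float:
    """USD cost of a workload at one provider's (input, output) per-1M-token rates."""
    in_rate, out_rate = price
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_providers(input_tokens: int, output_tokens: int) -> List[Tuple[str, float]]:
    """Providers sorted from cheapest to most expensive for the given token volumes."""
    costs = [(name, workload_cost(input_tokens, output_tokens, rate))
             for name, rate in PRICES.items()]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical month: 200M input tokens, 50M output tokens.
    for name, cost in rank_providers(200_000_000, 50_000_000):
        print(f"{name:<14} ${cost:,.2f}")
```

For that hypothetical month, Lepton AI API comes to $125.00 and DeepInfra to $162.50. Because input and output rates are equal on every route here, only the total token count affects the ranking.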
Capabilities
Benchmark scores (2)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| HumanEval | 67.5 | pass@1 | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard |
| Massive Multitask Language Understanding | 76.9 | 5-shot | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard |
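HumanEval pass@1 is usually computed with the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021): with n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The table does not state how many samples were drawn for the 67.5 figure, so this sketch only illustrates the formula; the function names are illustrative.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset of the samples contains a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: List[Tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over problems, as a percentage; results holds one (n, c) pair per problem."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n per problem, so a pass@1 of 67.5 means that, averaged over problems, about two thirds of sampled completions passed the tests.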
Migration checks
No linked migration route is available for this model yet.