Llama 4 Maverick 17B Instruct FP8
Llama 4 Maverick 17B Instruct FP8 is worth evaluating for coding, RAG, and agents when its provider route and context window match the workload.
Use it for
- Teams evaluating coding, RAG, and agent workloads
- Workloads that can use a 1M-token context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Workloads where another current model has stronger sourced task evidence

- Family: Llama 4
- Released: 2025-04-05
- Context: 1M tokens
- Parameters: 400B (17B active)
- Architecture: Mixture of Experts
- Knowledge cutoff: 2024-08
- Specialization: general
- Openness: Open weights
- License: Llama 4 Community (commercial use: conditional)
- Training: Fine-tuned
Large-scale open-source AI for social technologies.
Cheapest of 10 routes · DeepInfra
About
Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.
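As a rough illustration of why FP8 matters for serving cost, the sketch below estimates weight storage from the parameter counts listed on this page. It assumes 1 byte per parameter for FP8 and 2 bytes for a 16-bit format, and it ignores KV cache, activations, and any layers a provider keeps at higher precision, so treat the numbers as a lower bound, not a deployment figure. The function name `weight_memory_gb` is hypothetical.

```python
# Parameter counts as listed on this page (in billions).
TOTAL_PARAMS_B = 400   # all experts combined
ACTIVE_PARAMS_B = 17   # parameters used per token


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# Assumption: FP8 stores 1 byte per weight, a 16-bit format stores 2.
fp8_gb = weight_memory_gb(TOTAL_PARAMS_B, 1)
bf16_gb = weight_memory_gb(TOTAL_PARAMS_B, 2)

print(f"FP8 weights:    ~{fp8_gb:.0f} GB")
print(f"16-bit weights: ~{bf16_gb:.0f} GB")
print(f"Active per token at FP8: ~{weight_memory_gb(ACTIVE_PARAMS_B, 1):.0f} GB")
```

In a mixture-of-experts model all experts still have to be resident in memory, so the total parameter count drives storage, while the active count mainly affects per-token compute.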
Llama 4 Maverick 17B Instruct FP8 is an open-weight model in the Llama 4 family. The structured metadata tracks a 1M-token context window, multimodal input, and structured outputs. This page tracks provider routes through Microsoft Foundry, Together AI, OpenRouter, and 7 more, with the cheapest tracked route listed at $0.15 input and $0.60 output per 1M tokens. Headline tracked benchmarks include HumanEval 77.4, Aider Polyglot 15.6, and BigCodeBench 49.7.
Top use-case fit: coding, agents, and build tasks
Coding
Q/$: C · 4 relevant benchmarks in the decision map.
RAG
Included by capability and metadata signals in the decision map.
Agents
Q/$: A · 1 relevant benchmark in the decision map.
Provider price ladder
Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| DeepInfra | $0.150 | $0.600 | Serverless |
| OpenRouter | $0.150 | $0.600 | Serverless |
| Novita AI | $0.270 | $0.850 | Serverless |
| Together AI | $0.270 | $0.850 | Serverless |
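To turn the per-1M-token prices above into a per-request figure, a minimal sketch follows. The prices are copied from the table; the `Route` class, the helper names, and the example token counts (200,000 input, 2,000 output) are illustrative assumptions, and batch or cached-read discounts are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as listed in the table above.
ROUTES = [
    Route("DeepInfra", 0.15, 0.60),
    Route("OpenRouter", 0.15, 0.60),
    Route("Novita AI", 0.27, 0.85),
    Route("Together AI", 0.27, 0.85),
]


def request_cost(route: Route, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the route's listed prices."""
    return (
        input_tokens * route.input_per_million
        + output_tokens * route.output_per_million
    ) / 1_000_000


def cheapest(routes: list[Route], input_tokens: int, output_tokens: int) -> Route:
    """Return the lowest-cost route; ties resolve to the first one listed."""
    return min(routes, key=lambda r: request_cost(r, input_tokens, output_tokens))


# Hypothetical long-context request: 200k tokens in, 2k tokens out.
for route in ROUTES:
    print(f"{route.provider:12s} ${request_cost(route, 200_000, 2_000):.4f}")
print("Cheapest:", cheapest(ROUTES, 200_000, 2_000).provider)
```

For input-heavy workloads such as long-context RAG, the input price dominates the total, so routes with the same output price can still differ noticeably in cost per request.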
Available via routers & gateways (16)
- AIRouter (Router): Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.
- Amazon Bedrock Intelligent Prompt Routing (Router): AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.
- Azure AI Foundry Model Router (Router): Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.
- Helicone (Gateway): Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.
- Kong AI Gateway (Gateway): Multi-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.
- LiteLLM (Gateway): Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.
Capabilities
Benchmark peer bars for Coding
Benchmark scores (10)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| HumanEval | 77.4 | 2025-04 | https://ai.meta.com/blog/llama-4-multimodal-intelligence/ |
| Aider Polyglot | 15.6 | 2026-04 | https://aider.chat/docs/leaderboards |
| BigCodeBench | 49.7 | 2025-04 (Instruct Pass@1) | https://bigcode-bench.github.io/results.json |
| Chatbot Arena | 1365.0 | — | https://lmarena.ai |
| GPQA (Google-Proof Q&A) | 67.1 | Diamond | https://artificialanalysis.ai/leaderboards/models |
| τ-bench | 68.5 | τ-bench | https://taubench.com/ |
| MMMU Pro | 59.6 | LLM-Stats aggregator | https://llm-stats.com/benchmarks/mmmu-pro |
| MMMU (Massive Multi-discipline Multimodal Understanding) | 73.4 | — | https://ai.meta.com/blog/llama-4-scout-maverick/ |
| MMLU-Pro | 80.5 | — | https://ai.meta.com/blog/llama-4-scout-maverick/ |
| LiveCodeBench | 43.4 | — | https://ai.meta.com/blog/llama-4-scout-maverick/ |
Migration checks
No linked migration route is available for this model yet.
Rankings & picks (4)
Comparison and alternatives