LLM Reference

Llama 4 Maverick 17B Instruct FP8

llama-4-maverick-17b-128e-instruct-fp8

Open Source

About

Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.

Llama 4 Maverick 17B Instruct FP8 has a 1M-token context window.

Llama 4 Maverick 17B Instruct FP8 input tokens cost $0.15 per 1M, and output tokens cost $0.60 per 1M.
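As a rough illustration of how the per-token pricing above translates into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical; only the two prices come from this page, and actual billing may differ by provider.

```python
# Prices in USD per 1M tokens, as listed on this page for the model.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_PRICE_PER_M,
    output_price: float = OUTPUT_PRICE_PER_M,
) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed proportionally to its per-1M-token price.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 20,000 prompt tokens and 1,000 completion tokens.
cost = estimate_cost(20_000, 1_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens are priced at four times the input rate, long completions tend to dominate the bill even when prompts are large.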

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning

Providers (7)

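Provider | Input (per 1M) | Output (per 1M) | Type
Microsoft Foundry | not listed | not listed | Serverless, Provisioned
Together AI | $0.27 | $0.85 | Serverless
OpenRouter | $0.15 | $0.60 | Serverless
Fireworks AI | not listed | not listed | Serverless
DeepInfra | $0.15 | $0.60 | Serverless
GCP Vertex AI | $0.35 | $1.15 | Serverless
NVIDIA NIM | not listed | not listed | Serverless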
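To compare the listed providers for a given workload, one option is to compute the total cost per provider and sort the results. The sketch below copies the prices from the provider list above; the names `PROVIDER_PRICES` and `rank_providers` and the workload size are hypothetical, and providers with no listed price are skipped rather than guessed.

```python
from typing import Optional

# (input, output) prices in USD per 1M tokens, copied from the provider list.
# None marks a provider for which this page lists no price.
PROVIDER_PRICES: dict[str, Optional[tuple[float, float]]] = {
    "Microsoft Foundry": None,
    "Together AI": (0.27, 0.85),
    "OpenRouter": (0.15, 0.60),
    "Fireworks AI": None,
    "DeepInfra": (0.15, 0.60),
    "GCP Vertex AI": (0.35, 1.15),
    "NVIDIA NIM": None,
}


def rank_providers(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (provider, USD cost) pairs sorted from cheapest to most expensive."""
    costs = []
    for name, prices in PROVIDER_PRICES.items():
        if prices is None:
            continue  # no price listed, so no cost can be computed
        input_price, output_price = prices
        total = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        costs.append((name, total))
    # sorted() is stable, so equally priced providers keep their listed order.
    return sorted(costs, key=lambda item: item[1])


# Hypothetical monthly workload: 10M input tokens and 2M output tokens.
for provider, cost in rank_providers(10_000_000, 2_000_000):
    print(f"{provider}: ${cost:.2f}")
```

This only ranks by list price; throughput, rate limits, and provisioned versus serverless deployment are not captured and may matter more for a real decision.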

Benchmark Scores (1)

Benchmark | Score | Version | Source
τ-bench | 68.5 | τ-bench | https://taubench.com/

Specifications

Family: Llama 4
Released: 2025-04-05
Parameters: 17B
Context: 1M
Architecture: Mixture of Experts
Specialization: General
Training: Fine-tuned

Created by

Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States
Founded 2013