Llama 3.1 405B Instruct

Name: Llama 3.1 405B Instruct
Author: AI at Meta

Released

2024-07-23

Last refreshed

2026-07-11

Status

Researched 90d ago

Open weightsCommercial use: conditionalRAGLong contextClassificationJSON / Tool use

Llama 3.1 405B Instruct is worth evaluating for rag, long context, and classification when its provider route and context window match the workload.

Use it for

Teams evaluating rag, long context, and classification
Workloads that can use a 128k context window
Buyers comparing 4 tracked provider routes

Do not use it for

Vision or document-understanding workloads

Specifications

Family: Llama 3.1
Released: 2024-07-23
Context: 128k
Parameters: 405B
Architecture: Decoder Only
Knowledge cutoff: 2023-12
Specialization: general
Openness: Open weights
License: Llama 3 CommunityCommercial use: conditional
Weights: Available
Code: Unknown
Training: Fine-tuned

Created by

AI at Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States

Founded 2013

Website

Pricing

Output / 1M

$2.40

Input / 1M

$2.40

Cheapest of 11 routes · AWS Bedrock

Providers(11)

OctoAI API (Deprecated)Together AI Fireworks AI IBM watsonx Scale AI GenAI Platform NVIDIA NIM Microsoft Foundry Databricks Foundation Model Serving Hyperbolic AI Inference AWS Bedrock GCP Vertex AI

View 11 provider routes

About

Llama 3.1 405B Instruct is Meta's advanced large language model released on July 23, 2024, featuring 405 billion parameters. It utilizes an optimized transformer architecture with supervised fine-tuning and reinforcement learning for enhanced instruction-following capabilities. The model supports multiple languages, was trained on 15 trillion tokens, and fine-tuned with 25 million synthetic examples. It excels in multilingual dialogue and text generation, making it ideal for assistant-like applications. Llama 3.1 incorporates robust safety measures and ethical considerations, outperforming many existing models on various industry benchmarks. AI engineers can access the model via its Hugging Face page for implementation in diverse NLP tasks.

Llama 3.1 405B Instruct is an open-weight model in the Llama 3.1 family. The structured metadata tracks a 128k-token context window and structured outputs. This page tracks provider routes through OctoAI API (Deprecated), Together AI, Fireworks AI, and 8 more, with the cheapest tracked route listed at $2.4 input and $2.4 output per 1M tokens. Headline tracked benchmarks include Massive Multitask Language Understanding 88.6.

Top use-case fit

RAG

Included by capability and metadata signals in the decision map.

Long context

Included by capability and metadata signals in the decision map.

Classification

Q/$ D

1 relevant benchmark in the decision map.

Provider price ladder

Compare all 11

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Route
AWS Bedrock	$2.40	$2.40	Serverless
Fireworks AI	$3.00	$3.00	Serverless
Hyperbolic AI Inference	$4.00	$4.00	Serverless
IBM watsonx	$3.00	$9.00	Serverless

Available via routers & gateways(16)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSSGCP Vertex AIMicrosoft Foundry

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

PassthroughGCP Vertex AITogether AIFireworks AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

SubscriptionGCP Vertex AIMicrosoft Foundry

AIRouter

Router

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Passthrough + feeGCP Vertex AI

Amazon Bedrock Intelligent Prompt Routing

Router

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

PassthroughAWS Bedrock

Azure AI Foundry Model Router

Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

PassthroughMicrosoft Foundry

Capabilities

Structured Outputs

Benchmark peer barsfor Classification

Massive Multitask Language UnderstandingRank 11 of 95

Gemini 3.1 Pro Preview

98.0

GPT-5.5

92.4

Claude Opus 4.6

91.1

DeepSeek V4 Pro

90.1

Llama 3.1 405B Instructcurrent

88.6

Benchmark scores(1)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.

Benchmark	Score	Version	Evaluation	Source
Massive Multitask Language Understanding	88.6	5-shotObserved 2026-03-07	—	Source

Migration checks

No linked migration route is available for this model yet.

Compare Llama 3.1 405B Instruct with other models

Comparison and alternatives

Browse all comparisons →

Show all 19 popular comparisonssorted by 7-day search impressions

Frequently asked questions

What is the context window of Llama 3.1 405B Instruct?

Llama 3.1 405B Instruct has a context window of 128k tokens.

How much does Llama 3.1 405B Instruct cost?

Llama 3.1 405B Instruct pricing ranges from $2.40/1M to $5.33/1M input tokens depending on the provider.

When was Llama 3.1 405B Instruct released?

Llama 3.1 405B Instruct was released on 2024-07-23.

Which providers offer Llama 3.1 405B Instruct?

Llama 3.1 405B Instruct is available from 11 providers: OctoAI API (Deprecated), Together AI, Fireworks AI, IBM watsonx, Scale AI GenAI Platform, NVIDIA NIM, Microsoft Foundry, Databricks Foundation Model Serving, Hyperbolic AI Inference, AWS Bedrock, GCP Vertex AI.

What benchmarks has Llama 3.1 405B Instruct been tested on?

Llama 3.1 405B Instruct has been evaluated on 1 benchmark, including Massive Multitask Language Understanding.

Created by

AI at Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States

Founded 2013

Website

Pricing

Output / 1M

$2.40

Input / 1M

$2.40

Cheapest of 11 routes · AWS Bedrock

Providers(11)

OctoAI API (Deprecated)Together AI Fireworks AI IBM watsonx Scale AI GenAI Platform NVIDIA NIM Microsoft Foundry Databricks Foundation Model Serving Hyperbolic AI Inference AWS Bedrock GCP Vertex AI

View 11 provider routes