Llama 3.1 8B Instruct

Name: Llama 3.1 8B Instruct
Author: AI at Meta

Released: 2024-07-23
Last refreshed: 2026-07-11
Status: Researched 90d ago

Open weights · Commercial use: conditional · RAG · Long context · Classification · JSON / Tool use

Llama 3.1 8B Instruct is worth evaluating for RAG, long-context, and classification workloads when its provider route and context window match the workload.

Use it for

Teams evaluating RAG, long-context, and classification workloads
Workloads that can use a 128k context window
Buyers comparing 4 tracked provider routes

Do not use it for

Vision or document-understanding workloads

Specifications

Family: Llama 3.1
Released: 2024-07-23
Context: 128k
Parameters: 8B
Architecture: Decoder Only
Knowledge cutoff: 2023-12
Specialization: general
Openness: Open weights
License: Llama 3.1 Community License
Commercial use: conditional
Weights: Available
Code: Unknown
Training: Fine-tuned

Created by

AI at Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States

Founded 2013

Website

Pricing

Output / 1M: $0.050
Input / 1M: $0.020

Cheapest of 16 routes · DeepInfra

Providers (16)

Cloudflare Workers AI, OctoAI API (Deprecated), Together AI, Fireworks AI, NVIDIA NIM, GroqCloud, Microsoft Foundry, Databricks Foundation Model Serving, Hyperbolic AI Inference, OpenRouter, IBM watsonx, AWS Bedrock, Replicate API, Vercel AI Gateway, Novita AI, DeepInfra

About

Llama 3.1 8B Instruct, released on July 23, 2024, is a multilingual large language model with 8 billion parameters, tuned for instruction following. It uses an enhanced transformer architecture and supports languages including English, German, and French. The model is aimed at dialogue applications and was fine-tuned with supervised fine-tuning and reinforcement learning from human feedback. Trained on approximately 15 trillion tokens with a December 2023 data cutoff, it outperforms many existing open-source and closed chat models on various benchmarks. It suits commercial and research applications such as conversational agents and content generation, and the weights can be accessed on Hugging Face.

Llama 3.1 8B Instruct is an open-weight model in the Llama 3.1 family. The structured metadata tracks a 128k-token context window and structured outputs. This page tracks provider routes through Cloudflare Workers AI, OctoAI API (Deprecated), Together AI, and 13 more, with the cheapest tracked route listed at $0.02 input and $0.05 output per 1M tokens. Headline tracked benchmarks include BFCL 25.8 and MMLU PRO 44.3.

Top use-case fit

RAG

Included by capability and metadata signals in the decision map.

Long context

Included by capability and metadata signals in the decision map.

Classification

Q/$: A

1 relevant benchmark in the decision map.

Provider price ladder

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Route
DeepInfra	$0.020	$0.050	Serverless
Novita AI	$0.020	$0.050	Serverless
OpenRouter	$0.020	$0.050	Serverless
GroqCloud	$0.050	$0.080	Serverless
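
To turn the per-1M-token prices in the table above into a budget figure, the arithmetic can be sketched as follows. This is a minimal illustration, not a billing tool: the `Route` class, the `workload_cost` helper, and the 500M-input / 50M-output token volumes are hypothetical, and only the prices come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One provider route; prices are USD per 1M tokens (from the table above)."""
    provider: str
    input_per_m: float
    output_per_m: float


ROUTES = [
    Route("DeepInfra", 0.020, 0.050),
    Route("Novita AI", 0.020, 0.050),
    Route("OpenRouter", 0.020, 0.050),
    Route("GroqCloud", 0.050, 0.080),
]


def workload_cost(route: Route, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume on one route."""
    return (
        input_tokens / 1_000_000 * route.input_per_m
        + output_tokens / 1_000_000 * route.output_per_m
    )


if __name__ == "__main__":
    # Hypothetical workload: 500M input tokens and 50M output tokens.
    input_tokens, output_tokens = 500_000_000, 50_000_000
    ranked = sorted(
        ROUTES, key=lambda r: workload_cost(r, input_tokens, output_tokens)
    )
    for route in ranked:
        cost = workload_cost(route, input_tokens, output_tokens)
        print(f"{route.provider:<12} ${cost:,.2f}")
```

Because input and output are priced separately, the ranking can change with the input-to-output ratio of the workload, so it is worth rerunning the estimate with your own token volumes rather than comparing input prices alone.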

Available via routers & gateways (8)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSS · Microsoft Foundry

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

Passthrough · Together AI · Fireworks AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

Subscription · Microsoft Foundry

Amazon Bedrock Intelligent Prompt Routing

Router

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

Passthrough · AWS Bedrock

Azure AI Foundry Model Router

Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

Passthrough · Microsoft Foundry

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

Subscription · Microsoft Foundry

Capabilities

Structured Outputs

Benchmark peer bars for Classification

MMLU PRO · Rank 100 of 105

Model	Score
Gemini 3 Pro	91.8
Gemini 3.1 Pro Preview	91.0
Qwen3.7-Max	89.6
Claude Opus 4.6	89.1
Llama 3.1 8B Instruct (current)	44.3

Benchmark scores (2)

Scores are benchmark-specific and direction-aware: the same numeric gap can mean very different outcomes on different suites. Use the leaderboard context and this model's provider route to decide whether a score margin is meaningful for your workload.

Benchmark	Score	Version	Observed	Evaluation	Source
BFCL	25.8	v4	2026-04-14	—	Source
MMLU PRO	44.3	—	2026-04-14	—	Source

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (2)

Best Small Language Models (SLMs): Listed
Cheapest LLM APIs You Can Call Right Now: Listed

Compare Llama 3.1 8B Instruct with other models

Llama 3.1 8B Instruct vs Trinity-Large-Preview

Frequently asked questions

What is the context window of Llama 3.1 8B Instruct?

Llama 3.1 8B Instruct has a context window of 128k tokens.
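
A quick pre-flight check can show whether a prompt plus the reserved output budget is likely to fit in that window. The sketch below rests on two loudly stated assumptions: "128k" is read as 128,000 tokens, and token counts are approximated at roughly 4 characters per token, which is a crude heuristic and not the Llama tokenizer. The helper names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # assumption: "128k" read as 128,000 tokens
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count by character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(
    prompt: str,
    max_output_tokens: int,
    window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """True if the estimated prompt tokens plus the output budget fit in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= window


if __name__ == "__main__":
    document = "lorem ipsum " * 30_000  # hypothetical 360,000-character input
    print(estimate_tokens(document), fits_context(document, max_output_tokens=2_000))
```

For a real deployment, count tokens with the model's own tokenizer and confirm the limit the chosen provider route actually enforces, since the heuristic can be off by a wide margin for code or non-English text.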

How much does Llama 3.1 8B Instruct cost?

Llama 3.1 8B Instruct pricing ranges from $0.02 to $0.30 per 1M input tokens, depending on the provider.

When was Llama 3.1 8B Instruct released?

Llama 3.1 8B Instruct was released on 2024-07-23.

Which providers offer Llama 3.1 8B Instruct?

Llama 3.1 8B Instruct is available from 16 providers: Cloudflare Workers AI, OctoAI API (Deprecated), Together AI, Fireworks AI, NVIDIA NIM, GroqCloud, Microsoft Foundry, Databricks Foundation Model Serving, Hyperbolic AI Inference, OpenRouter, IBM watsonx, AWS Bedrock, Replicate API, Vercel AI Gateway, Novita AI, DeepInfra.

What benchmarks has Llama 3.1 8B Instruct been tested on?

Llama 3.1 8B Instruct has been evaluated on 2 benchmarks: BFCL and MMLU PRO.
