Llama 2 70B Chat

Name: Llama 2 70B Chat
Author: AI at Meta

Released: 2023-07-18
Last refreshed: 2026-07-09
Status: Researched 90d ago

Deprecated · Open weights · Commercial use: conditional · Classification · JSON / Tool use

Llama 2 70B Chat is a legacy integration reference; keep it only while you identify a current replacement.

Use it for

Teams maintaining an existing integration
Workloads that can use a 4k context window
Buyers comparing 4 tracked provider routes

Do not use it for

New production launches
Vision or document-understanding workloads

Specifications

Family: Llama 2
Released: 2023-07-18
Context: 4k
Parameters: 70B
Architecture: Decoder Only
Specialization: general
Openness: Open weights
License: Llama 2 Community
Commercial use: conditional
Weights: Available
Code: Unknown
Training: Fine-tuned

Created by

AI at Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States

Founded 2013


Pricing

Output / 1M: $0.500
Input / 1M: $0.500

Cheapest of 14 routes · Lepton AI API

Providers (14)

Databricks Foundation Model Serving, Microsoft Foundry, GCP Vertex AI, Alibaba Cloud PAI-EAS, AWS Bedrock, OCI Generative AI, NVIDIA NIM, DeepInfra, Lepton AI API, Together AI, IBM watsonx, Scale AI GenAI Platform, Fireworks AI, Replicate API


About

Llama 2 70B Chat is a 70-billion-parameter language model designed for conversational applications. Released on July 18, 2023, it is part of Meta's Llama 2 family: a decoder-only transformer pretrained on 2 trillion tokens from publicly available sources, then tuned for dialogue with supervised fine-tuning and reinforcement learning from human feedback (RLHF). At release, Meta reported that it outperformed many open-source chat models and was competitive with closed-source models such as ChatGPT in human evaluations. It is intended for commercial and research use in English, particularly assistant-like chat. The weights are available on Hugging Face.

Llama 2 70B Chat is an open-weight model in the Llama 2 family. The structured metadata tracks a 4k-token context window and structured outputs. This page tracks 14 provider routes, including Databricks Foundation Model Serving, Microsoft Foundry, and GCP Vertex AI; the cheapest tracked route is listed at $0.50 input and $0.50 output per 1M tokens. The headline tracked benchmark is Massive Multitask Language Understanding (MMLU, 5-shot) at 68.9.

Top use-case fit

Classification

1 relevant benchmark in the decision map.

JSON / Tool use

Included by capability and metadata signals in the decision map.

Provider price ladder


Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Route
Lepton AI API	$0.500	$0.500	Serverless
DeepInfra	$0.640	$0.640	Serverless
Fireworks AI	$0.900	$0.900	Serverless
Together AI	$0.900	$0.900	Serverless
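
To turn the ladder above into a workload estimate, multiply token counts by each route's per-million rates. The following is a minimal sketch, not part of any provider's API: the prices are copied from the table, while the `Route` type, the `workload_cost` and `cheapest` helpers, and the example token volumes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


# Prices copied from the price ladder above.
ROUTES = [
    Route("Lepton AI API", 0.50, 0.50),
    Route("DeepInfra", 0.64, 0.64),
    Route("Fireworks AI", 0.90, 0.90),
    Route("Together AI", 0.90, 0.90),
]


def workload_cost(route: Route, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload on one route."""
    total = input_tokens * route.input_per_1m + output_tokens * route.output_per_1m
    return total / 1_000_000


def cheapest(routes, input_tokens: int, output_tokens: int) -> Route:
    """Return the route with the lowest cost for the given token volumes."""
    return min(routes, key=lambda r: workload_cost(r, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical monthly volume: 200M input tokens and 50M output tokens.
    for route in ROUTES:
        cost = workload_cost(route, 200_000_000, 50_000_000)
        print(f"{route.provider}: ${cost:,.2f}")
```

Because every listed route charges the same rate for input and output, the ranking does not depend on the input/output mix; it would matter for routes that price the two differently.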

Available via routers & gateways (16)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSS · GCP Vertex AI · Microsoft Foundry

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

Passthrough · GCP Vertex AI · Together AI · Fireworks AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

Subscription · GCP Vertex AI · Microsoft Foundry

AIRouter

Router

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Passthrough + fee · GCP Vertex AI

Amazon Bedrock Intelligent Prompt Routing

Router

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

Passthrough · AWS Bedrock

Azure AI Foundry Model Router

Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

Passthrough · Microsoft Foundry
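
Several of the gateways above advertise failover between provider routes. As a rough illustration of the idea only, the sketch below tries routes in order and returns the first successful completion. It does not reproduce how LiteLLM, Portkey, or any listed product implements failover; `complete_with_failover`, `AllRoutesFailed`, and the two stand-in route functions are hypothetical names, and no network calls are made.

```python
from typing import Callable, Sequence


class AllRoutesFailed(Exception):
    """Raised when every route in the chain has failed."""


def complete_with_failover(
    prompt: str,
    routes: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) route in order.

    Returns (route name, completion) for the first route that succeeds.
    """
    errors = []
    for name, call in routes:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code should catch narrower, provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllRoutesFailed("; ".join(errors))


# Stand-in clients for illustration only.
def flaky_route(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def healthy_route(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    chain = [("primary", flaky_route), ("backup", healthy_route)]
    print(complete_with_failover("hello", chain))
```

Ordering the chain by the price ladder (cheapest first) is one reasonable policy; latency or regional availability may matter more for a given workload.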

Capabilities

Structured Outputs

Benchmark peer bars for Classification

Massive Multitask Language Understanding · Rank 76 of 95

Model	Score
Gemini 3.1 Pro Preview	98.0
GPT-5.5	92.4
Claude Opus 4.6	91.1
DeepSeek V4 Pro	90.1
Llama 2 70B Chat (current)	68.9

Benchmark scores (1)

Scores are benchmark-specific and direction-aware (higher is better on some suites, lower on others), so the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether a margin is meaningful for your workload.

Benchmark	Score	Version	Evaluation	Source
Massive Multitask Language Understanding	68.9	5-shot · Observed 2026-03-07	—	Source

Migration checks

No linked migration route is available for this model yet.

Compare Llama 2 70B Chat with other models

Comparison and alternatives


Popular comparisons (8), sorted by 7-day search impressions

Comparison	7-day search impressions
Llama 2 70B Chat vs Xiaomi MiMo-V2.5-TTS-Series	4
Llama 2 70B Chat vs Together AI Qwen2-7B-Instruct	3
Llama 2 70B Chat vs Sarvam 30B	3
Llama 2 70B Chat vs Mistral Medium 3.5	3
Llama 2 70B Chat vs Mistral Large 3 675B Instruct	2
Llama 2 70B Chat vs Aquila Chat 2 70B Expressive	1
Llama 2 70B Chat vs Gemini 2.5 Flash Live API	1
Llama 2 70B Chat vs Teuken 7B Instruct	1

Frequently asked questions

What is the context window of Llama 2 70B Chat?

Llama 2 70B Chat has a context window of 4k tokens.
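
In practice the window is shared between the prompt and the completion, so a long prompt leaves less room for output. The sketch below shows one way to budget for that, assuming "4k" means 4,096 tokens (the Llama 2 context length) and that token counts come from the model's own tokenizer. The helper names and the `reserve` parameter are hypothetical.

```python
CONTEXT_WINDOW = 4096  # assumed token limit for the "4k" window


def max_completion_tokens(prompt_tokens: int, reserve: int = 0) -> int:
    """Tokens left for the completion after the prompt and an optional safety reserve."""
    remaining = CONTEXT_WINDOW - prompt_tokens - reserve
    return max(remaining, 0)


def fits(prompt_tokens: int, completion_tokens: int) -> bool:
    """True if the prompt plus the requested completion fit in the window."""
    return prompt_tokens + completion_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Hypothetical request: a 3,000-token prompt.
    print(max_completion_tokens(3000))
    print(fits(3000, 1500))
```

A request that fails the `fits` check needs a shorter prompt (for example, by truncating or summarizing history) or a smaller completion limit.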

How much does Llama 2 70B Chat cost?

Llama 2 70B Chat input pricing ranges from $0.50 to $1.95 per 1M tokens, depending on the provider.

When was Llama 2 70B Chat released?

Llama 2 70B Chat was released on 2023-07-18.

Which providers offer Llama 2 70B Chat?

Llama 2 70B Chat is available from 14 providers: Databricks Foundation Model Serving, Microsoft Foundry, GCP Vertex AI, Alibaba Cloud PAI-EAS, AWS Bedrock, OCI Generative AI, NVIDIA NIM, DeepInfra, Lepton AI API, Together AI, IBM watsonx, Scale AI GenAI Platform, Fireworks AI, Replicate API.

What benchmarks has Llama 2 70B Chat been tested on?

Llama 2 70B Chat has been evaluated on 1 benchmark, including Massive Multitask Language Understanding.
