Llama 3.3 Models by AI at Meta
3 models · 2024–2025 · Up to 128K ctx · From $0.1/1M input
About
Llama 3.3 is a family of 3 AI models by AI at Meta, released between 2024 and 2025.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
Llama 3.3 70B · Current
Use when the workload needs 8K context, 70B parameters, and tool use.
2025-12 · 8K context · 70B parameters · tool use
Llama 3.3 70B Instruct · Current
Use when the workload needs 128K context, 70B parameters, and structured outputs.
2025-09 · 128K context · 70B parameters · structured outputs
Llama 3.3 70B Instruct (free) · Current
Use when the workload needs 66K context, 70B parameters, and structured outputs.
2024-12 · 66K context · 70B parameters · structured outputs
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Llama 3.3 70B | Use when the workload needs 8K context, 70B parameters, and tool use. | 2025-12 | 8K context, 70B parameters, tool use | Current |
| Llama 3.3 70B Instruct | Use when the workload needs 128K context, 70B parameters, and structured outputs. | 2025-09 | 128K context, 70B parameters, structured outputs | Current |
| Llama 3.3 70B Instruct (free) | Use when the workload needs 66K context, 70B parameters, and structured outputs. | 2024-12 | 66K context, 70B parameters, structured outputs | Current |
Release Timeline
3 release groups
2025-12
1 current
Llama 3.3 70B
Current · 8K context · 70B parameters · tool use
2025-09
1 current
Llama 3.3 70B Instruct
Current · 128K context · 70B parameters · structured outputs
2024-12
1 current
Llama 3.3 70B Instruct (free)
Current · 66K context · 70B parameters · structured outputs
Specifications (3 models)
| Model | Released | Context | Parameters | Vision | Multimodal | Fn Calling | Tool Use | Structured Outputs |
|---|---|---|---|---|---|---|---|---|
| Llama 3.3 70B | 2025-12 | 8K | 70B | Yes | Yes | Yes | Yes | No |
| Llama 3.3 70B Instruct | 2025-09 | 128K | 70B | No | No | No | No | Yes |
| Llama 3.3 70B Instruct (free) | 2024-12 | 66K | 70B | No | No | No | No | Yes |
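The specifications above can be used to shortlist variants for a workload. The following is a minimal sketch, not part of the original page: the `Spec` class, the `SPECS` list, and the `candidates` function are hypothetical names introduced for illustration, and the values are copied from the table as listed (only the context, tool use, and structured outputs columns are included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """One row of the specifications table (subset of columns)."""
    name: str
    context_k: int            # context window in thousands of tokens, as listed
    tool_use: bool
    structured_outputs: bool


# Values copied from the specifications table; not independently verified.
SPECS = [
    Spec("Llama 3.3 70B", 8, True, False),
    Spec("Llama 3.3 70B Instruct", 128, False, True),
    Spec("Llama 3.3 70B Instruct (free)", 66, False, True),
]


def candidates(min_context_k=0, tool_use=False, structured_outputs=False):
    """Return names of variants that meet every requested requirement."""
    return [
        s.name
        for s in SPECS
        if s.context_k >= min_context_k
        and (s.tool_use or not tool_use)
        and (s.structured_outputs or not structured_outputs)
    ]


if __name__ == "__main__":
    # Variants with at least 100K context and structured outputs.
    print(candidates(min_context_k=100, structured_outputs=True))
```

A request that combines tool use and structured outputs returns an empty list with these values, since no single listed variant has both.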
Available From (11 providers)
Pricing
| Model | Provider | Input / 1M | Output / 1M | Type |
|---|---|---|---|---|
| Llama 3.3 70B Instruct (free) | OpenRouter | $0.1 | $0.32 | Serverless |
| Llama 3.3 70B Instruct (free) | Novita AI | $0.135 | $0.4 | Serverless |
| Llama 3.3 70B Instruct (free) | Chutes AI | $0.22 | $0.66 | Serverless |
| Llama 3.3 70B Instruct (free) | Together AI | $0.44 | $0.44 | Serverless |
| Llama 3.3 70B Instruct (free) | GroqCloud | $0.59 | $0.79 | Serverless |
| Llama 3.3 70B Instruct (free) | Arcee AI | $0.6 | $1.8 | Serverless |
| Llama 3.3 70B Instruct (free) | Microsoft Foundry | $0.71 | $0.71 | Serverless |
| Llama 3.3 70B Instruct (free) | AWS Bedrock | $0.72 | $0.72 | Serverless |
| Llama 3.3 70B Instruct (free) | Vercel AI Gateway | $0.72 | $0.72 | Serverless |
| Llama 3.3 70B | Fireworks AI | $0.9 | $0.9 | Serverless |
| Llama 3.3 70B Instruct | AWS Bedrock | $0.96 | $1.28 | Serverless |
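Because input and output tokens are priced separately, the cheapest listing depends on the mix of tokens in a workload. The sketch below shows the arithmetic under the assumption that cost scales linearly with token counts at the listed per-1M rates; `PRICES`, `estimate_cost`, and `cheapest` are hypothetical names, and the figures are copied from the pricing table rather than fetched from any provider.

```python
# (model, provider) -> (input USD per 1M tokens, output USD per 1M tokens),
# copied from the pricing table; actual provider billing may differ.
PRICES = {
    ("Llama 3.3 70B Instruct (free)", "OpenRouter"): (0.10, 0.32),
    ("Llama 3.3 70B Instruct (free)", "Novita AI"): (0.135, 0.40),
    ("Llama 3.3 70B Instruct (free)", "Chutes AI"): (0.22, 0.66),
    ("Llama 3.3 70B Instruct (free)", "Together AI"): (0.44, 0.44),
    ("Llama 3.3 70B Instruct (free)", "GroqCloud"): (0.59, 0.79),
    ("Llama 3.3 70B Instruct (free)", "Arcee AI"): (0.60, 1.80),
    ("Llama 3.3 70B Instruct (free)", "Microsoft Foundry"): (0.71, 0.71),
    ("Llama 3.3 70B Instruct (free)", "AWS Bedrock"): (0.72, 0.72),
    ("Llama 3.3 70B Instruct (free)", "Vercel AI Gateway"): (0.72, 0.72),
    ("Llama 3.3 70B", "Fireworks AI"): (0.90, 0.90),
    ("Llama 3.3 70B Instruct", "AWS Bedrock"): (0.96, 1.28),
}


def estimate_cost(model, provider, input_tokens, output_tokens):
    """Estimated USD cost for the given token counts at the listed rates."""
    in_price, out_price = PRICES[(model, provider)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cheapest(input_tokens, output_tokens):
    """Return the (model, provider) pair with the lowest estimated cost."""
    return min(
        PRICES,
        key=lambda key: estimate_cost(*key, input_tokens, output_tokens),
    )


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500K output tokens.
    model, provider = cheapest(2_000_000, 500_000)
    cost = estimate_cost(model, provider, 2_000_000, 500_000)
    print(f"{model} via {provider}: ${cost:.2f}")
```

For the example workload, the OpenRouter listing works out to $0.20 for input plus $0.16 for output. The comparison ignores context limits and capabilities, so it is best combined with the specifications table before choosing a listing.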
Comparisons
- GPT-4o (08-06) vs Llama 3.3 70B
- Claude 3.5 Sonnet vs Llama 3.3 70B
- Gemini 2.5 Pro vs Llama 3.3 70B
- DeepSeek R1 vs Llama 3.3 70B
- Llama 3.3 70B vs Grok-2
- Qwen2.5-72B-Instruct vs Llama 3.3 70B
- Qwen3-30B-A3B vs Llama 3.3 70B
- Llama 4 Maverick 17B Instruct FP8 vs Llama 3.3 70B
Frequently Asked Questions
- What is Llama 3.3 used for?
- Llama 3.3 is used for vision and multimodal work, agent workflows and tool use, and structured outputs. The family description and listed model capabilities point to those workloads as the best fit.
- How does Llama 3.3 compare to Chameleon?
- Llama 3.3 by AI at Meta is strongest where you need vision and multimodal work, while Chameleon by AI at Meta is the closest related family to check for coding. Llama 3.3 has 3 listed variants and reaches up to 128K context, while Chameleon reaches up to 4K context. Compare the specs and pricing tables before choosing a production model.
- Which Llama 3.3 model should I use?
- For the lowest listed input price, start with Llama 3.3 70B Instruct (free) through OpenRouter at $0.1/1M input tokens. For the most capable or latest local choice, evaluate Llama 3.3 70B, which lists 8K context along with tool use, function calling, and multimodal inputs.






