LLM Reference

CodeLlama 70B Python

Released: 2024-01-29
Last refreshed: 2026-06-01
Status: Researched 12d ago
Open Weights · Commercial use with conditions · Classification · JSON / Tool use

CodeLlama 70B Python is worth evaluating for classification and JSON / tool use when its provider route and context window match the workload.

Use it for

  • Teams evaluating classification and JSON / tool use
  • Workloads that can use a 16k context window
  • Buyers comparing 4 tracked provider routes

Do not use it for

  • Vision or document-understanding workloads

Specifications

Released: 2024-01-29
Context: 16k
Parameters: 70B
Architecture: Decoder-only
Knowledge cutoff: 2022-09
Specialization: general
Openness: Open weights
License: Llama 2 Community (commercial use with conditions)
Training: fine-tuned
Created by

Meta

Large-scale open-source AI for social technologies.

Menlo Park, California, United States
Founded 2013

Pricing

Output / 1M: $0.900
Input / 1M: $0.900

Cheapest of 4 routes · Fireworks AI

About

CodeLlama 70B Python is a specialized model from Meta, designed for Python code synthesis and understanding. With 70 billion parameters, it targets Python code completion; it is not tuned for instruction following. The model uses an optimized transformer architecture and was fine-tuned on sequences of up to 16,000 tokens, which suits Python-centric development workflows. It does not support long contexts of 100,000 tokens, but it can be used in both commercial and research settings. More details can be found in the research paper "Code Llama: Open Foundation Models for Code".

CodeLlama 70B Python is Meta's largest Python-specialized code model, released as part of the CodeLlama 70B family in early 2024. It has 70 billion parameters and is produced by fine-tuning CodeLlama 70B on Python-specific training data to improve Python code generation accuracy. The context window is 16,384 tokens. Like other Python-specific CodeLlama variants, it is a completion model: it continues a code prefix rather than responding to conversational prompts, and it is not trained for fill-in-the-middle (FIM) infilling.
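As a minimal sketch of what a completion-style prompt can look like, the helper below assembles a function signature and docstring and leaves the body open for the model to continue. The name `build_completion_prompt` and the example function are hypothetical and not part of any Meta or provider API; sending the prompt to a hosted route is deliberately left out.

```python
def build_completion_prompt(signature: str, docstring: str) -> str:
    """Return a code prefix that a completion model is expected to continue.

    Hypothetical helper for illustration: the prompt is plain Python source,
    with no chat roles or system message.
    """
    indent = " " * 4
    # End on an indented, empty line so the model starts writing the body.
    return f'{signature}\n{indent}"""{docstring}"""\n{indent}'


prompt = build_completion_prompt(
    "def moving_average(values: list[float], window: int) -> list[float]:",
    "Return the simple moving average of values over the given window.",
)
print(prompt)
```

The design choice here is to express the request as code context (names, type hints, docstring) instead of a natural-language instruction, which matches how a completion model consumes input.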

At 70 billion parameters, this is the largest CodeLlama model for Python tasks. It outperforms the 7B and 13B Python variants on complex Python generation tasks: larger algorithm implementations, correct use of library APIs, and multi-function code blocks. The large parameter count enables better pattern recognition across diverse Python codebases and library usage styles. Serving in FP16 requires approximately 140 GB of VRAM for the weights alone (70 billion parameters at 2 bytes each), which typically means 2–4 high-memory GPUs once the KV cache and activations are included.
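The FP16 figure follows from simple arithmetic, sketched below. The 80 GB and 48 GB per-GPU capacities are assumptions chosen for illustration, and the estimate covers weights only; KV cache, activations, and framework overhead are not modeled, so real deployments need headroom beyond the minimum shown.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(total_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds total_gb."""
    return math.ceil(total_gb / gpu_memory_gb)


fp16_gb = weight_memory_gb(70e9, 2)  # 70B parameters, 2 bytes each in FP16
print(f"FP16 weights: {fp16_gb:.0f} GB")
print("Minimum 80 GB GPUs (assumed capacity):", min_gpus(fp16_gb, 80))
print("Minimum 48 GB GPUs (assumed capacity):", min_gpus(fp16_gb, 48))
```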

CodeLlama 70B Python is available as open weights on Hugging Face (meta-llama/CodeLlama-70b-Python-hf) under the Llama 2 Community License, and is hosted on Together AI, Fireworks AI, Azure AI Foundry, and Replicate. For instruction-following use with Python code generation, the CodeLlama 70B Instruct variant is more appropriate. For new Python-specialized deployments, Qwen2.5-Coder or Qwen3-Coder models offer better benchmark results with longer context windows.

CodeLlama 70B Python has a 16k-token context window.
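A quick way to check whether a request fits is to budget prompt and generated tokens against the same window, as sketched below. The function names are hypothetical, the 16,384 figure is the window size stated on this page, and the token counts are assumed to come from the model's own tokenizer.

```python
CONTEXT_WINDOW = 16_384  # tokens, as stated for this model


def fits(prompt_tokens: int, max_new_tokens: int,
         context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= context_window


def max_new_tokens_allowed(prompt_tokens: int,
                           context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation; 0 if the prompt already fills the window."""
    return max(context_window - prompt_tokens, 0)


print(fits(16_000, 384))
print(max_new_tokens_allowed(12_000))
```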

On the Replicate route, CodeLlama 70B Python input tokens cost $0.65/1M and output tokens cost $2.75/1M.

Top use-case fit

Classification

Included by capability and metadata signals in the decision map.

JSON / Tool use

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider           Input / 1M   Output / 1M   Route
Fireworks AI       $0.900       $0.900        Provisioned
Together AI        $0.900       $0.900        Serverless
Replicate API      $0.650       $2.75         Serverless
Microsoft Foundry  $3.78        $11.34        Provisioned
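Which route is cheapest depends on the input-to-output token mix, which the rough sketch below makes concrete. Prices are copied from the price ladder above; the function names and workload sizes are hypothetical, and the comparison treats provisioned and serverless routes as plain per-token rates, which may not match how every provider actually bills.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the price ladder.
PRICES = {
    "Fireworks AI": (Decimal("0.900"), Decimal("0.900")),
    "Together AI": (Decimal("0.900"), Decimal("0.900")),
    "Replicate API": (Decimal("0.650"), Decimal("2.75")),
    "Microsoft Foundry": (Decimal("3.78"), Decimal("11.34")),
}
MILLION = Decimal(1_000_000)


def workload_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD for a workload on one provider route."""
    input_price, output_price = PRICES[provider]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Provider with the lowest cost; ties go to the first one listed."""
    return min(PRICES, key=lambda p: workload_cost(p, input_tokens, output_tokens))


for mix in [(1_000_000, 1_000_000), (10_000_000, 1_000_000)]:
    name = cheapest(*mix)
    print(mix, name, workload_cost(name, *mix))
```

At these list prices, Replicate only undercuts the $0.900 routes when input tokens outnumber output tokens by more than about 7.4 to 1, since 0.25 × input must exceed 1.85 × output.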

Available via routers & gateways (6)

Capabilities

Structured Outputs

Benchmark peer bars for Classification

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.