Pythia Models by EleutherAI
About
The Pythia large language model (LLM) family from EleutherAI was built for research into LLM behavior and training dynamics. The original suite comprises 16 models: eight sizes from 70 million to 12 billion parameters, each trained on the Pile both in its standard form and after deduplication, with every model seeing the training data in the same order. This consistency makes it possible to study how scaling affects model behavior in a tightly controlled setting. The variants listed below also include smaller 31M and 14M models. Although Pythia was not optimized for downstream tasks, its performance is comparable to other LLMs of similar size, and it is intended primarily for research and education. The models are publicly available with extensive intermediate training checkpoints. They are not fine-tuned for any specific application and are trained primarily on English text.
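The intermediate checkpoints are published as branches of each model's Hugging Face repository. The sketch below lists the branch names for one model. It assumes the naming scheme described in EleutherAI's Pythia repository: `step0`, ten log-spaced steps from `step1` to `step512`, then one checkpoint every 1,000 steps up to `step143000`. The function name `pythia_checkpoint_revisions` is hypothetical.

```python
# Sketch: enumerate intermediate-checkpoint branch names for one Pythia model.
# Assumption: naming scheme from EleutherAI's Pythia repository
# (step0, ten log-spaced steps 1..512, then every 1000 steps up to 143000).


def pythia_checkpoint_revisions(final_step: int = 143_000) -> list[str]:
    """Return branch names such as 'step0', 'step512', 'step143000' (hypothetical helper)."""
    steps = [0]
    # Ten log-spaced early checkpoints: 1, 2, 4, ..., 512.
    steps += [2 ** i for i in range(10)]
    # Evenly spaced checkpoints every 1000 steps, up to and including final_step.
    steps += list(range(1000, final_step + 1, 1000))
    return [f"step{s}" for s in steps]


if __name__ == "__main__":
    revisions = pythia_checkpoint_revisions()
    print(len(revisions), revisions[:3], revisions[-1])
    # Loading one checkpoint (needs the `transformers` package and network access):
    #   from transformers import GPTNeoXForCausalLM
    #   model = GPTNeoXForCausalLM.from_pretrained(
    #       "EleutherAI/pythia-70m-deduped", revision=revisions[-1])
```

Passing an early and a late revision to the same loader lets you compare one model at different points in training.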
Current Variants
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Pythia 12B | Use when the workload needs 2k context and 12B parameters. | 2023-05 | 2k context, 12B parameters | Current |
| Pythia 6.9B | Use when the workload needs 2k context and 6.9B parameters. | 2023-05 | 2k context, 6.9B parameters | Current |
| Pythia 2.8B | Use when the workload needs 2k context and 2.8B parameters. | 2023-05 | 2k context, 2.8B parameters | Current |
| Pythia 1.4B | Use when the workload needs 2k context and 1.4B parameters. | 2023-05 | 2k context, 1.4B parameters | Current |
| Pythia 1B | Use when the workload needs 2k context and 1B parameters. | 2023-05 | 2k context, 1B parameters | Current |
| Pythia 410M | Use when the workload needs 2k context and 410M parameters. | 2023-05 | 2k context, 410M parameters | Current |
| Pythia 160M | Use when the workload needs 2k context and 160M parameters. | 2023-05 | 2k context, 160M parameters | Current |
| Pythia 70M | Use when the workload needs 2k context and 70M parameters. | 2023-05 | 2k context, 70M parameters | Current |
| Pythia 31M | Use when the workload needs 2k context and 31M parameters. | 2023-05 | 2k context, 31M parameters | Current |
| Pythia 14M | Use when the workload needs 2k context and 14M parameters. | 2023-05 | 2k context, 14M parameters | Current |
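Because the variants differ mainly in parameter count, a rough first filter is how much memory the weights need. The following is a minimal sketch. It assumes nominal parameter counts taken from the model names and counts weights only, excluding activations, KV cache, and runtime overhead. The helper names `weight_gb` and `largest_fitting` are hypothetical.

```python
from typing import Optional

# Nominal parameter counts taken from the model names above
# (assumption: actual counts differ slightly from the rounded names).
PARAMS = {
    "Pythia 12B": 12e9,
    "Pythia 6.9B": 6.9e9,
    "Pythia 2.8B": 2.8e9,
    "Pythia 1.4B": 1.4e9,
    "Pythia 1B": 1e9,
    "Pythia 410M": 410e6,
    "Pythia 160M": 160e6,
    "Pythia 70M": 70e6,
    "Pythia 31M": 31e6,
    "Pythia 14M": 14e6,
}

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(params: float, dtype: str = "fp16") -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting(budget_gb: float, dtype: str = "fp16") -> Optional[str]:
    """Return the largest listed variant whose weights fit in budget_gb, or None."""
    fitting = [(p, name) for name, p in PARAMS.items() if weight_gb(p, dtype) <= budget_gb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, p in PARAMS.items():
        print(f"{name}: ~{weight_gb(p):.1f} GB of fp16 weights")
    print(largest_fitting(16))
```

For example, `largest_fitting(16)` returns `'Pythia 6.9B'`, because the 12B weights alone need about 24 GB in fp16. Treat these figures as a lower bound on real memory use.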
Release Timeline
All 10 listed models belong to a single release group (2023-05).
Specifications (10 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| Pythia 12B | 2023-05 | 2k | 12B |
| Pythia 6.9B | 2023-05 | 2k | 6.9B |
| Pythia 2.8B | 2023-05 | 2k | 2.8B |
| Pythia 1.4B | 2023-05 | 2k | 1.4B |
| Pythia 1B | 2023-05 | 2k | 1B |
| Pythia 410M | 2023-05 | 2k | 410M |
| Pythia 160M | 2023-05 | 2k | 160M |
| Pythia 70M | 2023-05 | 2k | 70M |
| Pythia 31M | 2023-05 | 2k | 31M |
| Pythia 14M | 2023-05 | 2k | 14M |
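Every listed variant shares the same 2k context window, so prompt length and generation length draw on the same budget. The following is a minimal sketch. It assumes that "2k" means 2,048 tokens and that token counts come from the model's own tokenizer. The names `max_new_tokens` and `fits` are hypothetical.

```python
# Sketch: budget a request against Pythia's 2k-token context window.
# Assumption: "2k" = 2048 tokens, shared by the prompt and the generated output.

CONTEXT_TOKENS = 2048


def max_new_tokens(prompt_tokens: int, context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for generation after the prompt (0 if the prompt fills the window)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context - prompt_tokens, 0)


def fits(prompt_tokens: int, new_tokens: int, context: int = CONTEXT_TOKENS) -> bool:
    """True if the prompt plus the requested generation fit in the window."""
    return prompt_tokens + new_tokens <= context


if __name__ == "__main__":
    print(max_new_tokens(1800), fits(1800, 256))
```

For example, a 1,800-token prompt leaves 248 tokens for output, so a request for 256 new tokens does not fit.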
Available From (1 provider)
Pricing
| Model | Provider | Input / 1M tokens | Output / 1M tokens | Type |
|---|---|---|---|---|
| Pythia 12B | Fireworks AI | $0.20 | $0.20 | Provisioned |
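The listed rates convert to request cost by simple proportion. The following is a minimal sketch. It assumes the Fireworks AI rates above apply per token to both input and output. The "Provisioned" type suggests billing may work differently in practice, so confirm with the provider. The function name `cost_usd` is hypothetical.

```python
# Sketch: estimate cost from per-million-token rates.
# Assumption: the listed Pythia 12B rates on Fireworks AI ($0.20 per 1M input
# tokens, $0.20 per 1M output tokens) apply per token.

INPUT_PER_M = 0.20
OUTPUT_PER_M = 0.20


def cost_usd(input_tokens: int, output_tokens: int,
             input_per_m: float = INPUT_PER_M,
             output_per_m: float = OUTPUT_PER_M) -> float:
    """Cost in US dollars for the given token counts."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


if __name__ == "__main__":
    # 10,000 requests of 1,500 input and 500 output tokens each.
    requests = 10_000
    print(f"${cost_usd(1_500 * requests, 500 * requests):.2f}")
```

At these rates, 10,000 requests of 1,500 input and 500 output tokens each (15M input, 5M output) come to about $4.00.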
Frequently Asked Questions
- What is Pythia used for?
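- Pythia is intended mainly for research and education. Typical uses include studying how LLMs learn, how behavior changes with scale, and interpretability questions that benefit from the public intermediate checkpoints. The models are not fine-tuned for specific applications, so chat, role-play, or coding workloads generally call for an instruction-tuned model or additional fine-tuning.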
- How does Pythia compare to Llemma?
- Both families come from EleutherAI. Pythia is a general-purpose research suite trained on the Pile, while Llemma is the closest related family and is specialized for mathematics. All 10 listed Pythia variants have a 2k context, while Llemma reaches up to 4k context, so compare the specs and pricing tables before choosing a production model.
- Which Pythia model should I use?
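- Pythia 12B through Fireworks AI is the only hosted option listed, at $0.20 per 1M input tokens. For local use, Pythia 12B is the largest and most capable variant. All variants share the same release date and 2k context, so choosing a smaller size mainly trades capability for lower hardware requirements.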

