Cerebras GPT Models by Cerebras
About
The Cerebras GPT family comprises seven open-source large language models ranging from 111 million to 13 billion parameters. Cerebras Systems trained them following the Chinchilla formula of roughly 20 training tokens per parameter, which targets the highest accuracy for a given compute budget. The models are available on Hugging Face under the Apache 2.0 license, so they can be used for both research and commercial work. Training ran on the Andromeda AI supercomputer, using Cerebras' weight streaming technology to distribute computation across multiple nodes. This setup is reported to speed up training and to reduce cost and energy consumption relative to other available models [1][2].
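The 20-tokens-per-parameter rule can be turned into a quick token-budget calculation. The sketch below is illustrative only: the parameter counts are the nominal sizes listed on this page, and the `6 * N * D` compute estimate is a common rule of thumb that is assumed here, not something stated by the source.

```python
# Nominal parameter counts for the seven listed Cerebras GPT variants.
PARAMS = {
    "111M": 111_000_000,
    "256M": 256_000_000,
    "590M": 590_000_000,
    "1.3B": 1_300_000_000,
    "2.7B": 2_700_000_000,
    "7B": 7_000_000_000,
    "13B": 13_000_000_000,
}

TOKENS_PER_PARAM = 20  # ratio stated in the text above


def token_budget(n_params: int) -> int:
    """Training tokens implied by the 20-tokens-per-parameter rule."""
    return TOKENS_PER_PARAM * n_params


def approx_train_flops(n_params: int) -> int:
    """Rough training compute, ASSUMING the common 6 * N * D estimate."""
    return 6 * n_params * token_budget(n_params)


if __name__ == "__main__":
    for name, n in PARAMS.items():
        tokens_b = token_budget(n) / 1e9
        print(f"{name:>5}: {tokens_b:8.1f}B tokens, "
              f"~{approx_train_flops(n):.2e} FLOPs")
```

Because the token count scales with the parameter count, the estimated compute grows with the square of model size under these assumptions.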
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Cerebras GPT 13B | Use when the workload needs 2k context and 13B parameters. | 2023-03 | 2k context, 13B parameters | Current |
| Cerebras GPT 7B | Use when the workload needs 2k context and 7B parameters. | 2023-03 | 2k context, 7B parameters | Current |
| Cerebras GPT 2.7B | Use when the workload needs 2k context and 2.7B parameters. | 2023-03 | 2k context, 2.7B parameters | Current |
| Cerebras GPT 1.3B | Use when the workload needs 2k context and 1.3B parameters. | 2023-03 | 2k context, 1.3B parameters | Current |
| Cerebras GPT 590M | Use when the workload needs 2k context, 590M parameters, and reasoning. | 2023-03 | 2k context, 590M parameters, reasoning | Current |
| Cerebras GPT 256M | Use when the workload needs 2k context and 256M parameters. | 2023-03 | 2k context, 256M parameters | Current |
| Cerebras GPT 111M | Use when the workload needs 2k context and 111M parameters. | 2023-03 | 2k context, 111M parameters | Current |
Release Timeline
1 release group
Specifications (7 models)
| Model | Released | Context | Parameters | Reasoning | Code Exec |
|---|---|---|---|---|---|
| Cerebras GPT 13B | 2023-03 | 2k | 13B | No | No |
| Cerebras GPT 7B | 2023-03 | 2k | 7B | No | No |
| Cerebras GPT 2.7B | 2023-03 | 2k | 2.7B | No | No |
| Cerebras GPT 1.3B | 2023-03 | 2k | 1.3B | No | No |
| Cerebras GPT 590M | 2023-03 | 2k | 590M | Yes | Yes |
| Cerebras GPT 256M | 2023-03 | 2k | 256M | No | No |
| Cerebras GPT 111M | 2023-03 | 2k | 111M | No | No |
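Since every variant shares the same release date and context length, parameter count is the main differentiator in the table. As a rough sketch, the helper below picks the largest variant whose weights fit a memory budget. It assumes 16-bit weights (2 bytes per parameter) and counts weights only, ignoring activations and other runtime overhead; the function names are hypothetical.

```python
from typing import Optional

# (name, nominal parameter count) from the specifications table, largest first.
MODELS = [
    ("Cerebras GPT 13B", 13_000_000_000),
    ("Cerebras GPT 7B", 7_000_000_000),
    ("Cerebras GPT 2.7B", 2_700_000_000),
    ("Cerebras GPT 1.3B", 1_300_000_000),
    ("Cerebras GPT 590M", 590_000_000),
    ("Cerebras GPT 256M", 256_000_000),
    ("Cerebras GPT 111M", 111_000_000),
]

BYTES_PER_PARAM = 2  # ASSUMPTION: weights stored in a 16-bit format


def weight_gb(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bytes_per_param / 1e9


def largest_fitting(budget_gb: float) -> Optional[str]:
    """Return the largest listed variant whose weights fit the budget."""
    for name, n_params in MODELS:  # list is ordered largest to smallest
        if weight_gb(n_params) <= budget_gb:
            return name
    return None  # even the smallest variant exceeds the budget
```

For instance, `largest_fitting(8)` skips the 13B and 7B entries (about 26 GB and 14 GB of weights under this assumption) and returns the 2.7B variant. Real deployments need headroom beyond the weight size.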
Frequently Asked Questions
- What is Cerebras GPT used for?
- Cerebras GPT is a family of open-source large language models released for research and commercial use. In the listed model capabilities, only Cerebras GPT 590M is flagged for reasoning and code execution; the other variants list neither.
- How does Cerebras GPT compare to Cerebras LLaVA?
- In the listed data, Cerebras GPT is flagged for reasoning, while Cerebras LLaVA is the closest related Cerebras family to check for coding. Cerebras GPT has 7 listed variants with up to 2k context, whereas Cerebras LLaVA reaches up to 4k context. Compare the specs and pricing tables before choosing a production model.
- Which Cerebras GPT model should I use?
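- If price is the main constraint, check the pricing table first, but note that Cerebras GPT does not have complete provider pricing in the local data. All seven variants were released in 2023-03 with 2k context, so the choice comes down to parameter count and listed capabilities: Cerebras GPT 13B is the largest, and Cerebras GPT 590M is the one to evaluate if you need the listed reasoning capability.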
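Once a variant is chosen, it can be tried through the Hugging Face `transformers` library. The following is a minimal sketch, not an official recipe: it assumes the repositories follow a `cerebras/Cerebras-GPT-<size>` naming pattern, which this page does not state, and the size suffix on the hub may differ from the labels used in the tables above (for example `6.7B` rather than `7B`), so verify the identifier before use. The helper names are hypothetical.

```python
def repo_id(size: str) -> str:
    """Build a model identifier. ASSUMPTION: cerebras/Cerebras-GPT-<size>."""
    return f"cerebras/Cerebras-GPT-{size}"


def generate(prompt: str, size: str = "111M", max_new_tokens: int = 30) -> str:
    """Download the chosen variant and continue the prompt."""
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id(size))
    model = AutoModelForCausalLM.from_pretrained(repo_id(size))

    # Tokenize to PyTorch tensors; prompt plus new tokens must fit
    # within the 2k context listed for this family.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Generative AI is ")` downloads the weights on first use and requires PyTorch alongside `transformers`. Starting with the smallest variant keeps that first download and the memory footprint small.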

