Cerebras GPT 13B
About
Cerebras-GPT 13B is a transformer-based large language model developed by Cerebras Systems. It is notable for using dense attention in every layer, in contrast to the alternating sparse banded attention used in the GPT-3 models. The largest member of a family spanning 111 million to 13 billion parameters, it uses a GPT-3 style architecture with byte pair encoding for tokenization, and was trained following the Chinchilla scaling laws for compute-optimal training. Training used Cerebras' weight streaming technology, which enables efficient data-parallel scaling across nodes. The model performs well at text prediction and generation, but it is intended primarily for research into scaling laws and NLP rather than for downstream tasks such as machine translation or dialogue systems. Released under the Apache 2.0 license, it permits free commercial use, with the recommendation that additional safety testing be performed before production deployment.
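The Chinchilla compute-optimality rule mentioned above is commonly summarized as training on roughly 20 tokens per model parameter. A minimal sketch of that arithmetic, applied to the parameter counts in the Cerebras-GPT family (the 20:1 ratio is the standard Chinchilla heuristic, not a figure taken from this article):

```python
def chinchilla_optimal_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal training-token count per the
    Chinchilla heuristic of ~20 tokens per parameter."""
    return n_params * tokens_per_param

# Illustrative parameter counts from the Cerebras-GPT family.
for n_params in (111_000_000, 1_300_000_000, 13_000_000_000):
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e9:.3f}B params -> ~{tokens / 1e9:.0f}B tokens")
```

For the 13B-parameter model this heuristic suggests on the order of 260 billion training tokens; actual training-token counts for each model size are reported in the Cerebras-GPT release materials.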