LLM Reference

Cerebras-GPT 111M

About

Cerebras-GPT 111M is a transformer-based language model from Cerebras Systems, the smallest member of the Cerebras-GPT family. It uses a GPT-3 style decoder-only architecture with 10 layers, 768 hidden units, and 12 attention heads, and processes sequences of up to 2048 tokens. Trained on The Pile dataset, it follows the Chinchilla scaling laws (roughly 20 training tokens per parameter) for compute-efficient training, and supports text generation and few-shot learning. Released as open source under the Apache 2.0 license, it is intended for research and development use, and Cerebras' weight streaming technology allows training to scale across multiple nodes.
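The architecture numbers above are enough to sanity-check the "111M" name. The sketch below estimates the parameter count of a GPT-3 style decoder with these dimensions; the vocabulary size of 50257 is an assumption based on the GPT-2 BPE tokenizer the Cerebras-GPT family uses, not a figure from this page.

```python
# Approximate parameter count for a GPT-3 style decoder-only model with the
# dimensions listed above. Vocab size 50257 is an assumption (GPT-2 tokenizer).
n_layers, d_model, n_ctx, vocab = 10, 768, 2048, 50257

embeddings = vocab * d_model + n_ctx * d_model             # token + position embeddings
attn = 4 * d_model * d_model + 4 * d_model                 # Q, K, V, output proj + biases
mlp = 2 * d_model * (4 * d_model) + 4 * d_model + d_model  # 4x feed-forward expansion + biases
layer_norms = 2 * 2 * d_model                              # two LayerNorms per block (scale + bias)
per_layer = attn + mlp + layer_norms

total = embeddings + n_layers * per_layer + 2 * d_model    # plus final LayerNorm
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")       # ~111M
```

The estimate lands within one percent of 111M, which is how the model gets its name; the same accounting scales to the larger Cerebras-GPT variants by changing the four dimensions at the top.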

Capabilities

- Multimodal
- Function Calling
- Tool Use
- JSON Mode

Specifications

Architecture: Decoder Only
Specialization: General