LLM Reference

Cerebras GPT 111M

About

The Cerebras GPT 111M is a compact transformer language model from Cerebras Systems. It follows a GPT-3-style, decoder-only architecture with 10 layers, a hidden size of 768, and 12 attention heads, and processes sequences of up to 2,048 tokens. Trained on The Pile dataset in accordance with the Chinchilla scaling laws (roughly 20 training tokens per parameter) for compute-efficient training, it offers text generation and few-shot learning capabilities. Released under the Apache 2.0 license, it is fully open source for research and development use, and Cerebras' weight streaming technology allows training to scale across multiple nodes.
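As a sanity check, the 111M figure can be reproduced from the architecture numbers above. The sketch below assumes a GPT-2 BPE vocabulary of 50,257 tokens and tied input/output embeddings, both standard for GPT-style models but not stated in this entry.

```python
# Estimate the parameter count of a GPT-style decoder from its hyperparameters.
# Assumptions (not stated in the entry above): GPT-2 BPE vocabulary of 50,257
# tokens, learned position embeddings, a 4x MLP expansion, and tied
# input/output embeddings, all standard for GPT-style models.

def gpt_param_count(n_layer: int, d_model: int, n_ctx: int, vocab: int) -> int:
    embeddings = vocab * d_model + n_ctx * d_model  # token + position embeddings
    attn = 4 * d_model * d_model + 4 * d_model      # QKV + output projection (weights + biases)
    mlp = 8 * d_model * d_model + 5 * d_model       # 4x expansion: up + down (weights + biases)
    layer_norms = 2 * 2 * d_model                   # two LayerNorms per block (gamma + beta)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d_model                          # final LayerNorm
    return embeddings + n_layer * per_layer + final_ln

params = gpt_param_count(n_layer=10, d_model=768, n_ctx=2048, vocab=50257)
print(params)                        # roughly 111M parameters
# Chinchilla-optimal training budget: ~20 tokens per parameter
print(round(20 * params / 1e9, 1))   # roughly 2.2B training tokens
```

Under these assumptions the count lands at about 111M parameters, and the Chinchilla rule of thumb implies a training budget of roughly 2.2B tokens.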

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · Structured Outputs · Code Execution


Specifications

Released: 2023-03-13
Architecture: Decoder-only
Specialization: General
Training: Fine-tuning

Created by

Cerebras Systems, maker of the world's largest AI chip

Sunnyvale, California, United States
Founded 2016