Cerebras GPT 1.3B
About
Cerebras-GPT 1.3B is a transformer-based large language model from Cerebras Systems, with 1.3 billion parameters in a GPT-3 style architecture using full (dense) attention. It targets natural language processing tasks such as text generation, summarization, and question answering, and supports a sequence length of 2048 tokens. It was trained on the Pile dataset (roughly 371 billion tokens in total) following Chinchilla scaling laws for compute-optimal training, and performs competitively in few-shot evaluation for its size. Training ran on the Andromeda AI supercomputer, using Cerebras' weight-streaming technology to scale across systems without the usual complexity of distributed training.
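The Chinchilla scaling law mentioned above prescribes roughly 20 training tokens per model parameter for compute-optimal training. A quick sketch of what that implies for a 1.3B-parameter model (the exact token budget Cerebras used may differ slightly; this only illustrates the rule of thumb):

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter
# for compute-optimal training.
PARAMS = 1.3e9          # Cerebras-GPT 1.3B parameter count
TOKENS_PER_PARAM = 20   # approximate Chinchilla-optimal ratio

optimal_tokens = PARAMS * TOKENS_PER_PARAM
print(f"Compute-optimal token budget: ~{optimal_tokens / 1e9:.0f}B tokens")
# → Compute-optimal token budget: ~26B tokens
```

So a Chinchilla-optimal 1.3B model sees on the order of 26 billion tokens, a small fraction of the full Pile corpus.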
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode