Pythia 2.8B
About
The Pythia 2.8B model by EleutherAI is designed for research into large language models (LLMs) and their interpretability. With 2.8 billion parameters, it uses a decoder-only autoregressive transformer architecture and leverages flash attention for computational efficiency. The model has 32 layers and 32 attention heads and was trained on The Pile, a large and diverse text dataset. It performs well on tasks such as text generation and language understanding, but it is not fine-tuned for end-user applications and, like many LLMs, may reflect biases present in its training data.
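The decoder-only autoregressive setup described above can be sketched as a simple generation loop: the model maps the token sequence so far to next-token logits, the most likely token is appended, and the extended sequence is fed back in. The toy bigram "model" below is hypothetical, used only to make the loop runnable; Pythia itself computes these logits with its 32 transformer layers.

```python
from typing import Callable, List

def generate(next_logits: Callable[[List[int]], List[float]],
             prompt: List[int], max_new_tokens: int) -> List[int]:
    """Greedy autoregressive decoding: repeatedly feed the growing
    sequence back into the model and append the most likely token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

# Hypothetical stand-in for the model: a fixed bigram table over a
# 3-token vocabulary, purely for illustration.
BIGRAM = {0: [0.1, 0.9, 0.0], 1: [0.0, 0.1, 0.9], 2: [0.9, 0.1, 0.0]}

def toy_model(tokens: List[int]) -> List[float]:
    return BIGRAM[tokens[-1]]

print(generate(toy_model, [0], 4))  # → [0, 1, 2, 0, 1]
```

In the real model, `next_logits` would be a forward pass over the full sequence, and sampling strategies such as temperature or top-k would typically replace the greedy argmax.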
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode
Specifications
Family: Pythia
Released: 2023-05-31
Parameters: 2.8B
Architecture: Decoder Only
Specialization: General