LLM Reference

Gopher 280B

About

Gopher 280B, developed by DeepMind, is a 280-billion-parameter language model that surpassed OpenAI's GPT-3 in size at release. It uses a Transformer-based, decoder-only architecture with modifications such as RMSNorm (in place of LayerNorm) and relative positional encoding, which improve performance on longer text sequences. Trained on a roughly 10.5 TB text dataset, Gopher excels at tasks such as reading comprehension and toxic-language detection but remains weaker at logical reasoning. It also exhibits limitations common to large language models, such as repetition and the reflection of biases present in its training data, motivating improved training techniques to enhance accuracy and mitigate bias.
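The RMSNorm modification mentioned above replaces LayerNorm's mean-centering and bias with a simpler root-mean-square rescaling. A minimal NumPy sketch (illustrative only, not Gopher's actual implementation; the function name and epsilon value are assumptions):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # RMSNorm: scale activations by the reciprocal of their
    # root-mean-square along the feature axis, then apply a
    # learned per-feature gain. Unlike LayerNorm, there is no
    # mean subtraction and no bias term, which saves compute.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.array([3.0, -4.0, 0.0])   # toy activation vector
g = np.ones_like(x)              # gain, typically initialised to 1
y = rms_norm(x, g)               # output has unit RMS when gain is 1
```

With the gain at its initial value of 1, the output vector always has a root-mean-square of 1, regardless of the input's scale.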

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Parameters: 280B
Architecture: Decoder-only
Specialization: General