Megatron GPT 5B
About
NeMo Megatron-GPT 5B is a transformer-based language model with 5 billion trainable parameters, in the tradition of models like GPT-2 and GPT-3. Its architecture is a decoder-only transformer: it processes input sequentially, generating one token at a time, which suits it to text generation and language-understanding tasks. Trained on the Pile dataset from EleutherAI, it produces coherent, natural-sounding text and can answer questions and complete sentences. Despite these strengths, the model can reflect biases and toxic language present in its training data and may sometimes yield inappropriate outputs. Evaluations on benchmarks like the LM Evaluation Test Suite show varying performance, with scores of 0.5566 on ARC-Easy and 0.6133 on Winogrande, indicating both strengths and limitations across different tasks.
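The "sequential processing" a decoder-only model performs can be sketched as a simple autoregressive loop: each generated token is appended to the prefix, and the model is asked for the next token again. The sketch below uses a hypothetical toy bigram table as a stand-in for the model's next-token scorer; a real Megatron-GPT run would instead execute a transformer forward pass over the whole prefix.

```python
# Toy illustration of the autoregressive decoding loop used by
# decoder-only transformers. BIGRAMS is a made-up stand-in for the
# model's next-token distribution, not anything from Megatron-GPT.
BIGRAMS = {
    "the": {"model": 0.6, "data": 0.4},
    "model": {"generates": 0.9, "the": 0.1},
    "generates": {"text": 1.0},
}

def next_token(tokens):
    """Greedy decoding: pick the highest-probability continuation."""
    candidates = BIGRAMS.get(tokens[-1], {})
    if not candidates:
        return None  # no continuation known; stop generating
    return max(candidates, key=candidates.get)

def generate(prompt, max_new_tokens=5):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)  # feed the output back in: sequential decoding
    return " ".join(tokens)

print(generate("the"))  # → "the model generates text"
```

The loop structure is the same whether the scorer is a bigram table or a 5-billion-parameter transformer; only `next_token` changes.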