LLM Reference

Megatron

This model family is considered obsolete. Consider newer alternatives in Related Model Families below.
3 models · 2019

About

Megatron is a series of large transformer-based language models and an accompanying training framework developed by NVIDIA. Its architecture combines tensor, pipeline, and sequence parallelism to distribute the computational workload efficiently across many GPUs, enabling the training of models with billions to trillions of parameters that would be unmanageable on a single machine. The framework is modular, so researchers and developers can customize it for specific use cases, and optimized fused kernels and related technical improvements further increase efficiency. These properties make Megatron well suited to pre-training, after which models can be fine-tuned for tasks such as text generation, translation, and question answering.
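To make the tensor-parallelism idea above concrete, here is a minimal single-process sketch in NumPy. It illustrates the column-parallel linear layer used in Megatron-style tensor parallelism: the weight matrix is split column-wise across devices, each device performs its local matmul, and the partial outputs are gathered. The variable names and the two-way split are illustrative assumptions; real Megatron shards tensors across GPUs and gathers results with NCCL collectives rather than `np.concatenate`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of input activations
W = rng.standard_normal((8, 16))   # full weight matrix of a linear layer

# Column-parallel split across 2 simulated "devices":
# each shard holds half of the output columns.
W_shards = np.split(W, 2, axis=1)

# Each "device" computes its partial output with its local shard.
partial_outputs = [x @ w for w in W_shards]

# Gathering the partial outputs reproduces the full layer output.
y_parallel = np.concatenate(partial_outputs, axis=1)
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

Because each shard's matmul is independent, the shards can run concurrently on separate GPUs, which is what lets a single layer exceed one device's memory.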

Specifications (3 models)

Megatron model specifications comparison

| Model | Released | Parameters | Reasoning | Code Exec |
| --- | --- | --- | --- | --- |
| Megatron GPT 20B | 2019-08 | 20B | No | No |
| Megatron GPT 5B | 2019-08 | 5B | Yes | Yes |
| Megatron GPT 1.3B | 2019-08 | 1.3B | No | No |

Frequently Asked Questions

What is Megatron?
Megatron is a series of large transformer-based language models and a training framework developed by NVIDIA. It uses tensor, pipeline, and sequence parallelism to distribute training across many GPUs, enabling models with billions to trillions of parameters, and is typically used for pre-training before fine-tuning on tasks such as text generation, translation, and question answering.
How many models are in the Megatron family?
The Megatron family contains 3 models.
What is the latest Megatron model?
The latest model is Megatron GPT 20B, released in August 2019.

Models (3)