LLM Reference

Megatron GPT 20B

About

Megatron-GPT 20B is a transformer-based, decoder-only language model in the style of GPT-2 and GPT-3, with 20 billion trainable parameters. It was developed with the NeMo Megatron framework and trained on EleutherAI's "The Pile" dataset. The model handles long text sequences by capturing long-range contextual relationships and supports a wide range of natural language processing tasks, though, like other models trained on internet-scraped data, it carries a risk of biased outputs. Running and fine-tuning a model of this size also imposes significant computational demands.
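The "decoder-only" design means each token can attend only to itself and earlier positions. A minimal NumPy sketch of this causal masking (illustrative only; the actual Megatron implementation uses fused, parallelized GPU kernels):

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask,
    so position i attends only to positions <= i (decoder-only style)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (seq, seq) similarity scores
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf                    # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # toy sequence: 4 tokens, dim 8
out, w = causal_attention(x, x, x)
# The upper triangle of the attention weights is exactly zero:
assert np.allclose(np.triu(w, k=1), 0.0)
```

At generation time this masking is what lets the model produce text one token at a time, each new token conditioned only on the prefix.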

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Specifications

Family: Megatron
Released: 2019-08-28
Parameters: 20B
Architecture: Decoder Only
Specialization: general
Training: finetuning
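The 20B parameter figure can be sanity-checked with the standard estimate for a GPT-style decoder-only transformer: roughly 12·L·d² weights across the transformer blocks plus a vocab·d embedding matrix. The layer count, hidden size, and vocabulary size below are illustrative assumptions chosen to land near 20B, not confirmed values from this model card:

```python
def decoder_only_params(num_layers, d_model, vocab_size):
    """Rough parameter count for a GPT-style decoder-only transformer:
    ~4*d^2 per layer for attention (Q, K, V, output projections) plus
    ~8*d^2 per layer for the 4x-expansion MLP, plus token embeddings."""
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return num_layers * per_layer + embeddings

# Hypothetical configuration (assumed values, for illustration only):
est = decoder_only_params(num_layers=44, d_model=6144, vocab_size=51200)
print(f"{est / 1e9:.1f}B parameters")   # prints "20.2B parameters"
```

The estimate ignores biases, layer norms, and positional embeddings, which together contribute well under 1% of the total at this scale.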

Created by

Accelerated AI for enterprise solutions

Santa Clara, California, United States
Founded 2015