LLM Reference
This model family is considered obsolete. Consider newer alternatives in Related Model Families below.
2 models · 2018

About

BERT, short for Bidirectional Encoder Representations from Transformers, is a prominent family of large language models (LLMs) originally introduced by Google AI in 2018. These models use the transformer architecture to process text bidirectionally, building an understanding of context from both the preceding and the following words in a sentence. Pretraining techniques such as masked language modeling (MLM) and next sentence prediction (NSP) contribute to BERT's strong performance on a range of natural language processing (NLP) tasks compared with earlier models. BERT was initially released in two configurations, BERT Base with 110 million parameters and BERT Large with 340 million parameters, both trained on large corpora including BookCorpus and English Wikipedia. The BERT family has since expanded to include multilingual versions and smaller distilled models such as DistilBERT and TinyBERT, catering to specific tasks and resource constraints. This adaptability has made BERT integral to applications like question answering, text classification, and named entity recognition.
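As an illustration of the MLM pretraining objective mentioned above, the masking recipe from the original BERT paper (15% of tokens selected; of those, 80% replaced with [MASK], 10% with a random token, 10% left unchanged) can be sketched in a few lines of Python. This is not BERT's actual tokenizer or training code; the toy vocabulary is invented for the example:

```python
import random

MASK = "[MASK]"
TOY_VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # stand-in vocabulary

def mlm_mask(tokens, mask_prob=0.15, rng=random.Random(0)):
    """Apply BERT-style MLM corruption to a token list.

    Returns the corrupted sequence plus (position, original token)
    pairs that the model would be trained to predict.
    """
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:        # ~15% become prediction targets
            targets.append((i, tok))
            roll = rng.random()
            if roll < 0.8:                  # 80%: replace with [MASK]
                corrupted.append(MASK)
            elif roll < 0.9:                # 10%: replace with a random token
                corrupted.append(rng.choice(TOY_VOCAB))
            else:                           # 10%: keep the original token
                corrupted.append(tok)
        else:
            corrupted.append(tok)
    return corrupted, targets
```

Leaving 10% of the selected tokens unchanged is deliberate: it discourages the model from assuming that every non-[MASK] token it sees is correct, since [MASK] never appears at fine-tuning time.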

Specifications (2 models)

BERT model specifications comparison

Model       Released   Parameters
BERT Large  2018-10    340M
BERT Base   2018-10    110M
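The parameter counts above can be roughly reproduced from the published architecture sizes (BERT Base: 12 layers, hidden size 768; BERT Large: 24 layers, hidden size 1024). The sketch below is an approximation assuming the standard 30,522-token WordPiece vocabulary and 512 positions; exact counts vary slightly by implementation:

```python
def bert_param_count(layers, hidden, vocab=30522, max_pos=512, ffn_mult=4):
    """Approximate parameter count for a BERT-style encoder."""
    # Token, position, and segment embeddings, plus embedding LayerNorm
    emb = (vocab + max_pos + 2) * hidden + 2 * hidden
    # Per layer: Q/K/V/output projections (weights + biases)
    attn = 4 * (hidden * hidden + hidden)
    # Per layer: two feed-forward projections (hidden -> 4*hidden -> hidden)
    ffn = (hidden * ffn_mult * hidden + ffn_mult * hidden
           + ffn_mult * hidden * hidden + hidden)
    # Per layer: two LayerNorms (scale + bias each)
    norms = 2 * (2 * hidden)
    # Pooler head over the [CLS] token
    pooler = hidden * hidden + hidden
    return emb + layers * (attn + ffn + norms) + pooler

base = bert_param_count(12, 768)     # ≈ 109.5M, quoted as "110M"
large = bert_param_count(24, 1024)   # ≈ 335M, quoted as "340M"
```

Most of BERT Base's budget sits in the embedding table (~24M parameters) and the 12 encoder layers (~7M each); BERT Large doubles the depth and widens the hidden size, which is why its count roughly triples.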

Frequently Asked Questions

What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a family of transformer-based language models introduced by Google AI in 2018. It reads text bidirectionally, considering both preceding and following words, and is pretrained with masked language modeling (MLM) and next sentence prediction (NSP). It was originally released as BERT Base (110 million parameters) and BERT Large (340 million parameters), trained on BookCorpus and English Wikipedia, and has since grown to include multilingual and distilled variants such as DistilBERT and TinyBERT. It is widely used for question answering, text classification, and named entity recognition.
How many models are in the BERT family?
The BERT family contains 2 models.
What is the latest BERT model?
The latest model is BERT Large, released in October 2018.

Models (2)