What is BERT used for?

BERT is used for natural language understanding tasks such as question answering, text classification, and named entity recognition. The family description points to those workloads as the best fit.

How does BERT compare to T5Gemma?

BERT by Google DeepMind is strongest for natural language understanding tasks such as text classification and question answering, while T5Gemma by Google DeepMind is the closest related family to check for agent workflows and tool use. BERT has 2 listed variants and a context window of up to 512 tokens, so compare the specs and pricing tables before choosing a production model.

Which BERT model should I use?

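If price is the main constraint, check the pricing table first, but note that BERT does not have complete provider pricing in the local data. For the most capable local choice, evaluate BERT Large (340M parameters, 512-token context). BERT Base (110M parameters, same context) is the lighter option, and both were released in 2018-10.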

BERT Models by Google DeepMind

Google DeepMind, license: Unknown / Unverified

This model family is considered obsolete. Consider newer alternatives in Related Model Families below.

2 models, released 2018, up to 512 ctx

Details

Researcher: Google DeepMind

License: Unknown / Unverified

Commercial use: Unknown

Models: 2

Released: 2018

Max context: 512

About

BERT, short for Bidirectional Encoder Representations from Transformers, is a family of language models introduced by Google AI in 2018 [1][3]. The models use the transformer encoder architecture to process text bidirectionally: the representation of each token is conditioned on both the preceding and the following words in the input [8]. Two pre-training objectives, masked language modeling (MLM) and next sentence prediction (NSP), contributed to BERT's stronger results on many natural language processing (NLP) tasks compared with earlier models [10]. BERT was initially released in two configurations, BERT Base with 110 million parameters and BERT Large with 340 million parameters, both trained on BookCorpus and English Wikipedia [3]. The family has since expanded to include multilingual versions and smaller models such as DistilBERT and TinyBERT, which target specific tasks and resource constraints [4]. This adaptability has made BERT widely used for question answering, text classification, and named entity recognition [2].
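As a rough illustration of the masked language modeling objective mentioned above, the sketch below corrupts a token sequence the way the original BERT paper describes: about 15% of positions are selected for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The function name `mask_tokens`, the toy vocabulary, and the whitespace tokenization are assumptions made for this example. Real BERT operates on WordPiece tokens and skips special tokens, and this sketch selects each position independently rather than sampling an exact 15%.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for a toy MLM training example.

    labels[i] holds the original token at positions the model must predict
    and None at every other position.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return corrupted, labels

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = ["dog", "tree", "ran", "blue"]  # hypothetical vocabulary
    print(mask_tokens(sentence, toy_vocab, seed=0))
```

Because the model sees context on both sides of each selected position, it can use the following words as well as the preceding ones to recover the original token, which is what makes the pre-training bidirectional.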

Current Variants

Use-when guidance is based on each model's tracked capabilities, context window, release date, and replacement status.

Current BERT variants with use-when guidance and lifecycle status
Model	Use when	Released	Signals	Status
BERT Large	Use when the workload fits in a 512-token context and can accommodate 340M parameters.	2018-10	512 context, 340M parameters	Current
BERT Base	Use when the workload fits in a 512-token context and needs a smaller 110M-parameter model.	2018-10	512 context, 110M parameters	Current

Release Timeline

1 release group

2018-10 (2 current)

BERT Base: 512 context, 110M parameters, Current

BERT Large: 512 context, 340M parameters, Current

Specifications (2 models)

BERT model specifications comparison
Model	Released	Context	Parameters
BERT Large	2018-10	512	340M
BERT Base	2018-10	512	110M
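
Both variants share the same 512-token limit, so longer inputs have to be truncated or split before they reach the model. The following is a minimal sketch of one common approach, overlapping windows, assuming the input is already tokenized and that the 512 positions include the [CLS] and [SEP] special tokens. The function name `split_into_windows` and the default overlap of 64 tokens are illustrative choices, not something specified on this page.

```python
MAX_CONTEXT = 512  # max context listed in the specifications table

def split_into_windows(tokens, max_len=MAX_CONTEXT, overlap=64):
    """Split tokens into overlapping windows, each wrapped in [CLS] ... [SEP]."""
    body = max_len - 2  # reserve two positions for [CLS] and [SEP]
    if not 0 <= overlap < body:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len - 2")
    step = body - overlap
    windows = []
    start = 0
    while True:
        chunk = tokens[start:start + body]
        windows.append(["[CLS]"] + chunk + ["[SEP]"])
        if start + body >= len(tokens):
            break  # this window reached the end of the input
        start += step
    return windows

if __name__ == "__main__":
    tokens = [f"tok{i}" for i in range(1000)]
    for window in split_into_windows(tokens):
        print(len(window), window[1], window[-2])
```

The overlap gives tokens near a window boundary some surrounding context in at least one window, at the cost of processing those tokens twice.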

Models (2)

BERT Large: 2018-10, 512 context, 340M parameters

BERT Base: 2018-10, 512 context, 110M parameters