BERT Models by Google DeepMind
About
BERT, short for Bidirectional Encoder Representations from Transformers, is a family of large language models (LLMs) introduced by Google AI in 2018 [1][3]. The models use the encoder of the transformer architecture to read text bidirectionally: the representation of each word is conditioned on both the preceding and the following words in the sentence [8]. Two pre-training objectives, masked language modeling (MLM) and next sentence prediction (NSP), contribute to BERT's improved performance over earlier models on natural language processing (NLP) tasks [10]. BERT was initially released in two configurations, BERT Base with 110 million parameters and BERT Large with 340 million parameters, both pre-trained on BookCorpus and English Wikipedia [3]. The family has since expanded to include multilingual versions and smaller models such as DistilBERT and TinyBERT, which target specific tasks and resource constraints [4]. This adaptability has made BERT a common choice for applications such as question answering, text classification, and named entity recognition [2].
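To make the masked language modeling objective concrete, here is a minimal sketch of how training inputs might be corrupted. It assumes the recipe from the original BERT release (select about 15% of positions; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), which this page does not spell out. The function name `mask_tokens` and the toy vocabulary are hypothetical, and a real pipeline would operate on WordPiece token IDs and skip special tokens.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list for MLM training (illustrative sketch only).

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        # Select roughly mask_prob of the positions as prediction targets.
        if rng.random() >= mask_prob:
            continue
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return corrupted, labels

if __name__ == "__main__":
    sentence = "the bank raised interest rates again this year".split()
    toy_vocab = sorted(set(sentence))  # assumption: tiny stand-in vocabulary
    corrupted, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(labels)
```

Because the model sees the uncorrupted words on both sides of each selected position, predicting the original token forces it to use left and right context together, which is the bidirectional behavior described above.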
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| BERT Large | Use when the workload needs 512 context and 340M parameters. | 2018-10 | 512 context, 340M parameters | Current |
| BERT Base | Use when the workload needs 512 context and 110M parameters. | 2018-10 | 512 context, 110M parameters | Current |
Release Timeline
1 release group
Specifications (2 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| BERT Large | 2018-10 | 512 | 340M |
| BERT Base | 2018-10 | 512 | 110M |
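The parameter counts in the table can be roughly reproduced from the architecture. The sketch below is a back-of-the-envelope estimate, not an official accounting: it assumes the layer counts and hidden sizes of the original release (12 layers and hidden size 768 for BERT Base, 24 layers and hidden size 1024 for BERT Large), a 30,522-entry WordPiece vocabulary, a feed-forward width of four times the hidden size, and it counts only the encoder plus pooler, not the pre-training heads. None of these values appear on this page, and the helper name `bert_encoder_params` is hypothetical.

```python
def bert_encoder_params(layers, hidden, vocab_size=30522,
                        max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT-style encoder (sketch)."""
    ffn = 4 * hidden             # assumed feed-forward width
    layer_norm = 2 * hidden      # scale and bias vectors

    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Position-wise feed-forward block: two linear layers plus LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Pooler: one dense layer applied to the first token's representation.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler

if __name__ == "__main__":
    for name, layers, hidden in [("BERT Base", 12, 768), ("BERT Large", 24, 1024)]:
        total = bert_encoder_params(layers, hidden)
        print(f"{name}: {total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes out close to 110M for BERT Base and somewhat below the quoted 340M for BERT Large, which suggests the published figures are rounded. The 512 in the Context column enters only through the position embeddings, so it has a small effect on the total.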
Frequently Asked Questions
- What is BERT used for?
- BERT is used for natural language understanding tasks such as question answering, text classification, and named entity recognition. The family description above points to those workloads as the best fit.
- How does BERT compare to Gemma 4?
- BERT is strongest on language understanding tasks such as text classification and question answering, while Gemma 4 by Google DeepMind is the closest related family to check for multimodal workloads. BERT has 2 listed variants and reaches up to 512 tokens of context, while Gemma 4 reaches up to 256k context, so compare the specs and pricing tables before choosing a production model.
- Which BERT model should I use?
- BERT does not have complete provider pricing in the local data, so if price is the main constraint, check the pricing table before committing to a variant. For the most capable listed choice, evaluate BERT Large (340M parameters, 512 context); BERT Base (110M parameters, 512 context) is the smaller option.






