LLM Reference

BGE M3

bge-m3

About

BGE-M3 is BAAI's flagship multilingual embedding model that simultaneously performs dense retrieval, sparse (lexical) retrieval, and multi-vector (ColBERT-style) retrieval. It covers 100+ languages with an 8,192-token context window — far longer than most embedding models — making it effective for both short queries and long documents. Built on an extended XLM-RoBERTa architecture, it achieves state-of-the-art results on the MKQA and MLDR multilingual retrieval benchmarks and is available via NVIDIA NIM.

BGE M3 has an 8K-token (8,192-token) context window.
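The three retrieval modes described above can be sketched in plain Python. The scoring functions below are illustrative toy implementations of the underlying ideas (cosine similarity over one dense vector, lexical weight overlap, and ColBERT-style MaxSim late interaction), not BGE-M3's actual code, and the hybrid weights are arbitrary placeholders:

```python
import math

def cosine(u, v):
    # Dense retrieval: one embedding per text, compared by cosine similarity.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sparse_score(q_weights, d_weights):
    # Sparse (lexical) retrieval: per-token weights, scored by summing
    # the products of weights for tokens that appear in both texts.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_maxsim(q_vecs, d_vecs):
    # Multi-vector (ColBERT-style) retrieval: each query token vector takes
    # its best match among the document token vectors; the maxima are summed.
    return sum(max(cosine(q, d) for d in d_vecs) for q in q_vecs)

def hybrid_score(dense, sparse, colbert, w=(1.0, 0.3, 1.0)):
    # A simple weighted fusion of the three signals; weights are arbitrary.
    return w[0] * dense + w[1] * sparse + w[2] * colbert
```

In practice the model produces all three representations in a single forward pass, and the fused score is what makes it usable for dense-only, lexical-only, or hybrid retrieval pipelines.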

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Specifications

Family: BGE
Released: 2024-01-27
Parameters: 568M
Context: 8K
Architecture: encoder
Specialization: embedding
License: MIT
Training: pretrained

Created by

BAAI (Beijing Academy of Artificial Intelligence)
Open-source AI fostering global collaboration

Beijing, China
Founded 2018