LLM Reference

Mamba 2 370M

About

Mamba 2 370M is a compact language model built on Structured State Space Models (SSMs) and designed for processing extremely long sequences. It has been reported to handle contexts of up to 256,000 tokens without performance degradation, a significant advance for RNN-based long-context modeling. Because its recurrence maintains a fixed-size state, inference cost scales linearly with sequence length, avoiding the quadratic cost of transformer attention. Despite its relatively compact 370 million parameters, it performs well across a range of NLP tasks, though it can suffer state collapse on sequences well beyond its training length and requires extensive training data. Its state space duality formulation further improves training and inference efficiency.
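To illustrate why inference is linear in sequence length, the core SSM idea can be sketched as a simple recurrence: a fixed-size hidden state is updated once per token, so processing T tokens costs O(T) regardless of context length. This is a minimal illustrative sketch, not the actual Mamba 2 architecture (which adds input-dependent parameters, gating, and a hardware-efficient scan); the matrices A, B, C below are toy values chosen for the example.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence (illustrative sketch only).

    h_t = A h_{t-1} + B x_t
    y_t = C h_t

    One fixed-size state update per token gives O(T) cost in
    sequence length, unlike attention's O(T^2).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # one update per token
        h = A @ h + B * x_t      # fixed-size state carries all past context
        ys.append(C @ h)         # readout from the current state
    return np.array(ys)

# Toy example: scalar inputs, 2-dimensional state
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([1.0, 0.5])
C = np.array([0.3, 0.7])
y = ssm_scan([1.0, 0.0, 0.0], A, B, C)  # impulse response over 3 steps
```

The fixed-size state is also the source of the state collapse limitation noted above: all context must be compressed into it, and on sequences far outside the training distribution that compression can break down.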

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Family: Mamba 2
Architecture: Decoder Only
Specialization: General