LLM Reference

Mamba 370M

About

Mamba 370M is a 370-million-parameter language model built on a state-space model (SSM) architecture. Unlike traditional transformers, it replaces attention and MLP blocks with a unified SSM block whose compute scales linearly with sequence length [6][9], which makes long sequences efficient to process and suits parallel GPU execution [6]. The model is used primarily for text generation and is also applied to Japanese language processing [10]; accounts of its training data vary, with some sources citing the Pile dataset [1]. A known limitation, "state collapse," in which performance degrades on sequences longer than those seen during training, has been addressed with mitigation techniques [7]; with appropriate training, Mamba models have been shown to handle sequences of up to 256K tokens accurately [7].
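The linear scaling described above can be illustrated with a toy linear state-space recurrence: the model carries a fixed-size hidden state and performs one update per token, so processing a length-T sequence costs O(T), versus attention's O(T^2) pairwise interactions. This is a minimal sketch only, not Mamba's actual selective-SSM kernel; the matrices A, B, C and the function name `ssm_scan` are illustrative placeholders.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear SSM: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    d_state = A.shape[0]
    h = np.zeros(d_state)          # fixed-size hidden state
    ys = []
    for x_t in x:                  # one pass over the sequence: O(T)
        h = A @ h + B * x_t        # state update
        ys.append(C @ h)           # readout
    return np.array(ys)

rng = np.random.default_rng(0)
T, d_state = 8, 4
A = 0.9 * np.eye(d_state)          # stable (decaying) dynamics
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
x = rng.standard_normal(T)
y = ssm_scan(x, A, B, C)
print(y.shape)                     # one output per input token
```

Because the state has a fixed size regardless of sequence length, memory stays constant as T grows; in the real Mamba architecture this recurrence is additionally input-dependent and computed with a hardware-aware parallel scan.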

Capabilities

Multimodal · Function Calling · Tool Use · JSON Mode

Providers (1)

Provider       | Input (per 1M) | Output (per 1M) | Type
Replicate API  |                |                 | Serverless

Specifications

Family: Mamba
Architecture: Decoder Only
Specialization: general