Mamba Models by State Spaces
About
The Mamba family of large language models (LLMs) is built on a state space model (SSM) architecture rather than the transformer 58. Mamba uses selective SSMs, whose parameters depend on the input, so at each step the model can decide what to keep from the sequence and what to filter out 58. This lets Mamba process long sequences efficiently, scaling linearly in sequence length during both training and inference 58. For hardware efficiency, it uses a hardware-aware parallel algorithm, similar in spirit to FlashAttention, to make efficient use of the GPU 58. On tasks such as language modeling, Mamba often matches or exceeds transformer models of the same size, which makes the architecture a promising alternative for applications that must handle long sequences 58.
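The selective recurrence described above can be sketched in plain Python. This is a minimal, single-channel illustration under stated assumptions, not the reference implementation: the diagonal `A`, the weights `w_delta`, `b_delta`, `W_B`, `W_C`, and every function name are hypothetical, and `B̄` uses the simplified discretization Δ·B. The sequential version makes the linear cost in sequence length visible. The parallel version computes the same outputs with an associative scan, one standard way to parallelize such a recurrence; it uses a Hillis-Steele scan for brevity, which does O(L log L) work rather than the O(L) of work-efficient scans.

```python
import math


def softplus(z):
    # Numerically stable log(1 + e^z); keeps the step size Δ positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _coefficients(xs, A, w_delta, b_delta, W_B, W_C):
    # Selection: Δ_t, B_t and C_t are all functions of the current input x_t.
    # Per step, return a_bar[n], (b_bar[n] * x_t) for each state dim n, and C_t.
    steps = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)
        a_bar = [math.exp(delta * a) for a in A]      # exp(Δ·A) with diagonal A
        bx = [delta * (w * x) * x for w in W_B]       # simplified B_bar = Δ·B_t, times x_t
        C = [w * x for w in W_C]
        steps.append((a_bar, bx, C))
    return steps


def selective_scan_sequential(xs, A, w_delta, b_delta, W_B, W_C):
    # Recurrent mode: h_t = a_bar_t * h_{t-1} + b_bar_t * x_t, y_t = C_t · h_t.
    # One pass over the sequence: O(L * N) work, linear in sequence length L.
    h = [0.0] * len(A)
    ys = []
    for a_bar, bx, C in _coefficients(xs, A, w_delta, b_delta, W_B, W_C):
        h = [a * hn + b for a, hn, b in zip(a_bar, h, bx)]
        ys.append(sum(c * hn for c, hn in zip(C, h)))
    return ys


def combine(left, right):
    # Compose the affine map h -> a1*h + b1 with the later map h -> a2*h + b2.
    # The operator is associative, which is what allows a parallel scan.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def inclusive_scan(pairs):
    # Hillis-Steele scan: about log2(L) rounds; updates within one round are
    # independent of each other, so each round could run in parallel.
    out = list(pairs)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else combine(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def selective_scan_parallel(xs, A, w_delta, b_delta, W_B, W_C):
    # Same outputs as the sequential version, via one scan per state dimension.
    steps = _coefficients(xs, A, w_delta, b_delta, W_B, W_C)
    hs_by_dim = []
    for n in range(len(A)):
        prefix = inclusive_scan([(a_bar[n], bx[n]) for a_bar, bx, _ in steps])
        # With h_{-1} = 0, h_t is the b-component of the prefix composition.
        hs_by_dim.append([b for _, b in prefix])
    return [sum(C[n] * hs_by_dim[n][t] for n in range(len(A)))
            for t, (_, _, C) in enumerate(steps)]


if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 0.25, 1.5]
    A = [-1.0, -0.5, -2.0]  # hypothetical; negative entries keep the state stable
    W_B, W_C = [0.3, -0.2, 0.1], [1.0, 0.5, -0.4]
    seq = selective_scan_sequential(xs, A, 0.8, 0.1, W_B, W_C)
    par = selective_scan_parallel(xs, A, 0.8, 0.1, W_B, W_C)
    assert all(math.isclose(s, p, rel_tol=1e-9, abs_tol=1e-12) for s, p in zip(seq, par))
```

Because every coefficient depends on the input, the model cannot precompute a fixed convolution kernel the way a time-invariant SSM can; the scan formulation is what keeps the recurrence parallelizable anyway.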
Current Variants
| Model | Use when | Released | Key specs | Status |
|---|---|---|---|---|
| Mamba 2.8B | Use when the workload needs 2k context and 2.8B parameters. | 2023-12 | 2k context, 2.8B parameters | Current |
| Mamba 1.4B | Use when the workload needs 2k context and 1.4B parameters. | 2023-12 | 2k context, 1.4B parameters | Current |
| Mamba 790M | Use when the workload needs 2k context and 790M parameters. | 2023-12 | 2k context, 790M parameters | Current |
| Mamba 370M | Use when the workload needs 2k context and 370M parameters. | 2023-12 | 2k context, 370M parameters | Current |
| Mamba 130M | Use when the workload needs 2k context and 130M parameters. | 2023-12 | 2k context, 130M parameters | Current |
Release Timeline
1 release group (2023-12)
Specifications (5 models)
| Model | Released | Context | Parameters |
|---|---|---|---|
| Mamba 2.8B | 2023-12 | 2k | 2.8B |
| Mamba 1.4B | 2023-12 | 2k | 1.4B |
| Mamba 790M | 2023-12 | 2k | 790M |
| Mamba 370M | 2023-12 | 2k | 370M |
| Mamba 130M | 2023-12 | 2k | 130M |
Available From (1 provider)
Frequently Asked Questions
- What is Mamba used for?
- Mamba models are language models built on a selective state space model (SSM) architecture, suited to language modeling and to applications that must process long sequences efficiently 58.
- How does Mamba compare to Mamba 2?
- Mamba 2 by State Spaces is the closest related family. Both list up to 2k context, and Mamba has 5 listed variants, so compare the specifications and provider pricing of the two families before choosing a production model.
- Which Mamba model should I use?
- Provider pricing for Mamba is incomplete in this listing, so if cost is the main constraint, confirm current pricing with the provider first. For the most capable option, evaluate Mamba 2.8B, the largest variant (2k context).
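One way to act on this guidance is to pick the largest listed variant whose weights fit a memory budget. The sketch below assumes weights dominate memory at a hypothetical 2 bytes per parameter (fp16/bf16) and ignores activations, recurrent state, and framework overhead. `MAMBA_VARIANTS`, `weight_memory_gb`, and `pick_variant` are illustrative names, not part of any library, and the parameter counts are the nominal values from the specifications table.

```python
# Nominal parameter counts from the specifications table (all 2k context, 2023-12).
MAMBA_VARIANTS = {
    "Mamba 2.8B": 2.8e9,
    "Mamba 1.4B": 1.4e9,
    "Mamba 790M": 790e6,
    "Mamba 370M": 370e6,
    "Mamba 130M": 130e6,
}


def weight_memory_gb(params, bytes_per_param=2):
    # Weights only; assumes 2 bytes/parameter by default (fp16 or bf16).
    return params * bytes_per_param / 1e9


def pick_variant(memory_budget_gb, bytes_per_param=2):
    # Largest listed variant whose weights fit the budget, or None if none fit.
    fitting = [(params, name) for name, params in MAMBA_VARIANTS.items()
               if weight_memory_gb(params, bytes_per_param) <= memory_budget_gb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (0.1, 1.0, 4.0, 8.0):
        print(budget, pick_variant(budget))
```

For example, at 2 bytes per parameter Mamba 2.8B needs about 5.6 GB for weights alone, so a 4 GB budget selects Mamba 1.4B (about 2.8 GB). Real deployments need headroom beyond this estimate.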

