Mamba 1.4B
About
Mamba 1.4B is a language model built on a state-space model (SSM) architecture rather than the attention mechanism of Transformers. Its selective SSM layer filters token information with input-dependent parameters, and its compute scales linearly with sequence length, allowing it to handle sequences of up to a million tokens. Forgoing attention also speeds up generation: inference throughput is roughly 5x that of similarly sized Transformers. The implementation is optimized for NVIDIA GPUs. Trained on the Pile dataset, the model performs on par with Transformers of comparable size, though it may fall short of larger, fine-tuned models on some downstream tasks. Reported training details vary between sources, reflecting a need for clearer reporting standards.
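The linear scaling comes from the SSM recurrence: each token updates a fixed-size hidden state once, instead of attending to all previous tokens. The sketch below illustrates that idea with a diagonal selective scan in NumPy; the shapes, variable names, and discretization (`exp(delta * A)`) follow the general selective-SSM formulation, but this is a simplified illustration, not Mamba's actual fused GPU kernel.

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Linear-time scan over a diagonal selective SSM (illustrative sketch).

    x:     (T, D) input sequence of T tokens with D channels
    A:     (D, N) continuous-time state transition (kept negative for stability)
    B, C:  (T, N) input-dependent projections -- the "selective" part
    delta: (T, D) input-dependent step sizes used for discretization
    Returns y: (T, D)
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))            # fixed-size state: memory does not grow with T
    y = np.empty((T, D))
    for t in range(T):              # one state update per token => O(T) overall
        # Discretize A and B with this token's step size delta[t]
        dA = np.exp(delta[t][:, None] * A)        # (D, N)
        dB = delta[t][:, None] * B[t][None, :]    # (D, N)
        h = dA * h + dB * x[t][:, None]           # recurrent state update
        y[t] = (h * C[t][None, :]).sum(axis=1)    # readout through C
    return y
```

Because `B`, `C`, and `delta` depend on the current token, the model can choose per token how strongly to write to and read from its state, which is what "optimally filters token information" refers to; a non-selective SSM would use the same `B`, `C`, and `delta` at every step.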
Capabilities
Providers (1)
| Provider | Input (per 1M) | Output (per 1M) | Type |
|---|---|---|---|
| Replicate API | — | — | Serverless |