LLM Reference

Mamba 1.4B

About

Mamba 1.4B is a language model built on a state-space model (SSM) architecture designed for efficient processing of long sequences. Unlike Transformers, whose attention cost grows quadratically with context length, Mamba scales linearly with sequence length and has been demonstrated on sequences of up to a million elements. Its selective SSM layer makes the recurrence input-dependent, letting the model filter which token information to retain, and forgoing the attention mechanism entirely yields roughly 5x the inference throughput of similarly sized Transformers. The reference implementation is optimized for NVIDIA GPUs. The model performs on par with Transformers of comparable size, though it may fall short of larger, fine-tuned models on some downstream tasks. It was trained on the Pile dataset; its exact training details are reported inconsistently across sources.
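
The core of this design is the selective scan: a per-timestep recurrence h_t = Ā_t h_{t-1} + B̄_t x_t, y_t = C_t h_t, whose parameters are functions of the input. The NumPy sketch below is a simplified sequential reference, not Mamba's fused CUDA kernel; the shapes and projection names (`W_B`, `W_C`, `w_delta`) are illustrative, and it condenses the paper's parameterization of B, C, and the step size Δ into the simplest input-dependent form.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_ssm(x, A, W_B, W_C, w_delta):
    """Sequential reference scan for a selective SSM (illustrative only).

    x       : (L, d) input sequence
    A       : (d, n) state-decay parameters (kept negative for stability)
    W_B     : (d, n) projection producing the input-dependent B_t
    W_C     : (d, n) projection producing the input-dependent C_t
    w_delta : (d,)   projection producing the per-channel step size
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                     # one n-dimensional state per channel
    y = np.empty_like(x)
    for t in range(L):
        xt = x[t]                            # (d,)
        delta = softplus(xt * w_delta)       # input-dependent step size, (d,)
        B = xt @ W_B                         # input-dependent B_t, (n,)
        C = xt @ W_C                         # input-dependent C_t, (n,)
        A_bar = np.exp(delta[:, None] * A)   # discretize A (zero-order hold)
        h = A_bar * h + (delta[:, None] * B[None, :]) * xt[:, None]  # state update
        y[t] = h @ C                         # per-channel readout, (d,)
    return y

# Toy usage: cost per step is O(d * n), so the whole scan is linear in L.
rng = np.random.default_rng(0)
L, d, n = 16, 8, 4
y = selective_ssm(rng.normal(size=(L, d)),
                  A=-np.abs(rng.normal(size=(d, n))),  # negative A keeps the state stable
                  W_B=rng.normal(size=(d, n)),
                  W_C=rng.normal(size=(d, n)),
                  w_delta=rng.normal(size=(d,)))
assert y.shape == (L, d)
```

Because B_t, C_t, and delta depend on x_t, the model can choose per token how strongly to write into and read from its state, which is what distinguishes the selective SSM from earlier time-invariant SSMs.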

Capabilities

Multimodal · Function Calling · Tool Use · JSON Mode

Providers (1)

Provider        Input (per 1M)   Output (per 1M)   Type
Replicate API   —                —                 Serverless
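
Since Replicate is the listed provider, a call might look like the sketch below using Replicate's official Python client (`pip install replicate`). The model slug and the input field names are assumptions to be checked against Replicate's catalog, not confirmed values.

```python
import replicate  # official client; requires REPLICATE_API_TOKEN in the environment

# Hypothetical slug and input names: verify both against Replicate's catalog.
output = replicate.run(
    "owner/mamba-1.4b",
    input={
        "prompt": "State-space models process sequences by",
        "max_tokens": 128,
    },
)

# Text models on Replicate typically stream output as an iterable of chunks.
print("".join(output))
```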

Specifications

Family: Mamba
Architecture: Decoder-only
Specialization: General