LLM Reference

Mamba 2 780M

About

Mamba-2 780M is a language model that builds on its predecessor with a Structured State Space Model (SSM) architecture. This design enables efficient handling of long sequences while remaining competitive with traditional transformers. It uses the Structured State Space Duality (SSD) framework to combine the benefits of SSMs and attention mechanisms, giving linear-time scaling in sequence length and a constant-size recurrent state during inference. Mamba-2 780M is well suited to tasks that require processing extensive inputs, in both language and multimodal settings. Trained on 300 billion tokens from large datasets such as the Pile and SlimPajama, it shows strong zero-shot performance. It is optimized for speed and hardware efficiency, though it may offer less of an advantage on short contexts and is more complex to implement than a standard transformer.
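
The linear scaling and constant inference memory described above come from the SSM recurrence itself: the model carries a fixed-size state that is updated once per token. The sketch below is a toy diagonal linear SSM scan in Python/NumPy, not the actual Mamba-2 SSD kernel; the function and dimension names are illustrative. It only shows why per-token cost and state size do not grow with sequence length.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy diagonal linear state space recurrence over a sequence.

    x: (seq_len, d_in)   input sequence
    A: (d_state,)        diagonal state transition (decay) coefficients
    B: (d_state, d_in)   input projection
    C: (d_in, d_state)   output projection
    """
    seq_len, d_in = x.shape
    d_state = A.shape[0]
    h = np.zeros(d_state)            # fixed-size hidden state, independent of seq_len
    y = np.zeros((seq_len, d_in))
    for t in range(seq_len):
        h = A * h + B @ x[t]         # O(d_state * d_in) work per token -> linear in seq_len
        y[t] = C @ h
    return y

# Example: the recurrent state stays the same size no matter how long the input is.
rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 16))
A = rng.uniform(0.9, 0.999, size=64)   # stable decay factors
B = rng.normal(size=(64, 16)) * 0.1
C = rng.normal(size=(16, 64)) * 0.1
y = ssm_scan(x, A, B, C)
print(y.shape)  # (1024, 16)
```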

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Family: Mamba 2
Architecture: Decoder Only
Specialization: General
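
For reference, a minimal generation sketch is given below. It assumes the checkpoint published as state-spaces/mamba2-780m on the Hugging Face Hub, the mamba_ssm package for the model class, and the GPT-NeoX tokenizer used for the Mamba models; the exact generate() arguments follow the mamba_ssm example scripts and may differ between package versions, so treat this as a sketch rather than the canonical loading path.

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumed repo id; Mamba checkpoints reuse the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba2-780m", device="cuda", dtype=torch.float16
)

prompt = "Structured state space models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Sampling arguments mirror the mamba_ssm generation benchmark; adjust as needed.
out = model.generate(input_ids=input_ids, max_length=64, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```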