Jamba v0.1
About
Jamba v0.1, developed by AI21 Labs, is a large language model built on a hybrid architecture that interleaves Transformer and Mamba layers. Its mixture-of-experts (MoE) layers give it 52 billion parameters in total while activating only 12 billion per token, keeping inference cost closer to that of a much smaller dense model. The model supports a context length of 256K tokens, and the hybrid design yields higher throughput on long sequences than pure Transformer models of comparable size, since the Mamba layers avoid attention's growing key-value cache. Jamba v0.1 is a base model: it has no instruction tuning or safety moderation, so it can generate inappropriate outputs and requires fine-tuning for specific tasks. Even so, its ability to fit a 140K-token context on a single 80GB GPU makes it notable for long-input workloads.
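As a quick illustration, here is a minimal sketch of running the base model for plain text completion. It assumes the weights are available through the Hugging Face `transformers` library under the identifier `ai21labs/Jamba-v0.1` (an assumed Hub ID, not stated in this text) and that the installed `transformers` version supports the Jamba architecture; since the model is not instruction-tuned, it is prompted as a completion model rather than with a chat template.

```python
# Minimal sketch: loading Jamba v0.1 for text completion.
# Assumptions: the Hub identifier "ai21labs/Jamba-v0.1" and Jamba support
# in the installed transformers version; `accelerate` is needed for
# device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed Hugging Face Hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision helps fit long contexts on one 80GB GPU
    device_map="auto",           # spread layers across available devices
)

# Base model, no instruction tuning: prompt it as a plain completion model.
prompt = "The hybrid Transformer-Mamba architecture is useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because only 12B of the 52B parameters are active per token, the compute per generated token resembles that of a 12B dense model, though all 52B parameters must still fit in memory.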