About
The ALLaM family, developed by the Saudi Data and Artificial Intelligence Authority (SDAIA), comprises large language models (LLMs) tailored for Arabic Language Technologies (ALT). Designed to be proficient in both Arabic and English, the models use an autoregressive decoder-only architecture and are pretrained on a mix of Arabic and English text. A central focus of their development is language alignment and cross-lingual knowledge transfer, targeting state-of-the-art performance on Arabic benchmarks. SDAIA has released several models in this family, including 7B, 13B, and 70B parameter variants; some are trained from scratch, while others continue pretraining from existing models such as Llama-2. The models are available through IBM's watsonx platform under a royalty-free license that permits both commercial and governmental use. Extensive data collection and curation efforts have produced one of the largest Arabic datasets in the world.
