
SaulLM
About
The SaulLM family is a collection of large language models built specifically for the legal domain. The foundational model, SaulLM-7B, comprises 7 billion parameters and was initially trained on an English legal corpus of over 30 billion tokens. Continued pretraining and instruction fine-tuning of this base produced SaulLM-7B-Instruct, which is optimized for instruction-following tasks in the legal sector.

Following the original model's success, the family expanded with two larger models, SaulLM-54B and SaulLM-141B, comprising 54 billion and 141 billion parameters, respectively. Both are built on the Mixtral architecture and focus on refined domain adaptation, including continued pretraining on a legal dataset exceeding 540 billion tokens, specialized instruction protocols, and alignment with human preferences for legal interpretation.

All models in the SaulLM family are released under a permissive MIT license, supporting open collaboration and fostering innovation within the legal AI community.