SaulLM
About
The SaulLM family is a collection of large language models built specifically for the legal domain. The foundational SaulLM-7B model has 7 billion parameters and was initially trained on an English legal corpus of over 30 billion tokens; it was then refined through continued pretraining and instruction fine-tuning to produce SaulLM-7B-Instruct, which is optimized for instruction-following tasks in the legal sector. Following the original model's success, the family expanded with the larger SaulLM-54B and SaulLM-141B models, at 54 billion and 141 billion parameters, respectively. These models use the Mixtral architecture and emphasize domain adaptation, including continued pretraining on a legal dataset exceeding 540 billion tokens. Combined with legal instruction fine-tuning and alignment with human legal interpretation preferences, all models in the SaulLM family are released under a permissive MIT license, fostering open collaboration within the legal AI community.
Specifications (6 models)
| Model | Released | Parameters |
|---|---|---|
| Saul 141B | 2024-07 | 141B |
| Saul 54B | 2024-07 | 54B |
| Saul 141B Instruct | 2024-07 | 141B |
| Saul 54B Instruct | 2024-07 | 54B |
| Saul 7B | 2024-02 | 7B |
| Saul 7B Instruct | 2024-02 | 7B |
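Since the SaulLM models derive from the Mistral/Mixtral lineage, the Instruct variants typically expect prompts in the Mistral-style `[INST] ... [/INST]` format. The sketch below shows that formatting, with a hypothetical Hugging Face usage in comments; the `Equall/Saul-7B-Instruct-v1` model ID is an assumption, so check the official SaulLM release for the exact repository name.

```python
def build_instruct_prompt(user_message: str) -> str:
    """Wrap a user message in the Mistral-style [INST] format commonly
    expected by Mistral/Mixtral-derived instruct models such as
    SaulLM-7B-Instruct (assumption based on the model's lineage)."""
    return f"<s>[INST] {user_message.strip()} [/INST]"

# Hypothetical usage with Hugging Face transformers (the model ID below
# is an assumption; verify against the official SaulLM release):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("Equall/Saul-7B-Instruct-v1")
# model = AutoModelForCausalLM.from_pretrained("Equall/Saul-7B-Instruct-v1")
# inputs = tokenizer(
#     build_instruct_prompt("Summarize the doctrine of consideration."),
#     return_tensors="pt",
# )
# output_ids = model.generate(**inputs, max_new_tokens=256)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

prompt = build_instruct_prompt("What is the parol evidence rule?")
print(prompt)
```

In practice, prefer the tokenizer's own `apply_chat_template` when the model repository ships a chat template, since it guarantees the exact special tokens the model was fine-tuned with.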
Frequently Asked Questions
- What is SaulLM?
- SaulLM is a family of large language models purpose-built for the legal domain, ranging from the 7-billion-parameter SaulLM-7B (trained on over 30 billion tokens of English legal text) to the Mixtral-based SaulLM-54B and SaulLM-141B, all released under a permissive MIT license.
- How many models are in the SaulLM family?
- The SaulLM family contains 6 models.
- What is the latest SaulLM model?
- The most recent models are Saul 54B and Saul 141B, along with their Instruct variants, all released in 2024-07.
