BigScience

Pioneering open-source AI collaboration

Collaboration

About

BigScience is a collaborative research initiative that has significantly advanced open work on large language models (LLMs) and generative AI. Unlike many projects shaped by corporate secrecy, BigScience is built on a commitment to open science, inclusivity, and ethical review. The initiative grew out of discussions facilitated by Hugging Face, GENCI, and IDRIS, with the aim of producing open-source LLMs and datasets that serve academic and practical use while also supporting study of the broader societal impacts of AI. This approach contrasts with the closed-source practices common in the tech industry.

At its core, BigScience aims to democratize access to powerful AI tools by fostering a more inclusive research environment. Its multilingual LLM and dataset are both available to researchers worldwide free of charge. The project brought together over 1,000 researchers from 60 countries and more than 250 institutions, an interdisciplinary collaboration on a scale rare in the industry. Participants' expertise spanned fields such as computer science and the social sciences, so the ethical and social implications of LLMs could be examined alongside the engineering work.

BigScience organized its work into specialized working groups, each responsible for a different facet of LLM development, from sourcing data to designing the model architecture. A key focus was data diversity: the project actively included underrepresented languages and texts to counter existing biases. The result is ROOTS, a large multilingual dataset used to train BLOOM, a 176-billion-parameter LLM that can generate text in 46 natural languages and 13 programming languages. The release of BLOOM was a significant milestone for the open-source AI community, giving researchers and developers worldwide access to a powerful open LLM.

Ethical considerations are central to BigScience's mission. The project established an ethical charter that sets out values such as inclusivity, diversity, and responsibility, and it developed a Responsible AI License (RAIL) to govern how its models may be used. By building ethical practices in from the outset, BigScience sets itself apart from initiatives that treat responsibility as an afterthought. The initiative has also contributed to the AI research community through publications that share its methodologies and findings, offering practical lessons on conducting responsible and inclusive AI research.

Model Families