LLM Reference
RedPajama

About

The RedPajama family of large language models (LLMs) is an open-source initiative focused on developing high-performing, transparent models, led by Together AI in collaboration with leading figures in the open-source AI community. The models are trained on the extensive RedPajama dataset, which comprises over 100 trillion raw tokens and a refined subset of 30 trillion tokens spanning multiple languages and domains. They are available in several sizes and configurations: base models, instruction-tuned versions for improved few-shot learning, and chat models tailored for interactive dialogue. For example, RedPajama-INCITE-Instruct-3B-v1 is optimized for few-shot applications using GPT-JT data, deliberately excluding tasks that overlap with HELM core scenarios. Beyond raw performance, the initiative prioritizes the transparency and accessibility of its data and training methodologies.
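Instruction-tuned variants such as RedPajama-INCITE-Instruct-3B-v1 are typically queried with few-shot prompts: a handful of demonstrations followed by the new query. The sketch below shows one minimal way such a prompt might be assembled in Python. The `Q:`/`A:` layout and the `build_few_shot_prompt` helper are illustrative assumptions made here, not a documented RedPajama prompt template.

```python
# Minimal sketch of few-shot prompt assembly for an instruction-tuned model.
# NOTE: the Q:/A: format below is an assumption for illustration, not a
# documented RedPajama prompt convention.

def build_few_shot_prompt(examples, query):
    """Concatenate (question, answer) demonstrations, then append the query."""
    parts = []
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    # Leave the final answer slot empty for the model to complete.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

demos = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]
prompt = build_few_shot_prompt(demos, "What color is the sky?")
print(prompt)
```

The resulting string would then be passed to the model for completion, for instance via an inference library such as Hugging Face `transformers` (the RedPajama-INCITE models are published under the `togethercomputer` organization on the Hugging Face Hub).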

Models (2)

Details

Researcher: Together.ai
Models: 2