LLM Reference

About

The Tulu v2.5 suite, created by the Allen Institute for AI, is a collection of large language models (LLMs) trained with preference-based methods such as Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). These models are tuned to improve performance on text generation, instruction following, and reasoning tasks. By training on preference datasets, Tulu v2.5 models learn to favor responses that align closely with human expectations. The suite includes several variants trained on diverse datasets such as UltraFeedback and HH-RLHF, allowing task-specific optimization for truthfulness, safety, coding, and reasoning. Available on Hugging Face, these models highlight the strengths of both offline (DPO) and online (PPO) reinforcement learning approaches.
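As a rough illustration of the DPO objective mentioned above, the sketch below computes the per-pair loss from summed response log-probabilities under the policy and a frozen reference model. This is a minimal, hypothetical sketch (the function name and scalar inputs are illustrative, not Tulu's actual training code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy being trained and a frozen reference model.
    beta controls how far the policy may drift from the reference.
    (Illustrative sketch, not the actual Tulu v2.5 training code.)
    """
    # Margin: how much more the policy prefers the chosen response
    # than the reference model does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin; shrinks as the policy's
    # preference for the chosen response grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a positive margin (policy prefers the chosen response more than the reference does) the loss drops below log 2; with a zero margin it equals log 2, so minimizing it pushes the policy toward the human-preferred response.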