LLM Reference

Snorkel Mistral PairRM

About

Snorkel Mistral PairRM-DPO is a chat-optimized large language model built on the Mistral-7B-Instruct-v0.2 architecture. It is aligned with human preferences using Direct Preference Optimization (DPO) combined with a Pairwise Reward Model (PairRM), and was trained exclusively on the UltraFeedback dataset without distillation from other LLMs. The model excels at conversational text generation, ranking third on the AlpacaEval 2.0 leaderboard with a score of 30.22; post-processing with PairRM best-of-16 reranking raises that score to 34.86. Its limitations include the absence of built-in moderation features, a possible bias toward longer responses induced by the evaluation benchmark, and the difficulty of fully understanding its internal mechanics.
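The PairRM best-of-16 step mentioned above can be sketched as a simple reranking loop: sample several candidate responses and keep the one the pairwise reward model prefers. This is a minimal illustration, not Snorkel's published pipeline; `generate` and `prefers` are hypothetical stand-ins for the model's sampler and the PairRM comparison.

```python
def best_of_n(prompt, generate, prefers, n=16):
    """Sample n candidate responses and keep the one the pairwise
    reward model prefers in a sequential tournament.

    generate(prompt) -> str           # hypothetical sampler
    prefers(prompt, a, b) -> bool     # hypothetical PairRM: True if a beats b
    """
    candidates = [generate(prompt) for _ in range(n)]
    best = candidates[0]
    for cand in candidates[1:]:
        # Keep the challenger only if the reward model prefers it
        if prefers(prompt, cand, best):
            best = cand
    return best
```

With n=16 this reproduces the best-of-16 setting: 16 samples, 15 pairwise comparisons, one winner returned.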

Capabilities

Multimodal · Function Calling · Tool Use · JSON Mode

Providers (2)

Provider                Input (per 1M)   Output (per 1M)   Type
Together AI API         $0.20            $0.20             Serverless
Fireworks AI Platform   —                —                 Provisioned
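At the per-1M-token rates listed for the serverless provider, the cost of a request is simple arithmetic. A minimal sketch, assuming the $0.20 input/output prices shown above:

```python
def request_cost(input_tokens, output_tokens,
                 input_price=0.20, output_price=0.20):
    """Cost in USD for one request, given per-1M-token prices
    (defaults match the Together AI serverless rates listed above)."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)
```

For example, a request with 500K input tokens and 500K output tokens at these rates costs $0.20 in total.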

Specifications

Family: Snorkel
Architecture: Decoder-only
Specialization: General