LLM Reference

Snorkel Mistral PairRM

About

Snorkel Mistral PairRM-DPO is a chat-optimized large language model built on the Mistral-7B-Instruct-v0.2 architecture. Designed to interpret and respond to user inputs efficiently, it is aligned with human preferences through Direct Preference Optimization (DPO), using the Pairwise Reward Model (PairRM) to rank candidate responses. Trained exclusively on the UltraFeedback dataset, without input from other LLMs, it excels at generating text in conversational contexts, ranking third on the AlpacaEval 2.0 leaderboard with a score of 30.22; post-processing with PairRM best-of-16 sampling raises that score to 34.86. The model does have limitations: it ships without moderation features, may be biased toward longer responses because of the evaluation benchmark, and its internal mechanics remain difficult to interpret.
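As a rough illustration of the best-of-16 post-processing step, the sketch below samples 16 candidate responses and keeps the one PairRM ranks highest. It assumes the model is published on Hugging Face as snorkelai/Snorkel-Mistral-PairRM-DPO and that the PairRM ranker is exposed through the llm-blender package; the repository ids and exact API signatures are assumptions, not details confirmed by this page.

```python
# Hedged sketch: best-of-16 sampling followed by PairRM reranking.
# Repo ids and the llm-blender API are assumptions; adjust to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import llm_blender  # assumed package exposing the PairRM ranker

model_id = "snorkelai/Snorkel-Mistral-PairRM-DPO"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain the difference between supervised fine-tuning and DPO."
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample 16 candidate completions from the chat model.
outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=512,
    num_return_sequences=16,
)
candidates = [
    tokenizer.decode(seq[input_ids.shape[-1]:], skip_special_tokens=True)
    for seq in outputs
]

# Rank the candidates with PairRM and keep the top one (best-of-16).
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # assumed ranker checkpoint name
ranks = blender.rank([prompt], [candidates])  # ranks[0][i]: rank of candidate i, 1 = best
best_response = candidates[int(ranks[0].argmin())]
print(best_response)
```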

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · Structured Outputs · Code Execution

Providers (2)

Provider        Input (per 1M)    Output (per 1M)    Type
Together AI     $0.20             $0.20              Serverless
Fireworks AI    $0.20             $0.20              Provisioned
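At these per-1M-token rates, the cost of a request is simply each token count divided by one million times the listed price. The snippet below illustrates the arithmetic with made-up token counts; the default rates match the table above.

```python
# Simple cost arithmetic for the per-1M-token prices listed above.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.20,
                 output_price_per_m: float = 0.20) -> float:
    """Return the dollar cost of one request at the given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a 1,500-token prompt with a 500-token response costs $0.0004.
print(f"${request_cost(1_500, 500):.4f}")
```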

Specifications

Family: Snorkel
Released: 2023-11-15
Architecture: Decoder-only
Specialization: General
Training: Fine-tuning
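Since the fine-tuning described above is Direct Preference Optimization, a minimal PyTorch sketch of the DPO objective follows. It takes per-sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; it is illustrative only and not taken from Snorkel's training code.

```python
# Minimal sketch of the DPO loss (illustrative, not Snorkel's actual training code).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Example with dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
print(loss)
```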

Created by

Snorkel AI
Programmatic data labeling accelerates AI
Redwood City, California, United States
Founded 2019