LLM Reference
Fireworks AI

Snorkel Mistral PairRM on Fireworks AI

Snorkel · Snorkel AI

Provisioned

Pricing

Type             Price (per 1M tokens)
Input tokens     $0.20
Output tokens    $0.20
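Since input and output tokens are billed at the same flat rate, estimating the cost of a request is simple arithmetic. A minimal sketch (the function name and defaults are illustrative, not part of the Fireworks API):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_per_million: float = 0.20) -> float:
    """Estimate USD cost for one request at a flat per-token rate."""
    # Both input and output tokens are billed at $0.20 per 1M here.
    return (input_tokens + output_tokens) / 1_000_000 * price_per_million

# e.g. a chat turn with 1,500 prompt tokens and 500 completion tokens
cost = estimate_cost(1500, 500)  # 2,000 tokens total
```

At these prices, 2,000 tokens cost $0.0004, so a million such requests would run about $400.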

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · JSON Mode · Code Execution

About Snorkel Mistral PairRM

Snorkel Mistral PairRM-DPO is a chat-optimized large language model built on the Mistral-7B-Instruct-v0.2 architecture. To align it with human preferences, it is trained with Direct Preference Optimization (DPO) using the Pairwise Reward Model (PairRM). Trained exclusively on the UltraFeedback dataset, without distillation from other LLMs, it excels at generating text in conversational contexts, ranking third on the AlpacaEval 2.0 leaderboard with a score of 30.22; post-processing with PairRM best-of-16 raises that score to 34.86. Its limitations include the absence of built-in moderation, a possible bias toward longer responses induced by the evaluation benchmark, and limited interpretability of its internal mechanics.
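The PairRM best-of-16 step mentioned above samples multiple candidate responses and keeps the one a pairwise reward model prefers. A minimal sketch of that selection loop, using a toy preference function in place of the real PairRM model (which compares two responses and judges which is better):

```python
from typing import Callable, List

def best_of_n(candidates: List[str],
              prefers: Callable[[str, str], bool]) -> str:
    """Return the candidate that survives pairwise comparison.

    `prefers(a, b)` should return True when the reward model
    judges response `a` better than response `b`.
    """
    best = candidates[0]
    for cand in candidates[1:]:
        # Keep the current winner unless the challenger is preferred.
        if prefers(cand, best):
            best = cand
    return best

# Toy stand-in for PairRM: prefer the longer response (illustrative only).
toy_prefers = lambda a, b: len(a) > len(b)
responses = ["ok", "a fuller answer", "the most detailed answer of all"]
best = best_of_n(responses, toy_prefers)
```

In the real pipeline, `prefers` would be a forward pass of the PairRM model over the prompt and both candidate responses, and `candidates` would be 16 samples drawn from the DPO-tuned model.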


Model Specs

Released: 2023-11-15
Architecture: Decoder Only