Qwen2.5 Math RM 72B
About
Reward model variant of Qwen2.5-Math. It scores candidate responses to math problems so the scores can serve as the reward signal in RLHF pipelines.
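One common way to use a reward model's scores is best-of-N selection: generate several candidate answers, score each, and keep the highest-scoring one. The sketch below illustrates only that selection step. The names `best_of_n`, `score_fn`, and `toy_score` are hypothetical and not part of any Qwen API, and the toy scorer is a stand-in for a real call to the reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    question: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> tuple[str, float]:
    """Return the candidate with the highest reward score, plus that score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Score every (question, candidate) pair with the supplied scorer.
    scored = [(score_fn(question, c), c) for c in candidates]
    # max() keeps the first candidate on ties.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-in scorer, for illustration only. A real pipeline
# would query the reward model here instead of checking the last character.
def toy_score(question: str, answer: str) -> float:
    return 1.0 if answer.strip().endswith("4") else 0.0


best, score = best_of_n(
    "What is 2 + 2?",
    ["The answer is 5", "The answer is 4"],
    toy_score,
)
print(best, score)
```

Passing the scorer in as a function keeps the selection logic independent of how the reward model is actually served.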
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
Specifications
Family: Qwen2.5 Math
Released: 2024-09-19
Parameters: 72B
Context: 128K
Architecture: Decoder Only
Specialization: reward
Training: finetuning
Fine-tuning: base