LLM Reference

Qwen2.5 Math RM 72B

About

Reward model (RM) variant of Qwen2.5-Math, intended to score model outputs in RLHF pipelines.
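As a rough illustration of how a reward model is typically used, the sketch below performs best-of-N selection: each candidate solution gets a scalar score and the highest-scoring one is kept. The names `best_of_n` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a real reward-model call; this is not the model's actual API.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=score_fn)


def toy_score(solution: str) -> float:
    # Hypothetical stand-in for a reward model: a real RM would return
    # a learned scalar score for the (problem, solution) pair.
    return 1.0 if solution.strip().endswith("= 4") else 0.0


candidates = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
best = best_of_n(candidates, toy_score)
print(best)
```

In a real pipeline, `score_fn` would wrap a forward pass of the reward model; the selection logic stays the same.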

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Specifications

Released: 2024-09-19
Parameters: 72B
Context: 128K
Architecture: Decoder Only
Specialization: reward
Training: finetuning
Fine-tuning: base

Created by

AI research institute of Alibaba Group.

Hangzhou, Zhejiang, China
Founded 2017