JudgeLM
Status: active
Metric: Judge Agreement Rate (higher is better)
Introduced: 2023
About
Fine-tuned language models trained to be scalable judges for open-ended LLM evaluation, achieving high agreement with human preferences.
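This entry does not state how Judge Agreement Rate is computed. A minimal sketch, assuming it is the fraction of comparisons in which the judge model's verdict matches the human preference label, might look like the following. The function name `agreement_rate` and the verdict labels ("A", "B", "tie") are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of items where the judge verdict equals the human label.

    Assumption: one verdict per comparison, using the same label
    vocabulary on both sides (here "A", "B", or "tie").
    """
    if len(judge) != len(human):
        raise ValueError("judge and human verdict lists must have equal length")
    if not judge:
        raise ValueError("at least one comparison is required")
    # Count exact matches between judge verdicts and human labels.
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Illustrative data only, not results from any real evaluation.
judge_verdicts = ["A", "B", "tie", "A"]
human_verdicts = ["A", "B", "A", "A"]
rate = agreement_rate(judge_verdicts, human_verdicts)
print(f"Judge agreement rate: {rate:.2f}")
```

Under this assumed definition a higher value means the judge sides with human annotators more often, which matches the "higher is better" direction given for the metric.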