LLM Reference
MT-Bench · active · Arena

MT-Bench

Metric: MT-Bench Score (1-10, higher is better)
Introduced: 2023

About

MT-Bench consists of 80 multi-turn conversation questions across 8 categories, with model responses evaluated by GPT-4 as the judge. Each turn is scored from 1 to 10.
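
As a rough illustration of how per-turn judge scores could be rolled up into a single number, here is a minimal Python sketch. It assumes the overall score is the unweighted mean of all per-turn scores; the function name `mt_bench_score` and the sample scores are hypothetical and are not taken from any official tooling.

```python
def mt_bench_score(turn_scores):
    """Average per-turn judge scores into one overall score.

    turn_scores: iterable of per-question sequences, each holding the
    judge's 1-10 score for every turn of that question.
    Assumption: the overall score is the plain mean over all turns.
    """
    # Flatten the per-question sequences into one list of turn scores.
    flat = [score for question in turn_scores for score in question]
    if not flat:
        raise ValueError("no scores to aggregate")
    # Reject anything outside the 1-10 range stated for the metric.
    for score in flat:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
    return sum(flat) / len(flat)


# Hypothetical judge scores for three questions, two turns each.
example = [(8, 7), (9, 6), (10, 8)]
print(mt_bench_score(example))
```

Grouping scores per question keeps the turn structure visible, so the same input could also be used to report first-turn and later-turn averages separately.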