active
Holistic
HELM (Holistic Evaluation of Language Models)
Metric: Multiple metrics
Introduced: 2022
About
Stanford framework evaluating LLMs across 30+ scenarios spanning 7 dimensions: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. NOTE: slug contains parentheses — recommend renaming to 'helm'.