LLM Reference
active · Holistic

HELM (Holistic Evaluation of Language Models)

Metric: Multiple metrics
Introduced: 2022

About

A Stanford framework that evaluates LLMs across 30+ scenarios along 7 dimensions: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
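
To make the scenario-by-dimension idea concrete, here is a minimal Python sketch of a results grid. It does not use HELM's own code or API: the function `summarize`, the scenario names, and every score are hypothetical, and the plain per-dimension average is only an illustrative assumption, not necessarily how HELM aggregates results.

```python
from statistics import mean

# The seven dimensions listed in the description above.
METRICS = [
    "accuracy", "calibration", "robustness", "fairness",
    "bias", "toxicity", "efficiency",
]


def summarize(results):
    """Average each dimension over the scenarios that report it.

    `results` maps a scenario name to a {dimension: score} dict.
    A scenario may omit dimensions that do not apply to it.
    """
    summary = {}
    for metric in METRICS:
        scores = [row[metric] for row in results.values() if metric in row]
        if scores:  # skip dimensions that no scenario reported
            summary[metric] = mean(scores)
    return summary


# Made-up scores for two hypothetical scenarios (illustration only).
results = {
    "question_answering": {"accuracy": 0.5, "robustness": 0.25},
    "summarization": {"accuracy": 1.0, "toxicity": 0.0},
}

summary = summarize(results)
print(summary)
```

Keeping one score per dimension, rather than collapsing everything into a single number, is the point of a holistic evaluation: a model can do well on accuracy and still do poorly on robustness or toxicity. Note that score direction differs by dimension (lower is better for toxicity, for example), so the values are not directly comparable across dimensions.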