LLM Reference
HellaSwag | active | Reasoning

HellaSwag

Metric: Accuracy (higher is better)
Introduced: 2019

About

Commonsense sentence-completion benchmark using adversarially filtered wrong answers. Top LLMs now exceed 95% accuracy.
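
As a rough illustration of the accuracy metric, the sketch below scores a multiple-choice completion task by picking the highest-scoring candidate ending for each item and comparing it with the gold label. The function names `predict` and `accuracy`, the four-endings-per-item layout, and the score values are all assumptions made for this example; they are not taken from any official evaluation harness, and real evaluations differ in how they compute and normalize per-ending scores.

```python
from typing import Sequence


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of items whose top-scoring ending matches the gold label."""
    if len(all_scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one item")
    correct = sum(predict(s) == y for s, y in zip(all_scores, labels))
    return correct / len(labels)


# Hypothetical per-ending scores (e.g. log-likelihoods) for three items,
# each with four candidate endings. The numbers are made up.
example_scores = [
    [-4.1, -2.3, -5.0, -3.9],
    [-1.2, -3.3, -2.8, -4.0],
    [-3.5, -3.1, -2.9, -0.7],
]
example_labels = [1, 2, 3]

print(accuracy(example_scores, example_labels))
```

In this made-up example the top-scoring ending matches the label for the first and third items but not the second, so two of the three items count as correct.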