HellaSwag
active · Reasoning
Metric: Accuracy (higher is better)
Introduced: 2019
About
A commonsense sentence-completion benchmark whose incorrect endings are selected by adversarial filtering. Top LLMs now exceed 95% accuracy.
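As a rough illustration of the accuracy metric, the sketch below assumes that a model assigns one score (for example, a log-likelihood) to each candidate ending of an item, and that the highest-scoring ending is taken as the prediction. The function names `pick_ending` and `accuracy` and the toy scores are hypothetical, made up for illustration; they are not part of any official evaluation code.

```python
from typing import Sequence


def pick_ending(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of items whose top-scoring ending matches the gold label."""
    if len(all_scores) != len(gold):
        raise ValueError("need exactly one gold label per item")
    if not gold:
        raise ValueError("need at least one item")
    correct = sum(pick_ending(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Toy, made-up scores for three items with four candidate endings each.
toy_scores = [
    [-4.1, -2.3, -5.0, -3.7],  # model prefers ending 1
    [-1.2, -3.3, -2.8, -4.0],  # model prefers ending 0
    [-3.0, -2.9, -1.5, -2.2],  # model prefers ending 2
]
toy_gold = [1, 0, 3]  # hypothetical gold labels

toy_accuracy = accuracy(toy_scores, toy_gold)
print(f"accuracy = {toy_accuracy:.3f}")
```

In this toy run the first two predictions match the gold labels and the third does not, so two of the three items count as correct. Real evaluation harnesses may differ in how they score endings, for instance by length-normalizing the log-likelihoods.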