LLM Reference
BBH · active · Composite

BIG-Bench Hard

Metric: Accuracy (higher is better)
Introduced: 2022

About

A suite of 23 challenging tasks selected from BIG-Bench, on which earlier language models failed to outperform the average human rater when prompted without chain-of-thought. Tasks include logical deduction, causal judgment, and formal fallacies.
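
Since the metric is accuracy over a set of tasks, a score can be sketched as per-task exact-match accuracy followed by an unweighted mean across tasks. This is a minimal illustration only: the function names, the case-insensitive matching rule, and the sample records are assumptions made for the example, not the official scoring code or real results.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task: accuracy} for (task, prediction, target) string triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, prediction, target in records:
        total[task] += 1
        # Assumed matching rule: exact match after trimming whitespace
        # and ignoring case. Real harnesses may normalize differently.
        if prediction.strip().lower() == target.strip().lower():
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


def macro_average(per_task):
    """Unweighted mean of per-task accuracies (each task counts equally)."""
    return sum(per_task.values()) / len(per_task)


# Made-up records for illustration only; not real model outputs.
records = [
    ("logical_deduction", "(A)", "(A)"),
    ("logical_deduction", "(B)", "(C)"),
    ("causal_judgment", "Yes", "Yes"),
    ("causal_judgment", "no", "No"),
    ("formal_fallacies", "valid", "invalid"),
]

per_task = accuracy_by_task(records)
overall = macro_average(per_task)
print(per_task)
print(overall)
```

Averaging per task rather than per example keeps a task with many examples from dominating the score. Whether a given leaderboard uses this or a per-example average is worth checking against its own documentation.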