HumanEval · active · Coding
HumanEval
Metric: Pass@1 (higher is better)
Introduced: 2021
About
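A set of 164 hand-written Python programming problems that measures the functional correctness of generated code: a generated solution counts as correct only if it passes the problem's unit tests. Results are reported with the pass@k metric, the probability that at least one of k sampled solutions is correct. Released by OpenAI in 2021; HumanEval+ extends it with a more rigorous test suite.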
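As a rough illustration of how pass@k is typically computed, the sketch below uses the commonly cited unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass the tests. The function names `pass_at_k` and `mean_pass_at_k` and the `(n, c)` input layout are assumptions made for this example, not part of any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    Names and signature are illustrative assumptions.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n, the fraction of samples that pass, so Pass@1 with a single sample per problem is simply the share of problems solved on the first attempt.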