LLM Reference
HumanEval · active · Coding

HumanEval

Metric: Pass@1 (higher is better)
Introduced: 2021

About

A benchmark of 164 Python coding problems that measures the functional correctness of generated code via the pass@k metric: each solution is run against unit tests, and pass@k estimates the probability that at least one of k sampled completions passes. Released by OpenAI in 2021; HumanEval+ extends it with substantially more test cases for a more rigorous evaluation.
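Pass@k is typically computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): given n samples per problem of which c pass all tests, pass@k = 1 − C(n−c, k)/C(n, k). A minimal sketch in Python, using the numerically stable product form of that ratio:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: samples that passed all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), expanded as a product to avoid huge binomials.
    result = 1.0
    for i in range(n - c + 1, n + 1):
        result *= 1.0 - k / i
    return 1.0 - result

# e.g. 1 correct out of 2 samples: pass@1 = 0.5
print(pass_at_k(n=2, c=1, k=1))
```

The per-problem scores are then averaged over all 164 problems; Pass@1 is this estimator with k=1, which reduces to the fraction of correct samples c/n.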