LLM Reference
1,140 complex Python programming tasks
Coding

BigCodeBench

About

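A challenging code generation benchmark of 1,140 Python tasks spanning diverse real-world domains and libraries. Tasks require calling functions from multiple libraries and combining them with complex logic. The benchmark has two variants: 'Complete', in which the model completes a function from its signature and docstring, and 'Instruct', in which the model writes the code from a natural-language instruction. The reported score is Pass@1, the percentage of tasks for which the model's first generated solution passes all of the task's test cases.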
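Pass@1 is commonly computed with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), averaged over tasks, where n samples are drawn per task and c of them pass every test. For k = 1 this reduces to c/n, and with a single greedy sample per task it is simply the fraction of tasks solved. The sketch below assumes each task's samples have already been executed against its tests and reduced to (n, c) counts; the helper names `pass_at_k` and `benchmark_pass_at_1` are hypothetical, and this is not BigCodeBench's own evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (hypothetical helper).

    n: number of samples generated for the task
    c: number of those samples that passed every test case
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all tasks, as a percentage (hypothetical helper).

    results: one (n, c) pair per task.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Greedy decoding: one sample per task, so each task scores 0 or 1.
    print(benchmark_pass_at_1([(1, 1), (1, 0), (1, 1), (1, 1)]))  # 75.0
```

Averaging per-task estimates, rather than pooling all samples, keeps every task equally weighted regardless of how many samples were drawn for it.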