Coding · Active
SWE-bench Pro
Metric: % Resolved (higher is better)
Introduced: 2025
About
A multilingual benchmark of 731 tasks drawn from real-world GitHub issues. It extends SWE-bench Verified with harder, more diverse tasks across Python, JavaScript, TypeScript, Java, Go, C++, and Rust.
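The headline metric, % Resolved, is the share of tasks for which a submitted patch resolves the issue. The sketch below shows that arithmetic. It is not the benchmark's official scoring harness, and it makes two assumptions. First, a task counts as resolved when its evaluation reports success. Second, the hypothetical `task_ids` and `resolved_ids` inputs describe the full task set and the tasks that passed. Any task without a passing result counts as unresolved, so the denominator stays the full task set.

```python
def percent_resolved(task_ids: set[str], resolved_ids: set[str]) -> float:
    """Percentage of tasks resolved (a sketch, not the official scorer).

    Assumptions: `task_ids` is the full set of benchmark task IDs (hypothetical
    format), and `resolved_ids` holds the IDs whose patch passed evaluation.
    Tasks with no passing result count as unresolved.
    """
    if not task_ids:
        raise ValueError("task set is empty")
    unknown = resolved_ids - task_ids
    if unknown:
        # Guard against scoring IDs that are not part of the benchmark.
        raise ValueError(f"resolved IDs not in task set: {sorted(unknown)}")
    return 100.0 * len(resolved_ids) / len(task_ids)


if __name__ == "__main__":
    tasks = {"task-a", "task-b", "task-c", "task-d"}
    print(percent_resolved(tasks, {"task-a", "task-c", "task-d"}))  # 75.0
```

Because missing results count as failures, a run that skips tasks cannot raise its score by shrinking the denominator.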