LLM Reference
Active · Coding

SWE-bench Pro

Metric: % Resolved (higher is better)
Introduced: 2025

About

SWE-bench Pro is a benchmark of 731 real-world GitHub issues. It extends SWE-bench Verified with harder, more diverse tasks that span multiple languages: Python, JavaScript, TypeScript, Java, Go, C++, and Rust.
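The headline metric is % Resolved. The Python sketch below shows how such a score could be computed. It rests on two assumptions. First, it uses a SWE-bench-style criterion: a task counts as resolved only if the tests that previously failed now pass and the tests that previously passed still pass. Second, tasks with no result count as unresolved, so the denominator is the full task count, such as 731. The names `TaskResult`, `is_resolved`, and `percent_resolved` are hypothetical and are not the benchmark's official harness API.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative only.
    task_id: str
    fail_to_pass_ok: bool  # previously failing tests now pass
    pass_to_pass_ok: bool  # previously passing tests still pass


def is_resolved(r: TaskResult) -> bool:
    # Assumed SWE-bench-style criterion: both test groups must succeed.
    return r.fail_to_pass_ok and r.pass_to_pass_ok


def percent_resolved(results: list[TaskResult], total_tasks: int) -> float:
    # Tasks absent from `results` (e.g. no patch produced) count as
    # unresolved, so divide by the full task count, not len(results).
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / total_tasks


if __name__ == "__main__":
    demo = [
        TaskResult("task-a", True, True),   # resolved
        TaskResult("task-b", True, False),  # fix broke an existing test
        TaskResult("task-c", False, True),  # issue not fixed
    ]
    # One resolved task out of four attempted slots.
    print(percent_resolved(demo, total_tasks=4))
```

In this sketch, a patch that fixes the issue but breaks an existing test scores the same as no fix at all. Under the assumed criterion, this is why % Resolved is stricter than a count of "issue tests passing."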