Coding
SWE-bench Verified
About
A benchmark of 500 human-verified tasks drawn from real-world GitHub issues. It measures end-to-end software engineering ability, reported as the percentage of issues an AI agent resolves correctly.
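As a rough illustration of how the reported score is computed, the sketch below turns per-task outcomes into a percentage. The function name `resolve_rate`, the task IDs, and the boolean outcome format are assumptions made for this example and are not part of the benchmark's own tooling. How a task is judged "resolved" is outside the scope of the sketch.

```python
def resolve_rate(results: dict[str, bool]) -> float:
    """Return the percentage of tasks marked as resolved.

    `results` maps a task ID to True (resolved) or False (not resolved).
    This input format is a hypothetical choice for illustration only.
    """
    if not results:
        raise ValueError("no results to score")
    resolved = sum(1 for ok in results.values() if ok)
    return 100.0 * resolved / len(results)


# Made-up outcomes for four tasks; these are not real benchmark results.
example = {"task-a": True, "task-b": False, "task-c": True, "task-d": False}
print(f"{resolve_rate(example):.1f}%")  # 2 of 4 resolved -> 50.0%
```

On the full benchmark the denominator would be the 500 tasks, so each resolved issue contributes 0.2 percentage points to the score.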