SWE-Marathon
20 multi-hour SWE tasks spanning library reproductions, full-stack product clones, and ML engineering. 1,300 logged trials; frontier configs stay below 19% task resolution.
SWE-Marathon · swe-marathon.org