Fix OpenHands SDK Modal startup and full-eval reliability #37

Draft

AlienKevin wants to merge 1 commit into cooperbench:main from AlienKevin:kevin/eval

AlienKevin commented Feb 27, 2026

Summary

preserve the OpenHands -oh image entrypoint in the SDK adapter so agent-server starts correctly
avoid repo import shadowing in OpenHands sandboxes by using / as the workdir
make the Modal eval backend use an explicit keepalive startup (entrypoint([]) + sleep infinity) so sb.exec stays stable
update vendored OpenHands SDK/tools code for Python 3.11 compatibility
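
The import-shadowing problem named in the summary can be reproduced with the standard library alone. The sketch below is illustrative and is not the adapter's code: `colorsys` stands in for any installed package that a checked-out repo might also define, and the helper names (`probe_import`, `demo`) and the `IS_REPO_COPY` marker are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# One-liner run in a child interpreter: reports whether the imported module
# is the fake copy written into the "repo" directory.
PROBE = "import colorsys; print(getattr(colorsys, 'IS_REPO_COPY', False))"


def probe_import(cwd: str) -> str:
    """Run PROBE with `python -c` from `cwd` and return its printed result."""
    # PYTHONSAFEPATH would stop Python from adding the cwd to sys.path,
    # which would hide the effect being demonstrated.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> tuple[str, str]:
    """Compare the import seen from a repo-like directory and from the root."""
    with tempfile.TemporaryDirectory() as repo:
        # A file in the repo that shares its name with an installed module.
        with open(os.path.join(repo, "colorsys.py"), "w") as fh:
            fh.write("IS_REPO_COPY = True\n")
        from_repo = probe_import(repo)
        from_root = probe_import(os.path.abspath(os.sep))
    return from_repo, from_root
```

With `python -c`, the current directory is placed first on `sys.path`, so the repo's copy wins when the process starts there. Starting from `/` avoids that, which is presumably the reasoning behind the workdir change.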

Why

fixes repeated Modal Sandbox ... not found failures during run/eval
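
For context, the keepalive startup described in the summary might look roughly like the following for the eval backend. This is a minimal sketch under stated assumptions, not the code in `src/cooperbench/eval/backends/modal.py`: the function names, the app name `eval-sketch`, and the timeout value are all hypothetical, and `modal` is imported lazily so the helpers can be read and tested without it installed.

```python
# Long-running no-op used as the sandbox's main process, so the container
# stays up for as long as commands need to be exec'd into it.
KEEPALIVE_CMD = ("sleep", "infinity")


def sandbox_options(timeout_s: int = 3600) -> dict:
    """Keyword arguments for Sandbox.create (the timeout is an assumed value)."""
    return {"timeout": timeout_s}


def start_keepalive_sandbox(image_tag: str, app_name: str = "eval-sketch"):
    """Create a sandbox whose only job is to stay alive for later exec calls."""
    import modal  # lazy import: requires the Modal client and credentials

    app = modal.App.lookup(app_name, create_if_missing=True)
    # Clear the image's own entrypoint so it cannot wrap or replace
    # the keepalive command.
    image = modal.Image.from_registry(image_tag).entrypoint([])
    return modal.Sandbox.create(
        *KEEPALIVE_CMD, app=app, image=image, **sandbox_options()
    )


def run_in_sandbox(sb, command: str) -> tuple[int, str]:
    """Run a shell command in a live sandbox; return (exit_code, stdout)."""
    proc = sb.exec("bash", "-c", command)
    output = proc.stdout.read()
    return proc.wait(), output
```

The idea is that an image entrypoint that exits early (or never hands control to the requested command) can leave nothing running in the sandbox, so later `sb.exec` calls have no container to attach to. Note that this applies only to the eval backend; the SDK adapter keeps the image entrypoint so agent-server can start.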
enables end-to-end reproduction of gemini-3-flash + openhands_sdk on full CooperBench

Scope

src/cooperbench/agents/openhands_agent_sdk/adapter.py
src/cooperbench/eval/backends/modal.py
vendored OpenHands SDK/tools compatibility fixes under src/cooperbench/agents/openhands_agent_sdk/

Validation

Reproduced Gemini 3 Flash + OpenHands performance on CooperBench:

{
  "pass_rate": 0.2846153846153846,
  "total_runs": 652,
  "passed": 185,
  "failed": 465,
  "errors": 2,
  "skipped": 0
}
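
As a sanity check on these numbers, the reported `pass_rate` is consistent with `passed / (passed + failed)`, that is, with errored runs left out of the denominator. Whether the eval harness actually computes it this way is an assumption; the function name below is hypothetical.

```python
# Values copied from the validation summary above.
summary = {
    "pass_rate": 0.2846153846153846,
    "total_runs": 652,
    "passed": 185,
    "failed": 465,
    "errors": 2,
    "skipped": 0,
}


def pass_rate_excluding_errors(s: dict) -> float:
    """Pass rate over runs that finished with a pass/fail verdict."""
    completed = s["passed"] + s["failed"]
    return s["passed"] / completed if completed else 0.0


def counts_add_up(s: dict) -> bool:
    """Check that the per-outcome counts sum to the total number of runs."""
    return s["passed"] + s["failed"] + s["errors"] + s["skipped"] == s["total_runs"]
```

Here 185 of 650 completed runs passed, and the four outcome counts sum to the 652 total runs.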


Fix OpenHands Modal startup and eval sandbox reliability

4330d40
