SWE-Bench Multimodal #338

@juanmichelini

Description

When we run eval_infer in swebenchmultimodal and no instance was created (the conversion yields 0 entries), it crashes.
It shouldn't crash. Instead, it should run as usual and leave the next course of action to the harness.

Log Tail:
[01/17/26 21:07:36] INFO     Model name: litellm_proxy/minimax/minimax-m2    (eval_infer.py:270)
[01/17/26 21:07:36] INFO     Converting eval_outputs/princeton-nlp__SWE-bench_Multimodal-dev/litellm_proxy/minimax/minimax-m2_sdk_0f4bbd5_maxiter_500_N_litellm_proxy-minimax-minimax-m2/output.jsonl to SWE-Bench format: eval_outputs/princeton-nlp__SWE-bench_Multimodal-dev/litellm_proxy/minimax/minimax-m2_sdk_0f4bbd5_maxiter_500_N_litellm_proxy-minimax-minimax-m2/output.swebench.jsonl    (eval_infer.py:50)
[01/17/26 21:07:36] INFO     Conversion complete: 0 entries converted, 0 errors    (eval_infer.py:104)
[01/17/26 21:07:36] ERROR    Script failed: No valid entries were converted    (eval_infer.py:288)
/workspace/benchmarks/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
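
For illustration, here is a minimal sketch of the requested behavior. It is not the actual eval_infer.py code: the function name `convert_to_swebench_format` and the record fields (`instance_id`, `test_result.git_patch`, `model_patch`) are assumptions made for the example. The point is that a conversion with 0 entries logs a warning and still writes an (empty) output file, so the harness can decide what to do next.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_to_swebench_format(input_file: str, output_file: str) -> int:
    """Hypothetical converter; returns the number of entries written.

    Field names below are assumptions, not the real eval_infer.py schema.
    """
    converted = 0
    errors = 0
    in_path = Path(input_file)
    # A missing input file is treated the same as an empty one.
    lines = in_path.read_text().splitlines() if in_path.exists() else []

    with open(output_file, "w") as out:
        for line in lines:
            if not line.strip():
                continue
            try:
                record = json.loads(line)
                entry = {
                    "instance_id": record["instance_id"],
                    "model_patch": record["test_result"]["git_patch"],
                }
            except (json.JSONDecodeError, KeyError, TypeError):
                errors += 1
                continue
            out.write(json.dumps(entry) + "\n")
            converted += 1

    logger.info(
        "Conversion complete: %d entries converted, %d errors", converted, errors
    )
    if converted == 0:
        # Previously this case aborted the script; here we only warn and
        # let the caller (the harness) choose the next step.
        logger.warning("No valid entries were converted; continuing anyway")
    return converted
```

With this shape, the caller can proceed to the evaluation step regardless of the returned count, and the harness sees an empty predictions file instead of a failed script.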
