
Conversation

@neubig neubig commented Jan 22, 2026

Summary

This PR fixes a critical bug in vision detection that was causing images to be stripped from multimodal evaluations when using proxy model names.

Problem

For proxy model names like litellm_proxy/openai/gpt-4o or litellm_proxy/anthropic/claude-opus-4-5-20251101:

litellm.supports_vision("litellm_proxy/openai/gpt-4o")  # → False ❌
litellm.supports_vision("openai/gpt-4o")                # → True ✅
litellm.supports_vision("gpt-4o")                        # → True ✅

The previous code only tried the full path and the last segment (the model name alone), but missed the provider/model format, which is what litellm recognizes for many models.

This was causing vision_is_active() to return False for vision-capable models accessed through evaluation proxies, resulting in images being stripped from messages in multimodal benchmarks like SWE-bench Multimodal.

Solution

Updated _supports_vision() to try multiple model name variants (a minimal sketch follows the list):

  1. Full model name: litellm_proxy/anthropic/claude-opus-4-5-20251101
  2. Provider/model format: anthropic/claude-opus-4-5-20251101 (this was the missing variant)
  3. Just model name: claude-opus-4-5-20251101
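
A minimal sketch of this fallback order, assuming litellm.supports_vision() as the underlying check (illustrative only; the actual SDK helper may be structured differently):

import litellm

def _supports_vision(model: str) -> bool:
    # Sketch of the three variants described above; not the exact SDK code.
    parts = model.split("/")
    candidates = [model]                         # 1. full name, e.g. litellm_proxy/openai/gpt-4o
    if len(parts) >= 3:
        candidates.append("/".join(parts[-2:]))  # 2. provider/model, e.g. openai/gpt-4o
    if len(parts) >= 2:
        candidates.append(parts[-1])             # 3. bare model name, e.g. gpt-4o
    for candidate in candidates:
        try:
            if litellm.supports_vision(candidate):
                return True
        except Exception:
            # Unknown model strings may raise; fall through to the next variant.
            continue
    return False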

Testing

  • All 18 existing vision tests pass
  • Added 3 new test cases for litellm_proxy/openai/* and litellm_proxy/gemini/* formats

Impact

This fix should improve SWE-bench Multimodal scores by ensuring images are actually sent to the model when using proxy configurations in evaluations.

Related

Found during investigation of low SWE-bench Multimodal evaluation scores.



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image
java     amd64, arm64   eclipse-temurin:17-jdk
python   amd64, arm64   nikolaik/python-nodejs:python3.12-nodejs22
golang   amd64, arm64   golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:602b6d6-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-602b6d6-python \
  ghcr.io/openhands/agent-server:602b6d6-python

All tags pushed for this build

ghcr.io/openhands/agent-server:602b6d6-golang-amd64
ghcr.io/openhands/agent-server:602b6d6-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:602b6d6-golang-arm64
ghcr.io/openhands/agent-server:602b6d6-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:602b6d6-java-amd64
ghcr.io/openhands/agent-server:602b6d6-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:602b6d6-java-arm64
ghcr.io/openhands/agent-server:602b6d6-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:602b6d6-python-amd64
ghcr.io/openhands/agent-server:602b6d6-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:602b6d6-python-arm64
ghcr.io/openhands/agent-server:602b6d6-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:602b6d6-golang
ghcr.io/openhands/agent-server:602b6d6-java
ghcr.io/openhands/agent-server:602b6d6-python

About Multi-Architecture Support

  • Each variant tag (e.g., 602b6d6-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 602b6d6-python-amd64) are also available if needed

When using models through a proxy (e.g., litellm_proxy/openai/gpt-4o),
the vision detection was failing because litellm.supports_vision() returns
False for the full proxy path but True for the provider/model format.

This fix tries multiple model name variants:
1. Full model name (litellm_proxy/openai/gpt-4o)
2. Provider/model format (openai/gpt-4o)
3. Just the model name (gpt-4o)

This ensures vision support is correctly detected for models accessed
through evaluation proxies like litellm_proxy.

Added test cases for litellm_proxy/openai/* and litellm_proxy/gemini/*
formats that are commonly used in evaluations.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig marked this pull request as ready for review January 22, 2026 17:43

github-actions bot commented Jan 22, 2026

Coverage

Coverage Report

File                                      Stmts  Miss  Cover  Missing
openhands-sdk/openhands/sdk/llm/llm.py      427    65    84%  350, 371–372, 408, 572, 673, 701, 775–780, 900, 903–906, 952, 1075, 1108–1109, 1118, 1131, 1133–1138, 1140–1157, 1160–1164, 1166–1167, 1173–1182
TOTAL                                     16339  4789    70%

@all-hands-bot (Collaborator) left a comment

Overall Assessment: This PR correctly fixes the vision detection bug for proxy model names. The implementation is clean, the logic is sound, and test coverage is appropriate. I have one minor suggestion for comment clarity below.

@juanmichelini juanmichelini self-requested a review January 22, 2026 18:02
@juanmichelini (Collaborator) commented

This does not seem to work; I tested it on https://storage.googleapis.com/openhands-evaluation-results/eval-21259253300-gemini-3-p_litellm_proxy-gemini-gemini-3-pro-preview_26-01-22-18-22.tar.gz

Result

  • vision_enabled: false
  • Images present: Yes (1 image in messages)
  • Model: litellm_proxy/gemini/gemini-3-pro-preview (supports vision)
  • Benchmark: swebenchmultimodal
  • SDK commit: e4e6a1e
  • Test size: Only 1 instance (eval_limit: 1)

@enyst (Collaborator) left a comment

Thank you for looking into this; I was sure there was scope to improve!

I do have a little question though: is it possible to use canonical_name from the LLM class? I believe we added it relatively recently to account for the fact that litellm proxies might be configured with different model names.

Or is that a bad idea? If so, maybe we could clean it up if it turns out to be useless.

# remove when litellm is updated to fix https://github.com/BerriAI/litellm/issues/5608 # noqa: E501
# Check both the full model name and the name after proxy prefix for vision support # noqa: E501
# Check multiple formats for vision support to handle proxy prefixes like 'litellm_proxy/provider/model' # noqa: E501
model_for_caps = self._model_name_for_capabilities()
Collaborator

I believe _model_name_for_capabilities() is intended to figure out the "real" model, so that vision ability can be detected correctly.

Sorry, I might not fully understand the problem here; I just wonder, is there a reason why we can't use it?

This adds logging to format_messages_for_llm and format_messages_for_responses
to explicitly show when vision_enabled=True and images are being included.

This helps diagnose the perceived vision_enabled=false issue in CI logs,
which actually shows the Message default value before formatting, not the
actual state at LLM call time.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig (Contributor, Author) commented Jan 23, 2026

Investigation Results

After thoroughly analyzing the codebase, I believe I have found the root cause of the perceived vision_enabled=false issue in Datadog logs.

Key Findings

  1. Vision detection IS working correctly for proxy model names like litellm_proxy/gemini/gemini-3-pro-preview. Local tests confirm that:

    • llm.vision_is_active() returns True
    • Formatted messages include images when sent to the LLM ✓
    • The PR's prefix-stripping logic works as expected ✓
  2. The vision_enabled=false in logs is expected behavior:

    • The Message class has vision_enabled: bool = False as the default
    • When messages are created in benchmarks, they have vision_enabled=False
    • Messages are stored in events and persisted with this default value
    • The format_messages_for_llm() method sets vision_enabled=True on a DEEP COPY of messages right before the LLM API call (see the sketch after this list)
    • The original messages in events retain vision_enabled=False
    • When these events are logged to Datadog, they show the pre-formatted state
  3. Images ARE being sent to the LLM. The formatted messages (what's actually sent to the LLM API) correctly include the images when vision is supported.
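
A minimal sketch of finding 2, with simplified, assumed class shapes (not the SDK code; Message is reduced to the one relevant field):

import copy
from pydantic import BaseModel

class Message(BaseModel):
    vision_enabled: bool = False   # default that gets persisted with events
    # ... content fields omitted

def format_messages_for_llm(messages: list[Message], vision_active: bool) -> list[Message]:
    # The copies sent to the LLM get the flag set; the originals stored in
    # events keep the default value.
    formatted = copy.deepcopy(messages)
    for message in formatted:
        message.vision_enabled = vision_active
    return formatted

stored = [Message()]                        # what gets persisted and logged
sent = format_messages_for_llm(stored, vision_active=True)
assert stored[0].vision_enabled is False    # logs show the pre-format default
assert sent[0].vision_enabled is True       # the actual LLM call sees vision enabled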

Fix Applied

I've added logging to format_messages_for_llm() and format_messages_for_responses() that explicitly shows:

  • vision_enabled=True/False at format time
  • The number of images being included
  • The model name

This will appear in logs like:

Formatting messages with vision_enabled=True, including 1 image(s) for model litellm_proxy/gemini/gemini-3-pro-preview

This should help confirm in CI logs that vision IS being used, even if the persisted event shows the default vision_enabled=False.
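
A hypothetical sketch of such a log call (the exact wording, placement, and variable names in the SDK may differ):

import logging

logger = logging.getLogger(__name__)

def log_vision_formatting(vision_active: bool, image_count: int, model: str) -> None:
    # Illustrative helper mirroring the log line quoted above; names here are assumptions.
    logger.info(
        "Formatting messages with vision_enabled=%s, including %d image(s) for model %s",
        vision_active, image_count, model,
    )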

To Further Verify

When running the next CI eval with this commit, look for the new log messages that explicitly confirm vision usage at LLM call time.

@enyst
Copy link
Collaborator

enyst commented Jan 23, 2026

@OpenHands Read all the comments; install gh for yourself if you don’t have it, so that it’s easier.

I’m curious about this part of the last comment:

Messages are stored in events and persisted with this default value
The format_messages_for_llm() method sets vision_enabled=True on a DEEP COPY of messages right before the LLM API call
The original messages in events retain vision_enabled=False

Messages are stored in events? Investigate the code and show me the code that proves the scenario described here.

@openhands-ai
Copy link

openhands-ai bot commented Jan 23, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Jan 23, 2026

Summary of work (no code changes made):

  • Installed GitHub CLI (gh) in the environment via apt so it’s available for repo/PR inspection.
  • Investigated the SDK code to validate (or refute) the claim that “messages are stored in events and persisted with vision_enabled=False, while format_messages_for_llm sets vision_enabled=True only on a deep copy right before the LLM call.”
  • Located and cited the exact code paths proving this behavior (sketched after this list):
    • MessageEvent persists an llm_message: Message in the event stream (openhands-sdk/openhands/sdk/event/llm_convertible/message.py).
    • Message.vision_enabled defaults to False (openhands-sdk/openhands/sdk/llm/message.py).
    • LLM.format_messages_for_llm() performs copy.deepcopy(messages) and then sets message.vision_enabled = vision_active on the copied messages (openhands-sdk/openhands/sdk/llm/llm.py).
    • MessageEvent.to_llm_message() returns a deep copy of the stored llm_message, preventing mutation of the persisted event payload (openhands-sdk/openhands/sdk/event/llm_convertible/message.py).
    • Event→message conversion uses LLMConvertibleEvent.events_to_messages(...) which calls event.to_llm_message() (openhands-sdk/openhands/sdk/event/base.py).
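
A rough sketch of the cited MessageEvent path, with simplified, assumed shapes (see the files listed above for the real definitions):

import copy
from pydantic import BaseModel

class Message(BaseModel):
    vision_enabled: bool = False

class MessageEvent(BaseModel):
    llm_message: Message              # persisted as part of the event stream

    def to_llm_message(self) -> Message:
        # Returning a deep copy means later mutation (e.g. enabling vision at
        # format time) never writes back into the persisted event payload.
        return copy.deepcopy(self.llm_message)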

Checklist against your request:

  • Read/investigated relevant code paths in the repo
  • Installed gh when it wasn’t present
  • Provided the specific code that proves the described scenario
  • Did not modify repository code or comment on the PR

Since there were no repository changes, there is no git diff to review and nothing to push.

View full conversation

@enyst
Copy link
Collaborator

enyst commented Jan 23, 2026

MessageEvent persists an llm_message: Message in the event stream (openhands-sdk/openhands/sdk/event/llm_convertible/message.py).

Ahh, now I understand the origin of the problem. I totally agree that’s wrong; there’s no point in persisting information that becomes false when it is actually computed. I think the other PR is fixing it?

