@neubig neubig commented Jan 22, 2026

Summary

This PR refactors the vision_enabled field on Message objects by deprecating it as a stored field and passing it as a parameter to serialization methods instead.

Problem

The vision_enabled field was being stored on each Message object, but this created several issues:

  1. Semantic mismatch: Whether vision is enabled is a property of the LLM configuration, not of individual messages
  2. Storage bloat: The field was being serialized to storage unnecessarily
  3. Confusion: The field's purpose was unclear, and its value was set once at message creation time from whatever _supports_vision() returned at that moment

Solution

  • to_chat_dict() now accepts an optional vision_enabled: bool | None = None parameter
  • _list_serializer() now requires vision_enabled: bool as a keyword-only parameter
  • format_messages_for_llm() now passes vision_enabled=self._supports_vision() to the serializer
  • Added deprecation warning for the field (deprecated in 1.10.0, to be removed in 1.11.0)
  • Maintained backward compatibility for messages that already have the field stored
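
The shape of the change listed above can be illustrated with a minimal sketch. The classes below (ToyMessage, ToyLLM, TextContent, ImageContent) are hypothetical stand-ins, not the SDK's actual classes; only the method names to_chat_dict(), _list_serializer(), format_messages_for_llm() and _supports_vision() come from this PR, and their real signatures and bodies may differ. In this sketch a vision_enabled of None is simply treated as disabled, whereas the PR falls back to the stored field.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    image_urls: list[str]


@dataclass
class ToyMessage:
    """Hypothetical stand-in for Message: it stores content only, no vision flag."""

    role: str
    content: list[TextContent | ImageContent] = field(default_factory=list)

    def _list_serializer(self, *, vision_enabled: bool) -> dict:
        # Keyword-only and required: every caller must state the capability.
        parts: list[dict] = []
        for item in self.content:
            if isinstance(item, TextContent):
                parts.append({"type": "text", "text": item.text})
            elif vision_enabled:
                # Images are emitted only when the caller says vision is available.
                for url in item.image_urls:
                    parts.append({"type": "image_url", "image_url": {"url": url}})
        return {"role": self.role, "content": parts}

    def to_chat_dict(self, vision_enabled: bool | None = None) -> dict:
        # Assumption for this sketch: None means "disabled".
        return self._list_serializer(vision_enabled=bool(vision_enabled))


@dataclass
class ToyLLM:
    """Hypothetical stand-in for the LLM class, which owns the capability."""

    supports_vision: bool

    def _supports_vision(self) -> bool:
        return self.supports_vision

    def format_messages_for_llm(self, messages: list[ToyMessage]) -> list[dict]:
        # The capability is passed at serialization time, not stored per message.
        return [m.to_chat_dict(vision_enabled=self._supports_vision()) for m in messages]


msg = ToyMessage(
    role="user",
    content=[TextContent("Describe this image"), ImageContent(["https://example.com/a.png"])],
)
with_vision = ToyLLM(supports_vision=True).format_messages_for_llm([msg])
without_vision = ToyLLM(supports_vision=False).format_messages_for_llm([msg])
```

The same message object serializes differently depending on which LLM formats it, which is the point of moving the flag out of the message.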

Changes

openhands-sdk/openhands/sdk/llm/message.py

  • Updated to_chat_dict() to accept optional vision_enabled parameter
  • Updated _list_serializer() to require vision_enabled as keyword-only parameter
  • Added deprecation warning when vision_enabled field is used directly

openhands-sdk/openhands/sdk/llm/llm.py

  • Updated format_messages_for_llm() to pass vision_enabled=self._supports_vision()

Test files updated

  • test_message_serialization.py - updated calls to _list_serializer()
  • test_thinking_blocks.py - updated calls to _list_serializer()
  • test_message.py - updated calls to _list_serializer()

Testing

All 495 tests pass.

Related

This is a follow-up to PR #1795, which fixed vision detection for proxy model names. Investigating that issue showed that the design of the vision_enabled field could be improved.

Migration Guide

For users directly calling _list_serializer():

# Before
messages = Message._list_serializer(msg_list)

# After
messages = Message._list_serializer(msg_list, vision_enabled=True)

For users relying on the stored vision_enabled field:
The field is deprecated and will be removed in 1.11.0. Its value is still respected for backward compatibility, but new code should pass vision_enabled as a parameter to the serialization methods.



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b47690c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b47690c-python \
  ghcr.io/openhands/agent-server:b47690c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:b47690c-golang-amd64
ghcr.io/openhands/agent-server:b47690c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b47690c-golang-arm64
ghcr.io/openhands/agent-server:b47690c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b47690c-java-amd64
ghcr.io/openhands/agent-server:b47690c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b47690c-java-arm64
ghcr.io/openhands/agent-server:b47690c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b47690c-python-amd64
ghcr.io/openhands/agent-server:b47690c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:b47690c-python-arm64
ghcr.io/openhands/agent-server:b47690c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:b47690c-golang
ghcr.io/openhands/agent-server:b47690c-java
ghcr.io/openhands/agent-server:b47690c-python

About Multi-Architecture Support

  • Each variant tag (e.g., b47690c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., b47690c-python-amd64) are also available if needed

…r instead

This refactoring moves vision_enabled from being a stored field on Message
to being a parameter passed during serialization. This is a cleaner design
because vision capability is an LLM property, not a message property.

Changes:
- to_chat_dict() now accepts optional vision_enabled parameter
- _list_serializer() now requires vision_enabled parameter
- format_messages_for_llm() now passes vision_enabled during serialization
- Added deprecation notice to vision_enabled field (deprecated in 1.10.0,
  to be removed in 1.11.0)
- Backward compatibility maintained: if vision_enabled parameter is None,
  falls back to the field value (with deprecation warning at version 1.10.0+)
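
The fallback described in the last item can be sketched as follows. The helper name resolve_vision_enabled and the warning text are assumptions made for illustration; the PR does not show how the fallback is implemented, only that a None parameter defers to the stored field and triggers a deprecation warning.

```python
from __future__ import annotations

import warnings


def resolve_vision_enabled(param: bool | None, stored_field: bool) -> bool:
    """Hypothetical helper: prefer the explicit parameter, else the deprecated field."""
    if param is not None:
        # An explicit value from the caller always wins.
        return param
    # No explicit value: fall back to the stored field and flag the legacy path.
    warnings.warn(
        "The vision_enabled field is deprecated since 1.10.0 and will be removed "
        "in 1.11.0; pass vision_enabled to the serialization method instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return stored_field


# Record warnings so the legacy path can be observed, for example in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    explicit = resolve_vision_enabled(False, stored_field=True)
    fallback = resolve_vision_enabled(None, stored_field=True)
```

Only the second call goes through the deprecated path, so only it records a warning.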

This aligns to_chat_dict() with to_responses_value() which already takes
vision_enabled as a parameter.

Benefits:
- Cleaner separation of concerns (message content vs serialization context)
- No need to set fields on messages before serializing
- Eliminates potential for stored state to get out of sync with actual
  LLM capabilities

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/llm/llm.py | 402 | 64 | 84% | 349, 370–371, 407, 571, 672, 700, 774–779, 899, 902–905, 1038, 1060–1061, 1070, 1083, 1085–1090, 1092–1109, 1112–1116, 1118–1119, 1125–1134
openhands-sdk/openhands/sdk/llm/message.py | 284 | 8 | 97% | 396, 409–410, 418, 460, 556, 685–686
TOTAL | 16323 | 4788 | 70% |


neubig commented Jan 23, 2026

Closing in favor of #1798

@neubig neubig closed this Jan 23, 2026