fix: resolve race condition between REST sync and WebSocket connection #1790

neubig · 2026-01-22T12:04:22Z

Summary

This PR fixes an initialization race condition in RemoteConversation where events could be missed if they were created between the REST sync and WebSocket connection establishment.

Problem

When RemoteConversation was initialized, the following sequence occurred:

1. REST sync fetches existing events
2. (GAP) Events created here are missed!
3. WebSocket connects and starts receiving new events

Any events created in the gap between steps 1 and 3 would be lost.

Solution

Reorder the initialization to ensure the WebSocket is connected before doing the REST sync:

1. Start WebSocket client
2. Wait for connection (5s timeout with graceful degradation)
3. Perform REST sync

This ensures:

Events created after WebSocket connects are received via WebSocket
Events created before are captured by the REST sync
No events are missed in the gap
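
The reordered initialization can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `SketchWebSocketClient`, `initialize`, and the `connect_delay` parameter are hypothetical names, and the real `WebSocketCallbackClient` may differ. Only `wait_for_connection()`, the `_connected` event, and the 5 second timeout come from the description above.

```python
import threading
from typing import Callable, Optional


class SketchWebSocketClient:
    """Toy stand-in for a WebSocket client; not the SDK's actual class."""

    def __init__(self) -> None:
        # Set once the (simulated) connection is established.
        self._connected = threading.Event()

    def start(self, connect_delay: float = 0.0) -> None:
        # Simulate the handshake completing on a background thread.
        timer = threading.Timer(connect_delay, self._connected.set)
        timer.daemon = True
        timer.start()

    def wait_for_connection(self, timeout: Optional[float] = None) -> bool:
        # Event.wait returns True if the flag was set, False on timeout.
        return self._connected.wait(timeout=timeout)


def initialize(
    client: SketchWebSocketClient,
    rest_sync: Callable[[], None],
    timeout: float = 5.0,
) -> bool:
    """Connect first, then sync; on timeout, sync anyway."""
    client.start()
    connected = client.wait_for_connection(timeout=timeout)
    # Graceful degradation (as assumed here): the REST sync still runs
    # even if the socket did not come up within the timeout.
    rest_sync()
    return connected
```

With this ordering, anything created while the REST request is in flight is also delivered over the already-open socket, so the same event may arrive through both paths and the receiving side may need to tolerate duplicates.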

Changes

WebSocketCallbackClient: Added _connected event and wait_for_connection() method
RemoteEventsList/RemoteState: Added auto_sync parameter to allow deferring the initial sync
RemoteConversation: Reordered initialization to start the WebSocket client and call wait_for_connection before the REST sync
Tests: Added an autouse fixture to mock the WebSocket client and unit tests for wait_for_connection
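
As a rough illustration of what an auto_sync flag might look like, the sketch below defers the initial fetch until the caller asks for it. `SketchEventsList`, its `fetch` callable, `add_event`, and the de-duplication by an `"id"` key are all assumptions made for this example; the actual RemoteEventsList is not shown in this PR description.

```python
from typing import Callable, Dict, List


class SketchEventsList:
    """Hypothetical event cache; names and shapes are illustrative only."""

    def __init__(
        self, fetch: Callable[[], List[dict]], auto_sync: bool = True
    ) -> None:
        self._fetch = fetch
        self._by_id: Dict[str, dict] = {}
        if auto_sync:
            # Previous behaviour in this sketch: sync at construction time.
            self.sync()

    def sync(self) -> None:
        # REST-style bulk fetch; ids already delivered via add_event are kept.
        for event in self._fetch():
            self._by_id.setdefault(event["id"], event)

    def add_event(self, event: dict) -> None:
        # Would be called from the WebSocket callback in this sketch.
        self._by_id.setdefault(event["id"], event)

    def ids(self) -> List[str]:
        return list(self._by_id)
```

A caller that wants the WebSocket up first would construct the list with `auto_sync=False`, start the socket, and call `sync()` afterwards.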

Test Results

All 277 conversation tests pass
Added 3 new tests for wait_for_connection behavior

Co-authored-by: openhands <openhands@all-hands.dev>

Add support for AgentSkills standard fields (https://agentskills.io/specification):
- description: Brief description of what the skill does
- license: License under which the skill is distributed
- compatibility: Environment requirements or compatibility notes
- metadata: Arbitrary key-value metadata for extensibility
- allowed_tools: List of pre-approved tools for the skill
Also adds skills-ref as an optional dependency for future validation and prompt generation utilities.
Closes #1474
Co-authored-by: openhands <openhands@all-hands.dev>

The skills-ref library will be added when validation and prompt generation utilities are implemented (issue #1478). Co-authored-by: openhands <openhands@all-hands.dev>

- Add find_skill_md() function to locate SKILL.md files (case-insensitive)
- Add validate_skill_name() function for AgentSkills spec validation
- Update load_skills_from_dir() to support skill-name/SKILL.md directories
- Add directory_name and validate_name parameters to Skill.load()
- Export new functions from __init__.py
- Add 27 unit tests for new functionality
Closes #1475
Co-authored-by: openhands <openhands@all-hands.dev>
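
A case-insensitive SKILL.md lookup and a name check could look roughly like the sketch below. The function names come from the commit message, but the signatures, return types, and the exact naming rules (lowercase alphanumerics separated by single hyphens, at most 64 characters, matching the directory name) are assumptions for illustration and should be checked against the AgentSkills specification and the actual implementation.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed rule: lowercase alphanumerics separated by single hyphens.
_NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def find_skill_md(skill_dir: Path) -> Optional[Path]:
    """Return the first file named SKILL.md in any letter case, or None."""
    if not skill_dir.is_dir():
        return None
    for entry in sorted(skill_dir.iterdir()):
        if entry.is_file() and entry.name.lower() == "skill.md":
            return entry
    return None


def validate_skill_name(
    name: str, directory_name: Optional[str] = None
) -> List[str]:
    """Return a list of error messages; an empty list means the name passed."""
    errors: List[str] = []
    if len(name) > 64:
        errors.append("name exceeds 64 characters")
    if not _NAME_RE.match(name):
        errors.append(
            "name must be lowercase alphanumerics separated by single hyphens"
        )
    if directory_name is not None and name != directory_name:
        errors.append("name does not match directory name")
    return errors
```

Returning a list of messages rather than raising is a design choice of this sketch; it lets a caller report every problem at once.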

Reduce test code while maintaining essential coverage. Co-authored-by: openhands <openhands@all-hands.dev>

Resolved merge conflicts in:
- openhands-sdk/openhands/sdk/context/skills/skill.py
- tests/sdk/context/skill/test_agentskills_fields.py
The resolution keeps all three of the following:
1. Pydantic field validators for allowed_tools and metadata from main
2. Skill name validation logic from this branch
3. SKILL.md convention support from this branch
Co-authored-by: openhands <openhands@all-hands.dev>

Extract helper functions to simplify the load_skills_from_dir function:
- _find_third_party_files: Find .cursorrules, AGENTS.md, etc. in repo root
- _find_skill_md_directories: Find AgentSkills-style SKILL.md directories
- _find_regular_md_files: Find regular .md files excluding SKILL.md dirs
- _load_skill_safe: Load skills with consistent error handling
This improves code readability and maintainability by following the single responsibility principle. Each helper function handles one specific aspect of skill discovery or loading.
Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

- Remove redundant tuple element from _find_skill_md_directories (directory_path can be derived from skill_md.parent)
- Replace _load_skill_safe wrapper with _load_and_categorize that combines loading and categorization in one function
- Auto-validate skill name when directory_name is provided (removed separate validate_name parameter)
- Fix case-insensitive search for third-party files to iterate over all files instead of checking specific variants
- Update tests to check for specific error messages instead of magic number assertions
Co-authored-by: openhands <openhands@all-hands.dev>

SKILL.md directories should always be categorized as knowledge_skills (progressive loading), not repo_skills (permanent context), even when they have no triggers defined.
This addresses enyst's feedback that AgentSkills are fundamentally different from permanent OH skills like repo.md: they should use progressive loading, while permanent skills should use AGENTS.md.
Changes:
- Modified _load_and_categorize to always put SKILL.md files in knowledge_skills when directory_name is provided
- Added test_skill_md_always_knowledge_skill to verify the behavior
Co-authored-by: openhands <openhands@all-hands.dev>

The file_content parameter was only used in unit tests. Tests have been updated to use pytest's tmp_path fixture and write content to actual temp files instead. Co-authored-by: openhands <openhands@all-hands.dev>

…gacy formats
Split load() into:
- _load_agentskills_skill(): For SKILL.md files (AgentSkills format)
- _load_legacy_openhands_skill(): For legacy OpenHands skill files
- _create_skill_from_metadata(): Shared helper for Skill object creation
Co-authored-by: openhands <openhands@all-hands.dev>

- load_skills_from_dir() now returns 3 dictionaries: repo_skills, knowledge_skills, agent_skills
- AgentSkills (SKILL.md directories) are categorized into agent_skills (separate from OpenHands skills)
- Updated all callers to handle the new return type
- Updated tests to verify the new categorization
This addresses enyst's review comment about keeping AgentSkills separate from OpenHands skills, as they follow different standards and loading patterns.
Co-authored-by: openhands <openhands@all-hands.dev>

Add support for .mcp.json files in AgentSkills directories (SKILL.md format), following the AgentSkills standard for MCP server configuration.
Changes:
- Add _find_mcp_config() to locate .mcp.json files in skill directories
- Add _expand_mcp_variables() for variable expansion (${VAR}, ${VAR:-default})
- Add _load_mcp_config() to load and validate .mcp.json files
- Update _load_agentskills_skill() to load .mcp.json (agent_skills only)
- Update _load_legacy_openhands_skill() to load mcp_tools from frontmatter
MCP loading rules:
- AgentSkills (SKILL.md): ONLY use .mcp.json, ignore mcp_tools frontmatter
- Legacy skills (.md): ONLY use mcp_tools frontmatter, no .mcp.json support
Co-authored-by: openhands <openhands@all-hands.dev>
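
The `${VAR}` and `${VAR:-default}` expansion mentioned above can be sketched as below. This is not the PR's `_expand_mcp_variables()`; `expand_variables` is a hypothetical helper, and two behaviours are assumptions: unresolved `${VAR}` references are left untouched rather than raising, and the default is used when the variable is unset or empty (mirroring shell semantics).

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${VAR} and ${VAR:-default}; the default may be empty.
_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_variables(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand variable references in strings, lists and dicts."""
    env = os.environ if env is None else env

    def _sub(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        found = env.get(name)
        if default is not None:
            # ${VAR:-default}: fall back when VAR is unset or empty.
            return found if found else default
        # ${VAR}: assumption, leave the reference as-is when VAR is unset.
        return found if found is not None else match.group(0)

    if isinstance(value, str):
        return _VAR_RE.sub(_sub, value)
    if isinstance(value, list):
        return [expand_variables(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_variables(item, env) for key, item in value.items()}
    return value
```

Passing `env` explicitly keeps the helper easy to test; a loader would typically call it on the parsed JSON before validating the configuration.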

Resolved merge conflicts by keeping the .mcp.json loading functionality from this branch while incorporating main's changes (PR #1480).
Key changes preserved:
- AgentSkills (SKILL.md) load .mcp.json with variable expansion
- Legacy skills load mcp_tools from frontmatter only
- _find_mcp_config, _expand_mcp_variables, _load_mcp_config functions
Co-authored-by: openhands <openhands@all-hands.dev>

Use directory_name consistently as in main branch. Co-authored-by: openhands <openhands@all-hands.dev>

This commit fixes an initialization race condition in RemoteConversation where events could be missed if they were created between the REST sync and WebSocket connection establishment.
Changes:
- Add wait_for_connection() method to WebSocketCallbackClient with _connected threading.Event to signal connection state
- Add auto_sync parameter to RemoteEventsList and RemoteState to allow deferring the initial sync
- Reorder RemoteConversation initialization to:
  1. Start WebSocket client
  2. Wait for connection (5s timeout with graceful degradation)
  3. Then perform REST sync
- Add autouse fixture in conversation tests to mock WebSocket client
- Add unit tests for wait_for_connection behavior
The fix ensures that when the WebSocket is connected before the REST sync, any events created after the WebSocket connects will be received via WebSocket, and any events before will be captured by the REST sync.
Co-authored-by: openhands <openhands@all-hands.dev>

openhands-agent and others added 21 commits December 22, 2025 15:55

chore: remove unused agentskills optional dependency

c033966


refactor: consolidate AgentSkills tests

b5e571f


Merge main into feat/skill-md-convention

b33afe8


Merge branch 'main' into feat/skill-md-convention

5649c81

Update organization

feb7a2d

Remove file_content parameter from Skill.load()

e86afb8


Merge remote-tracking branch 'origin/main' into feat/skill-md-convention

821f5b5

fix: remove duplicate path.parent.name assignment

bba56fa


Merge branch 'main' into feat/mcp-json-support

e0963ec

Merge branch 'main' into feat/mcp-json-support

3f77eee
