

@malhotra5 (Collaborator) commented Jan 14, 2026

Summary

Adds support for injecting custom JavaScript into browser sessions via CDP's Page.addScriptToEvaluateOnNewDocument. This enables session recording tools like rrweb to capture agent browser interactions.

Changes

  • Added inject_scripts parameter to BrowserToolExecutor constructor
  • Added set_inject_scripts() and _inject_scripts_to_session() methods to CustomBrowserUseServer
  • Scripts are injected after browser session initialization and run before page scripts on every new document
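For readers unfamiliar with the CDP call, here is a minimal sketch of what `_inject_scripts_to_session()` might send, assuming raw JSON CDP messages. The helper name `build_inject_messages` and its parameters are hypothetical; the PR itself presumably goes through browser-use's CDP client rather than hand-built dicts.

```python
import itertools
from typing import Iterable, Optional


def build_inject_messages(
    scripts: Iterable[str],
    session_id: Optional[str] = None,
    start_id: int = 1,
) -> list[dict]:
    """Build one CDP request per script (hypothetical helper, not the PR's code)."""
    ids = itertools.count(start_id)  # CDP expects a unique id per request
    messages = []
    for script in scripts:
        message = {
            "id": next(ids),
            # Registers `source` to run in every new document before page scripts.
            "method": "Page.addScriptToEvaluateOnNewDocument",
            "params": {"source": script},
        }
        if session_id is not None:
            # In flattened-session mode the target's sessionId goes at the top level.
            message["sessionId"] = session_id
        messages.append(message)
    return messages
```

Each message would be serialized with `json.dumps` and sent over the browser's CDP websocket. Chrome replies with an `identifier`, which `Page.removeScriptToEvaluateOnNewDocument` accepts if a script ever needs to be unregistered.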

Usage

from openhands.tools.browser_use import BrowserToolExecutor

RRWEB_SCRIPT = """
(function() {
    var s = document.createElement('script');
    s.src = 'https://cdn.jsdelivr.net/npm/@rrweb/record@latest/dist/record.umd.min.cjs';
    document.head.appendChild(s);
})();
"""

executor = BrowserToolExecutor(
    inject_scripts=[RRWEB_SCRIPT]
)

Closes #1724



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

  • java: amd64, arm64; base image eclipse-temurin:17-jdk
  • python: amd64, arm64; base image nikolaik/python-nodejs:python3.12-nodejs22
  • golang: amd64, arm64; base image golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4ae620e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4ae620e-python \
  ghcr.io/openhands/agent-server:4ae620e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4ae620e-golang-amd64
ghcr.io/openhands/agent-server:4ae620e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4ae620e-golang-arm64
ghcr.io/openhands/agent-server:4ae620e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4ae620e-java-amd64
ghcr.io/openhands/agent-server:4ae620e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4ae620e-java-arm64
ghcr.io/openhands/agent-server:4ae620e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4ae620e-python-amd64
ghcr.io/openhands/agent-server:4ae620e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4ae620e-python-arm64
ghcr.io/openhands/agent-server:4ae620e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4ae620e-golang
ghcr.io/openhands/agent-server:4ae620e-java
ghcr.io/openhands/agent-server:4ae620e-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4ae620e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4ae620e-python-amd64) are also available if needed
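The published tag list follows a regular pattern. The sketch below regenerates it from the short commit SHA and the variant/base-image pairs listed above. The slug rule ("/" becomes "_s_", ":" becomes "_tag_") is inferred from the tags themselves, and the function names are hypothetical, not the CI workflow's actual code.

```python
# Variant -> base image, as listed for this build.
VARIANTS = {
    "golang": "golang:1.21-bookworm",
    "java": "eclipse-temurin:17-jdk",
    "python": "nikolaik/python-nodejs:python3.12-nodejs22",
}


def base_image_slug(image: str) -> str:
    # Inferred from the published tags: "/" -> "_s_" and ":" -> "_tag_".
    return image.replace("/", "_s_").replace(":", "_tag_")


def build_agent_server_tags(
    sha: str,
    variants: dict[str, str],
    arches: tuple[str, ...] = ("amd64", "arm64"),
    repo: str = "ghcr.io/openhands/agent-server",
) -> list[str]:
    tags = []
    for variant, base in variants.items():
        for arch in arches:
            # Per-architecture tags, named both by variant and by base image.
            tags.append(f"{repo}:{sha}-{variant}-{arch}")
            tags.append(f"{repo}:{sha}-{base_image_slug(base)}-{arch}")
    # One multi-arch manifest tag per variant; Docker picks the platform image.
    tags.extend(f"{repo}:{sha}-{variant}" for variant in variants)
    return tags
```

`build_agent_server_tags("4ae620e", VARIANTS)` yields the same 15 tags, in the same order, as the list for this build.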

openhands-agent and others added 17 commits January 14, 2026 22:22
Add inject_scripts parameter to BrowserToolExecutor to allow injecting
custom JavaScript into every new document via CDP's
Page.addScriptToEvaluateOnNewDocument.

This enables session recording tools like rrweb to be injected into
browser sessions for recording agent interactions.

Co-authored-by: openhands <openhands@all-hands.dev>
- Always inject rrweb loader script on browser session init
- Add start_recording() method that calls rrweb.record()
- Add stop_recording() method that stops recording and returns events as JSON
- Add BrowserStartRecordingAction/Tool and BrowserStopRecordingAction/Tool
- Recording uses CDP Runtime.evaluate to execute JS in page context

Co-authored-by: openhands <openhands@all-hands.dev>
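The commit above drives recording through CDP `Runtime.evaluate`. A sketch of the two round trips follows. The page-side globals `window.__ohEvents` and `window.__ohStop` and the helper names are hypothetical, not the PR's; `window.rrweb` is where the UMD bundle exports, per the next commit.

```python
import json

# Hypothetical page-side snippets evaluated in the page context.
START_JS = """
(function () {
  if (!window.rrweb || !window.rrweb.record) return "not_loaded";
  window.__ohEvents = [];
  // rrweb.record() returns a function that stops recording.
  window.__ohStop = window.rrweb.record({
    emit: function (event) { window.__ohEvents.push(event); }
  });
  return "started";
})()
"""

STOP_JS = """
(function () {
  if (window.__ohStop) { window.__ohStop(); window.__ohStop = null; }
  var events = window.__ohEvents || [];
  window.__ohEvents = [];
  return JSON.stringify(events);
})()
"""


def runtime_evaluate(expression: str, msg_id: int) -> dict:
    """Build a CDP Runtime.evaluate request whose result is returned by value."""
    return {
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }


def parse_stop_response(response: dict) -> list:
    """Runtime.evaluate wraps the returned value in result.result.value."""
    remote_object = response["result"]["result"]
    return json.loads(remote_object["value"])
```

Returning a JSON string from the page keeps the transfer to a single primitive value, which Python then decodes.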
- Add unit tests for start_recording and stop_recording action routing
- Add E2E tests for recording functionality:
  - test_start_recording: verify recording can be started
  - test_recording_captures_events: verify events are captured
  - test_recording_save_to_file: verify recording JSON can be saved
- Update test_browser_toolset.py to expect 14 tools (including recording tools)
- Fix rrweb loader script to use correct CDN URL and add fallback stub
- Fix rrweb.record reference (UMD exports to window.rrweb not rrwebRecord)

Co-authored-by: openhands <openhands@all-hands.dev>
Add example script demonstrating how to use the browser session
recording feature with rrweb:

- Shows how to start/stop recording using browser_start_recording
  and browser_stop_recording tools
- Demonstrates browsing multiple sites while recording
- Saves recording to JSON file for later replay
- Includes instructions for replaying with rrweb-player

Co-authored-by: openhands <openhands@all-hands.dev>
Recording improvements:
- Add automatic retry (10 attempts, 500ms delay) when rrweb isn't loaded
- Improve fallback stub to capture actual DOM content:
  - Full DOM serialization in FullSnapshot event
  - MutationObserver for incremental snapshots
  - Scroll and mouse event listeners
- Add event_types summary in stop_recording response
- Add using_stub flag to indicate if fallback was used
- Improved logging for recording start/stop

Test improvements:
- Simplified tests since retry is now built-in
- Added event_types verification in tests
- Added stub status reporting

Co-authored-by: openhands <openhands@all-hands.dev>
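The built-in retry (10 attempts, 500 ms apart) can be sketched as a small synchronous helper. The name `retry_until_ready` and its signature are hypothetical; an async server would `await asyncio.sleep(...)` instead of blocking, and `sleep` is injectable here only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_ready(
    attempt: Callable[[], Optional[T]],
    attempts: int = 10,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `attempt` until it returns non-None; return None if all tries fail."""
    for i in range(attempts):
        result = attempt()
        if result is not None:
            return result
        if i < attempts - 1:  # no point sleeping after the final attempt
            sleep(delay_s)
    return None
```

Here `attempt` would run the start-recording snippet and return `None` while `window.rrweb` is still missing.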
Root cause: jsdelivr CDN returns Content-Type: application/node for .cjs files,
which browsers refuse to execute as JavaScript.

The .min.js alternative from jsdelivr uses ES module format which doesn't
create a global window.rrweb object.

Solution: Switch to unpkg CDN which returns Content-Type: text/javascript
for .cjs files, allowing browsers to execute the UMD bundle correctly.

Co-authored-by: openhands <openhands@all-hands.dev>
Recording now continues across page navigations by:
1. Flushing events from browser to Python storage before navigation
2. Automatically restarting recording on the new page after navigation
3. Combining all events when stop_recording is called

Changes:
- Add _recording_events list on Python side to store events
- Add _flush_recording_events() to save browser events before navigation
- Add _restart_recording_on_new_page() to resume recording after navigation
- Update navigate(), go_back(), click() to flush before navigation
- Update _stop_recording() to combine events from all pages
- Add pages_recorded count to stop_recording response

Co-authored-by: openhands <openhands@all-hands.dev>
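The flush-then-restart flow around navigation can be modelled as a small wrapper. This is a sketch with hypothetical names: `flush_page_events` stands in for the JS that pulls and clears events from the page, and `start_page_recording` for the snippet that calls `rrweb.record()` again on the new document.

```python
from typing import Callable


class NavigationAwareRecorder:
    """Keeps rrweb events across navigations (hypothetical sketch)."""

    def __init__(
        self,
        flush_page_events: Callable[[], list],
        start_page_recording: Callable[[], None],
    ):
        self._flush = flush_page_events
        self._start = start_page_recording
        self._events: list = []  # Python-side storage that survives page loads
        self._pages = 0
        self.recording = False

    def start(self) -> None:
        self.recording = True
        self._start()
        self._pages = 1

    def navigate(self, do_navigation: Callable[[], None]) -> None:
        if self.recording:
            # Save events before the old document (and its JS state) is discarded.
            self._events.extend(self._flush())
        do_navigation()
        if self.recording:
            # The new document has a fresh window, so recording must be restarted.
            self._start()
            self._pages += 1

    def stop(self) -> tuple[list, int]:
        if self.recording:
            self._events.extend(self._flush())
        self.recording = False
        events, self._events = self._events, []
        return events, self._pages
```

`navigate()`, `go_back()` and `click()` would all route through something like `navigate(...)`, since any of them can replace the document.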
Changes:
- _stop_recording now saves events to a timestamped JSON file instead of
  returning the full events array to the agent
- Recording file saved to full_output_save_dir (e.g., browser_recording_20260115_001313.json)
- Returns concise message: 'Recording stopped. Captured X events from Y page(s). Saved to: path'
- File contains both events array and metadata (count, pages, event_types, etc.)
- Fixed bug in event type counting (was using type_num instead of type_name)

Co-authored-by: openhands <openhands@all-hands.dev>
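A sketch of the save step described in this commit: a timestamped file holding the events plus a metadata block whose `event_types` counts are keyed by rrweb type name. The function names and exact metadata keys are assumptions; the numeric codes are rrweb's `EventType` enum.

```python
import json
import tempfile
from collections import Counter
from datetime import datetime
from pathlib import Path
from typing import Optional

# Numeric values of rrweb's EventType enum.
RRWEB_EVENT_TYPES = {
    0: "DomContentLoaded",
    1: "Load",
    2: "FullSnapshot",
    3: "IncrementalSnapshot",
    4: "Meta",
    5: "Custom",
    6: "Plugin",
}


def save_recording(
    events: list, pages: int, out_dir: str, now: Optional[datetime] = None
) -> Path:
    """Write events plus metadata to a timestamped JSON file (hypothetical sketch)."""
    now = now or datetime.now()
    path = Path(out_dir) / f"browser_recording_{now:%Y%m%d_%H%M%S}.json"
    # Summarize by type name rather than by raw numeric code.
    event_types = Counter(
        RRWEB_EVENT_TYPES.get(e.get("type"), f"Unknown({e.get('type')})") for e in events
    )
    payload = {
        "events": events,
        "metadata": {"count": len(events), "pages": pages, "event_types": dict(event_types)},
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return path


def stop_message(count: int, pages: int, path: Path) -> str:
    # Concise reply for the agent instead of the full event array.
    return f"Recording stopped. Captured {count} events from {pages} page(s). Saved to: {path}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_recording([{"type": 4}, {"type": 2}], pages=1, out_dir=tmp)
        print(stop_message(2, 1, saved))
```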
- Set persistence_dir on Conversation so recordings are saved
- Update prompt to reflect auto-save behavior (no need to manually save)
- Add RECORDING_DIR variable to show where recordings go

Co-authored-by: openhands <openhands@all-hands.dev>
When rrweb fails to load from CDN, instead of using a minimal fallback
stub that provides degraded functionality, now we:

1. Set a __rrweb_load_failed flag when CDN load fails
2. Check this flag when starting recording
3. Return a clear error message to the agent explaining that recording
   could not be started due to CDN load failure

This simplifies the code and makes failures explicit rather than silently
degrading functionality.

Co-authored-by: openhands <openhands@all-hands.dev>
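The explicit-failure approach needs two pieces: the loader sets `window.__rrweb_load_failed` from the script tag's `onerror`, and the start path checks it before anything else. A sketch under assumptions: the unpkg URL combines the CDN fix above with the dist path from the Usage example, and `CHECK_RRWEB_JS`, `describe_start_status`, and the message wording are hypothetical.

```python
# Loader injected into every page; the onerror handler records a CDN failure.
RRWEB_LOADER_JS = """
(function () {
  function load() {
    var s = document.createElement("script");
    s.src = "https://unpkg.com/@rrweb/record@latest/dist/record.umd.min.cjs";
    s.onerror = function () { window.__rrweb_load_failed = true; };
    document.head.appendChild(s);
  }
  if (document.head) { load(); }
  else { document.addEventListener("DOMContentLoaded", load); }
})();
"""

# Evaluated via Runtime.evaluate before calling rrweb.record().
CHECK_RRWEB_JS = """
(function () {
  if (window.__rrweb_load_failed) return "load_failed";
  if (!window.rrweb || !window.rrweb.record) return "not_loaded";
  return "ready";
})()
"""


def describe_start_status(status: str) -> str:
    """Map the page-side status to a clear message for the agent."""
    if status == "load_failed":
        return ("Error: recording could not be started because the rrweb "
                "library failed to load from the CDN.")
    if status == "not_loaded":
        return "Error: rrweb has not finished loading on this page yet."
    return "rrweb is loaded; recording can start."
```

Failing loudly here means the agent sees the real cause instead of a silently degraded recording.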
- Online viewer: https://www.rrweb.io/demo/
"""

import glob

if executor:
    try:
        executor.close()
    except Exception:
        pass  # best-effort cleanup; errors while closing are ignored
Changes:
- Flush events every 5 seconds (RECORDING_FLUSH_INTERVAL_SECONDS)
- Also flush when events exceed 1 MB (RECORDING_FLUSH_SIZE_MB)
- Save events to numbered JSON files (1.json, 2.json, etc.) instead of
  appending to a single file
- Move save_dir parameter from stop_recording to start_recording
- Add background task for periodic flushing
- Track total events and file count across the recording session

This improves performance by:
1. Avoiding memory buildup during long recording sessions
2. Writing smaller, incremental files instead of one large file
3. Spreading I/O across the recording duration

Co-authored-by: openhands <openhands@all-hands.dev>
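The two flush triggers (every `RECORDING_FLUSH_INTERVAL_SECONDS` or once `RECORDING_FLUSH_SIZE_MB` is exceeded) can be sketched as a buffer that a background task polls. `FlushBuffer` is a hypothetical name, 1 MB is assumed to mean 2**20 bytes, and the injectable `clock` exists only to make the timing testable.

```python
import json
import time
from typing import Callable

RECORDING_FLUSH_INTERVAL_SECONDS = 5
RECORDING_FLUSH_SIZE_MB = 1


class FlushBuffer:
    """Buffers events and reports when a time or size threshold is hit (sketch)."""

    def __init__(
        self,
        interval_s: float = RECORDING_FLUSH_INTERVAL_SECONDS,
        max_bytes: int = RECORDING_FLUSH_SIZE_MB * 1024 * 1024,  # assumes 1 MB = 2**20 B
        clock: Callable[[], float] = time.monotonic,
    ):
        self.interval_s = interval_s
        self.max_bytes = max_bytes
        self._clock = clock
        self._events: list = []
        self._bytes = 0
        self._last_flush = clock()

    def add(self, events: list) -> None:
        for event in events:
            self._events.append(event)
            # Track serialized size incrementally instead of re-encoding the buffer.
            self._bytes += len(json.dumps(event).encode("utf-8"))

    def should_flush(self) -> bool:
        if not self._events:
            return False
        elapsed = self._clock() - self._last_flush
        return self._bytes >= self.max_bytes or elapsed >= self.interval_s

    def drain(self) -> list:
        """Return buffered events and reset the size counter and timer."""
        events, self._events = self._events, []
        self._bytes = 0
        self._last_flush = self._clock()
        return events
```

A periodic task would call `add()` with events pulled from the page, and whenever `should_flush()` is true, write `drain()` to the next numbered file.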
Move all inline JavaScript code to named constants at the top of server.py
for better readability and maintainability:

- RRWEB_LOADER_JS: Script injected into every page to load rrweb from CDN
- FLUSH_EVENTS_JS: Collects and clears events from browser
- START_RECORDING_SIMPLE_JS: Start recording (used after navigation)
- START_RECORDING_JS: Start recording with load failure check
- STOP_RECORDING_JS: Stop recording and collect remaining events

Also reorganized the file with clear section headers for:
- Configuration Constants
- Injected JavaScript Code
- CustomBrowserUseServer Class

Co-authored-by: openhands <openhands@all-hands.dev>
When saving events to numbered JSON files, check if the file already
exists and increment the counter until an unused filename is found.
This handles cases where files already exist from previous recordings
in the same directory.

Co-authored-by: openhands <openhands@all-hands.dev>
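A sketch of the collision-avoiding numbered files, plus the matching read side for replay. It assumes each `<n>.json` holds a plain JSON array of events, which may differ from the PR's actual file layout. All function names are hypothetical. Chunks must be reassembled in numeric order because a string sort puts `10.json` before `2.json`.

```python
import json
import tempfile
from pathlib import Path


def next_numbered_path(save_dir: Path, counter: int) -> tuple[Path, int]:
    """Return the first unused `<n>.json` at or after `counter`."""
    path = save_dir / f"{counter}.json"
    while path.exists():  # skip files left over from earlier recordings
        counter += 1
        path = save_dir / f"{counter}.json"
    return path, counter


def write_chunk(save_dir: Path, counter: int, events: list) -> int:
    """Write one chunk of events and return the counter to use next time."""
    path, counter = next_numbered_path(save_dir, counter)
    path.write_text(json.dumps(events))
    return counter + 1


def load_all_events(save_dir: Path) -> list:
    """Concatenate chunks in numeric order, ignoring non-numbered JSON files."""
    chunks = sorted(
        (p for p in save_dir.glob("*.json") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    events: list = []
    for chunk in chunks:
        events.extend(json.loads(chunk.read_text()))
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        c = write_chunk(d, 1, [{"type": 4}])
        c = write_chunk(d, c, [{"type": 2}])
        print(sorted(p.name for p in d.iterdir()), len(load_all_events(d)))
```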


Development

Successfully merging this pull request may close these issues.

Feat: record agent's browser sessions
