feat(scrape_tool): add support for JavaScript rendering using Playwright by mrsadeghi · Pull Request #4402 · crewAIInc/crewAI

mrsadeghi · 2026-02-07T08:01:57Z

Description

This PR introduces an optional render_js parameter to ScrapeWebsiteTool, enabling it to handle JavaScript-heavy websites (single-page applications built with React and similar frameworks) whose content is not present in the HTML returned by a standard HTTP request.

Key Changes

- Added render_js: bool = False to the ScrapeWebsiteTool constructor.
- Integrated playwright as an optional rendering engine.
- Kept the lightweight requests library as the default path: when render_js is False, the tool continues to scrape with requests.
- Added unit tests that use mocks to verify both the JS and non-JS workflows.
- Updated pyproject.toml to include playwright as a dependency.
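The switch described above can be sketched as follows. This is a minimal illustration, not the PR's code: the names fetch_static, fetch_rendered, pick_fetcher, and scrape_text are hypothetical, and the timeouts and the "networkidle" wait strategy are assumptions. Both third-party imports are lazy, so the module loads even when Playwright is not installed.

```python
from typing import Callable


def fetch_static(url: str, timeout: float = 15.0) -> str:
    """Default path: a plain HTTP GET; no JavaScript is executed."""
    import requests  # lazy import, only needed on this path

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def fetch_rendered(url: str, timeout_ms: float = 30_000) -> str:
    """Opt-in path: load the page in headless Chromium and return the rendered DOM."""
    try:
        from playwright.sync_api import sync_playwright
    except ImportError as exc:
        raise ImportError(
            "render_js=True requires Playwright: "
            "pip install playwright && playwright install chromium"
        ) from exc

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for network activity to settle so client-side rendering can finish.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def pick_fetcher(render_js: bool) -> Callable[[str], str]:
    """requests by default; Playwright only when the caller opts in."""
    return fetch_rendered if render_js else fetch_static


def scrape_text(url: str, render_js: bool = False) -> str:
    """Fetch HTML via the selected path, then flatten it to text with BeautifulSoup."""
    from bs4 import BeautifulSoup  # lazy import

    html = pick_fetcher(render_js)(url)
    return BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
```

For example, scrape_text("https://quotes.toscrape.com/js/", render_js=True) would return the rendered quotes. One design caveat: Playwright's sync API refuses to run while an asyncio event loop is already running in the same thread, so an async caller would need playwright.async_api or a worker thread instead.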

Why is this needed?

Currently, ScrapeWebsiteTool fails to capture content from websites that render on the client side, because requests only returns the initial HTML and never executes the page's JavaScript. Adding Playwright support extends the tool to gather data from these modern web applications.

Testing

Manual Test: Verified on https://quotes.toscrape.com/js/.
- Without render_js: Captured ~200 chars (missing content).
- With render_js: Captured ~1500 chars (full content).
Automated Test: Added tests/tools/test_scrape_website_tool.py, which passes under uv run pytest.
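One way to unit-test the Playwright path without installing Playwright or launching a browser is to stub the lazily imported module in sys.modules. This is a minimal sketch, assuming the tool imports playwright.sync_api inside the render function; render_html and the test name are hypothetical and not the PR's actual tests. The same patch.dict trick can stub requests for the non-JS path.

```python
import sys
from unittest import mock


def render_html(url: str) -> str:
    """Hypothetical render helper: Playwright is imported at call time."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def test_render_html_returns_rendered_dom() -> None:
    fake_api = mock.MagicMock()
    # sync_playwright() returns a context manager whose __enter__ yields the Playwright object.
    pw = fake_api.sync_playwright.return_value.__enter__.return_value
    browser = pw.chromium.launch.return_value
    page = browser.new_page.return_value
    page.content.return_value = "<div class='quote'>rendered</div>"

    # Stub the package and the submodule so the import inside render_html resolves to fakes.
    with mock.patch.dict(
        sys.modules, {"playwright": mock.MagicMock(), "playwright.sync_api": fake_api}
    ):
        html = render_html("https://quotes.toscrape.com/js/")

    assert html == "<div class='quote'>rendered</div>"
    page.goto.assert_called_once_with(
        "https://quotes.toscrape.com/js/", wait_until="networkidle"
    )
    browser.close.assert_called_once()
```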

Note for Maintainers

Because of environment-specific pre-commit hook issues on Windows (related to the .venv/bin/activate path), the local hooks were bypassed after formatting manually with ruff.

Note

Medium Risk
Moderate risk: the PR adds new async execution paths (HITL feedback loops, an async HTTP client) and Playwright-based rendering, which can introduce event-loop, dependency, and runtime-environment issues.

Overview
Adds optional JavaScript rendering to ScrapeWebsiteTool via a new render_js flag that switches scraping from requests to Playwright page rendering before parsing with BeautifulSoup; playwright is added as a dependency and new unit tests cover both JS-rendered and default paths.

Extends HITL handling to support async feedback loops: introduces handle_feedback_async in the human input provider contract with non-blocking stdin reads, updates both agent executors to await async feedback processing, and adds _ainvoke_loop to the experimental executor to support async re-execution during feedback.
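The shape of an async feedback loop can be sketched generically. This is a hypothetical illustration, not crewAI's handle_feedback_async contract: run_task (re-executes the work, optionally with feedback) and ask_human (returns the reply, where an empty reply means approval) are assumed signatures, and max_rounds is an assumed safety cap. Awaiting ask_human keeps the event loop free only if ask_human itself does not block, for example by reading stdin in a worker thread.

```python
from typing import Awaitable, Callable, Optional

# Hypothetical signatures for illustration only.
RunTask = Callable[[Optional[str]], Awaitable[str]]
AskHuman = Callable[[str], Awaitable[str]]


async def feedback_loop(run_task: RunTask, ask_human: AskHuman, max_rounds: int = 3) -> str:
    """Await human feedback and re-run the task until the reply is empty."""
    output = await run_task(None)  # first run has no feedback yet
    for _ in range(max_rounds):
        feedback = (await ask_human(output)).strip()
        if not feedback:  # empty reply: accept the current output
            break
        output = await run_task(feedback)  # async re-execution with the feedback
    return output
```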

Updates flow router listener dispatch to always execute listeners for router outcomes (passing through last_human_feedback when present) and adds regression tests for chained router outcomes. Separately, switches PlusAPI.get_agent to async httpx and adapts call sites/tests accordingly (including using asyncio.run).

Written by Cursor Bugbot for commit fd457d1. This will update automatically on new commits.

asynchronous human-in-the-loop handling and related fixes.
- Extend human_input provider with async support: AsyncExecutorContext, handle_feedback_async, async prompt helpers (_prompt_input_async, _async_readline), and async training/regular feedback loops in SyncHumanInputProvider.
- Add async handler methods in CrewAgentExecutor and AgentExecutor (_ahandle_human_feedback, _ainvoke_loop) to integrate async provider flows.
- Change PlusAPI.get_agent to an async httpx call and adapt caller in agent_utils to run it via asyncio.run.
- Simplify listener execution in flow.Flow to correctly pass HumanFeedbackResult to listeners and unify execution path for router outcomes.
- Remove deprecated types/hitl.py definitions.
- Add tests covering chained router feedback, rejected paths, and mixed router/non-router listeners to prevent regressions.

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.


cursor · 2026-02-07T08:11:54Z

lib/crewai/src/crewai/utilities/agent_utils.py

            client = PlusAPI(api_key=get_auth_token())
        _print_current_organization()
-        response = client.get_agent(from_repository)
+        response = asyncio.run(client.get_agent(from_repository))


asyncio.run() fails inside already-running event loops

High Severity

asyncio.run(client.get_agent(from_repository)) raises RuntimeError when called from within an already-running event loop. Since load_agent_from_repository is invoked from a sync Pydantic model validator, constructing an Agent(from_repository=...) during kickoff_async or any other async context will crash. The previous sync requests-based get_agent worked in all contexts. Additionally, existing tests that @patch get_agent with a regular MagicMock will also break because asyncio.run() expects a coroutine, not a MagicMock.

Additional Locations (1)

lib/crewai/src/crewai/cli/plus_api.py#L57-L61
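The review notes that the previous synchronous get_agent worked in every context, so keeping it synchronous is the simplest remedy. If the async version stays, a bridge like the hypothetical run_sync below avoids the crash: it uses asyncio.run when no loop is running in the current thread and otherwise runs the coroutine on a fresh loop in a worker thread. This is a sketch, not the PR's code, and it blocks the outer loop while waiting for the result.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code.

    asyncio.run() raises RuntimeError if this thread already has a running
    event loop, so in that case the coroutine gets its own loop in a worker
    thread. The calling loop stays blocked until the result is ready.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running in this thread: safe
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

For the tests that patch get_agent with a plain MagicMock, patching with unittest.mock.AsyncMock (for example via new_callable=AsyncMock) makes the patched call return an awaitable, which asyncio.run accepts.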

cursor · 2026-02-07T08:11:54Z

lib/crewai/src/crewai/core/providers/human_input.py

+        raw = await reader.readline()
+        return raw.decode().rstrip("\n")
+    except (OSError, NotImplementedError, ValueError):
+        return await asyncio.to_thread(input)


Repeated connect_read_pipe on stdin corrupts state

Medium Severity

_async_readline calls connect_read_pipe on sys.stdin on every invocation but never closes the transport. The first call puts stdin's file descriptor in non-blocking mode and registers a persistent transport. On subsequent calls (expected during the multi-round feedback loop in _handle_regular_feedback_async), connect_read_pipe fails and the code falls back to asyncio.to_thread(input). However, input() may malfunction because stdin is still in non-blocking mode from the first call, and the lingering transport can race it for bytes.
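A simpler alternative avoids connect_read_pipe entirely: run the blocking readline() in a worker thread, which never changes the descriptor's mode and leaves no transport behind between calls. This is a minimal sketch, not the PR's helper; the optional stream parameter is a hypothetical addition for testability. The trade-off is that a pending read in a worker thread cannot be cancelled, so a cancelled task leaves the thread waiting until a line arrives.

```python
import asyncio
import sys
from typing import Optional, TextIO


async def async_readline(stream: Optional[TextIO] = None) -> str:
    """Read one line without blocking the event loop or touching the fd's mode."""
    stream = sys.stdin if stream is None else stream
    # The blocking readline() runs in the default thread pool.
    line = await asyncio.to_thread(stream.readline)
    return line.rstrip("\n")  # "" at EOF, unlike input(), which raises EOFError
```

Because nothing persists between calls, repeated reads across feedback rounds behave the same as the first one.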

cursor · 2026-02-07T08:11:54Z

lib/crewai-tools/pyproject.toml

    "python-docx~=1.2.0",
    "youtube-transcript-api~=1.2.2",
    "pymupdf~=1.26.6",
+    "playwright>=1.57.0",


Playwright added as hard dependency instead of optional

Medium Severity

playwright is added to the required dependencies list, making it mandatory for all crewai-tools users. Playwright is a heavyweight package that also requires separate browser binary installation (playwright install). The implementation already handles ImportError at runtime (lines 87–88 of the scrape tool), clearly indicating it was designed to be optional. It belongs in [project.optional-dependencies] alongside similar optional packages like selenium and browserbase.
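With playwright moved to [project.optional-dependencies], the tool has to cope with it being absent. A sketch of a fail-fast guard, assuming the check runs when render_js=True is requested; the helper names are hypothetical. importlib.util.find_spec locates a top-level package without importing it, so the check avoids Playwright's import cost.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """True if a top-level module is importable, checked without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_playwright() -> None:
    """Fail fast with install instructions when JS rendering is requested."""
    if not has_optional_dependency("playwright"):
        raise ImportError(
            "render_js=True needs the optional 'playwright' package: "
            "pip install playwright, then run 'playwright install chromium'."
        )
```

Note that installing the Python package does not download browser binaries; a missing browser only surfaces when chromium.launch() runs.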

cursor · 2026-02-07T08:11:54Z

lib/crewai/src/crewai/cli/plus_api.py

+    async def get_agent(self, handle: str) -> httpx.Response:
+        url = urljoin(self.base_url, f"{self.AGENTS_RESOURCE}/{handle}")
+        async with httpx.AsyncClient() as client:
+            return await client.get(url, headers=self.headers)


Async get_agent drops trust_env=False proxy setting

Low Severity

The sync _make_request explicitly sets session.trust_env = False to ignore proxy environment variables, but the new async get_agent uses httpx.AsyncClient() which defaults to trust_env=True. This means the async version will pick up HTTP_PROXY/HTTPS_PROXY environment variables that the sync version intentionally ignores, potentially causing requests to route through unintended proxies or fail in corporate/CI environments.

mrsadeghi wants to merge 1 commit into crewAIInc:main from mrsadeghi:feat/add-playwright-rendering
