Skip to content

tool_use.caller field causes an error in Anthropic Computer Use Agent. #249

@asim-hl

Description

@asim-hl

Bug: tool_use.caller field causes "Extra inputs are not permitted" error on Step 2+

Description

When using AnthropicCUAClient with Claude models (e.g., claude-sonnet-4-20250514), the agent fails on Step 2 of execution with:

Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.1.tool_use.caller: Extra inputs are not permitted'}}

Root Cause

In stagehand/agent/anthropic_cua.py, the _process_provider_response method serializes response content blocks using block.model_dump() (line ~299):

raw_assistant_content_blocks = [
    block.model_dump() for block in response.content
]

The Anthropic SDK's beta API response includes a caller field on tool_use blocks (see anthropic/types/beta/beta_tool_use_block_param.py). This field is valid in API responses but is not accepted when sending tool_use blocks back to the API in subsequent requests as part of the conversation history.

When raw_assistant_content_blocks is appended to current_messages (line ~211-212) and sent in the next API call, the API rejects the caller field.

Steps to Reproduce

  1. Create an agent using AnthropicCUAClient with any Claude model
  2. Execute a multi-step task that requires more than one API call
  3. Observe the error on Step 2

Expected Behavior

The agent should successfully execute multi-step tasks without API validation errors.

Actual Behavior

The agent fails on Step 2 with invalid_request_error because the caller field is included in the conversation history.

Suggested Fix

Exclude the caller field when serializing response content blocks:

# In _process_provider_response method
raw_assistant_content_blocks = []
if hasattr(response, "content") and isinstance(response.content, list):
    try:
        for block in response.content:
            block_dict = block.model_dump()
            # Remove 'caller' field - it's valid in responses but not accepted in requests
            block_dict.pop("caller", None)
            raw_assistant_content_blocks.append(block_dict)
    except Exception as e:
        # ... existing error handling

Environment

  • stagehand version: 0.5.7
  • anthropic SDK version: 0.75.0
  • Python version: 3.13
  • Model: claude-sonnet-4-20250514 (also affects other Claude models)

Workaround

Until this is fixed, users can monkey-patch AnthropicCUAClient._process_provider_response to strip the caller field before the stagehand module is used:

from stagehand.agent.anthropic_cua import AnthropicCUAClient
from stagehand.handlers.cua_handler import StagehandFunctionName

def _patched_process_provider_response(self, response):
    self.last_tool_use_ids = []
    model_message_parts = []
    agent_action = None

    raw_assistant_content_blocks = []
    if hasattr(response, "content") and isinstance(response.content, list):
        try:
            for block in response.content:
                block_dict = block.model_dump()
                block_dict.pop("caller", None)  # Remove problematic field
                raw_assistant_content_blocks.append(block_dict)
        except Exception as e:
            self.logger.error(
                f"Could not model_dump response.content blocks: {e}",
                category=StagehandFunctionName.AGENT,
            )
            raw_assistant_content_blocks = response.content

        tool_use_block = None
        for block in response.content:
            if block.type == "tool_use":
                tool_use_block = block
                self.last_tool_use_ids.append(block.id)
            elif block.type == "text":
                model_message_parts.append(block.text)

        if tool_use_block:
            tool_name = tool_use_block.name
            tool_input = tool_use_block.input if hasattr(tool_use_block, "input") else {}
            agent_action = self._convert_tool_use_to_agent_action(tool_name, tool_input)
            if agent_action:
                agent_action.step = raw_assistant_content_blocks

    model_message_text = " ".join(model_message_parts).strip() or None
    task_completed = not bool(agent_action)
    return (agent_action, model_message_text, task_completed, raw_assistant_content_blocks)

AnthropicCUAClient._process_provider_response = _patched_process_provider_response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions