Merged
18 commits
44db99f
Add SDK guide for Critic feature (experimental)
openhands-agent Jan 15, 2026
e190585
Simplify critic documentation: focus on user usage
openhands-agent Jan 20, 2026
97437c7
Add 'What is a Critic?' section explaining use cases
openhands-agent Jan 20, 2026
79110c4
Add reference to SWE-Bench blog post and mention forthcoming technica…
openhands-agent Jan 20, 2026
5372457
Change 'evaluation model' to 'evaluator' in critic description
openhands-agent Jan 20, 2026
9e77c4c
Update critic score visualization example to match actual output
openhands-agent Jan 20, 2026
26a9826
Apply suggestion from @xingyaoww
xingyaoww Jan 20, 2026
450766e
Rename example file from 34_critic_model_example.py to 34_critic_exam…
openhands-agent Jan 20, 2026
86ea0cb
Merge branch 'main' into xw/critic-model
xingyaoww Jan 20, 2026
9cef6bc
Add critic guide to docs.json navigation
openhands-agent Jan 20, 2026
c7b74a6
Add CLI critic page for OpenHands LLM Provider users
openhands-agent Jan 20, 2026
a3d7ad4
Add screenshots for critic feature documentation
openhands-agent Jan 20, 2026
0e67fdf
Apply suggestion from @xingyaoww
xingyaoww Jan 20, 2026
a60c2b8
Apply suggestion from @xingyaoww
xingyaoww Jan 20, 2026
fe76ba3
Apply suggestion from @xingyaoww
xingyaoww Jan 20, 2026
f273030
Fix screenshot paths using relative paths
openhands-agent Jan 20, 2026
f9575fa
docs: sync code blocks and regenerate API reference
enyst Jan 21, 2026
38300ec
Add critic demo video to CLI documentation
openhands-agent Jan 21, 2026
6 changes: 4 additions & 2 deletions docs.json
@@ -215,7 +215,8 @@
 {
 "group": "Extensions",
 "pages": [
-"openhands/usage/cli/mcp-servers"
+"openhands/usage/cli/mcp-servers",
+"openhands/usage/cli/critic"
 ]
 },
 {
@@ -268,7 +269,8 @@
 "sdk/guides/agent-custom",
 "sdk/guides/convo-custom-visualizer",
 "sdk/guides/agent-stuck-detector",
-"sdk/guides/agent-tom-agent"
+"sdk/guides/agent-tom-agent",
+"sdk/guides/critic"
 ]
 },
 {
Binary file added openhands/usage/cli/critic-demo.mp4
Binary file not shown.
41 changes: 41 additions & 0 deletions openhands/usage/cli/critic.mdx
@@ -0,0 +1,41 @@
---
title: Critic (Experimental)
description: Automatic task success prediction for OpenHands LLM Provider users
---

<Warning>
**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing.
</Warning>

## Overview

If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time.

For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic).


## What is the Critic?

The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides:

- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success
- **Real-time feedback**: Scores computed during agent execution, not just at completion

<video
controls
className="w-full aspect-video"
src="/openhands/usage/cli/critic-demo.mp4"
></video>

![Critic output in CLI](./screenshots/critic-cli-output.png)

## Pricing

The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users.

## Disabling the Critic

If you prefer not to use the critic feature, you can disable it in your settings.

![Critic settings in CLI](./screenshots/critic-cli-settings.png)

15 changes: 3 additions & 12 deletions sdk/api-reference/openhands.sdk.agent.mdx
@@ -26,18 +26,8 @@ AgentBase and implements the agent execution logic.

 #### Properties

-- `agent_context`: AgentContext | None
-- `condenser`: CondenserBase | None
-- `filter_tools_regex`: str | None
-- `include_default_tools`: list[str]
-- `llm`: LLM
-- `mcp_config`: dict[str, Any]
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `security_policy_filename`: str
-- `system_prompt_filename`: str
-- `system_prompt_kwargs`: dict[str, object]
-- `tools`: list[Tool]

 #### Methods

@@ -94,11 +84,12 @@ agent implementations must follow.

 - `agent_context`: AgentContext | None
 - `condenser`: CondenserBase | None
+- `critic`: CriticBase | None
 - `filter_tools_regex`: str | None
 - `include_default_tools`: list[str]
 - `llm`: LLM
 - `mcp_config`: dict[str, Any]
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `name`: str
 Returns the name of the Agent.
36 changes: 31 additions & 5 deletions sdk/api-reference/openhands.sdk.conversation.mdx
@@ -126,6 +126,10 @@ Send a message to the agent.

 Set the confirmation policy for the conversation.

+#### abstractmethod set_security_analyzer()
+
+Set the security analyzer for the conversation.
+
 #### abstractmethod update_secrets()

 ### class Conversation
@@ -197,8 +201,6 @@ Bases: `OpenHandsModel`
 - `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus)
 - `id`: UUID
 - `max_iterations`: int
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `persistence_dir`: str | None
 - `secret_registry`: [SecretRegistry](#class-secretregistry)
 - `security_analyzer`: SecurityAnalyzerBase | None
@@ -280,6 +282,10 @@ actions that are pending confirmation or execution.

 Return True if the lock is currently held by any thread.

+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
 #### model_post_init()

 This function is meant to behave like a BaseModel method to initialise private attributes.
@@ -352,7 +358,25 @@ Conversation will then calls MyVisualizer() followed by initialize(state)

 Initialize the visualizer base.

-#### initialize()
+#### create_sub_visualizer()
+
+Create a visualizer for a sub-agent during delegation.
+
+Override this method to support sub-agent visualization in multi-agent
+delegation scenarios. The sub-visualizer will be used to display events
+from the spawned sub-agent.
+
+By default, returns None which means sub-agents will not have visualization.
+Subclasses that support delegation (like DelegationVisualizer) should
+override this method to create appropriate sub-visualizers.
+
+* Parameters:
+`agent_id` – The identifier of the sub-agent being spawned
+* Returns:
+A visualizer instance for the sub-agent, or None if sub-agent
+visualization is not supported
+
+#### final initialize()

 Initialize the visualizer with conversation state.

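The `create_sub_visualizer()` docstring added in this hunk describes an opt-in override: the base implementation returns None (sub-agents get no visualization), and delegation-aware subclasses return a fresh visualizer for each spawned sub-agent. A minimal Python sketch of that pattern, using hypothetical stand-ins (`VisualizerBase`, `PrefixedVisualizer`, and `render()` are illustrative names, not the SDK's classes):

```python
from __future__ import annotations

from typing import Optional


class VisualizerBase:
    """Hypothetical stand-in for the SDK's visualizer base class."""

    def create_sub_visualizer(self, agent_id: str) -> Optional[VisualizerBase]:
        # Default behavior: sub-agents spawned during delegation get no visualizer.
        return None


class PrefixedVisualizer(VisualizerBase):
    """Hypothetical delegation-aware visualizer that tags output by agent."""

    def __init__(self, label: str = "main") -> None:
        self.label = label

    def create_sub_visualizer(self, agent_id: str) -> PrefixedVisualizer:
        # Nest the sub-agent under the parent's label so its output stays traceable.
        return PrefixedVisualizer(label=f"{self.label}/{agent_id}")

    def render(self, message: str) -> str:
        # Hypothetical display hook: prefix each message with the agent label.
        return f"[{self.label}] {message}"
```

Nesting labels keeps output from multi-level delegation attributable to the right agent. The sketch omits the `final initialize()` step documented in the same hunk.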
@@ -772,8 +796,6 @@ even when callable secrets fail on subsequent calls.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
-Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `secret_sources`: dict[str, SecretSource]

 #### Methods
@@ -808,6 +830,10 @@ fresh values from callables to ensure comprehensive masking.
 * Returns:
 Text with secret values replaced by `<secret-hidden>`

+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
 #### model_post_init()

 This function is meant to behave like a BaseModel method to initialise private attributes.
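The secret-masking method documented in this file replaces secret values with `<secret-hidden>`, fetches fresh values from callables, and keeps masking even when a callable fails on a later call. One way to get that behavior is to cache the last value each callable returned. A minimal sketch under that assumption (`SecretMasker`, its caching strategy, and the `SecretSource` alias shown here are hypothetical, not the SDK's `SecretRegistry` implementation):

```python
from __future__ import annotations

from typing import Callable, Dict, List, Union

# Hypothetical: a secret is either a static string or a zero-argument callable.
SecretSource = Union[str, Callable[[], str]]
MASK = "<secret-hidden>"


class SecretMasker:
    """Hypothetical sketch; not the SDK's SecretRegistry."""

    def __init__(self, sources: Dict[str, SecretSource]) -> None:
        self.sources = sources
        self._last_known: Dict[str, str] = {}  # last value each callable returned

    def _current_values(self) -> List[str]:
        values = []
        for name, source in self.sources.items():
            if callable(source):
                try:
                    self._last_known[name] = source()  # prefer a fresh value
                except Exception:
                    pass  # fall back to the previously cached value, if any
                value = self._last_known.get(name)
            else:
                value = source
            if value:
                values.append(value)
        return values

    def mask(self, text: str) -> str:
        # Replace longer values first so a secret that contains another is hidden whole.
        for value in sorted(set(self._current_values()), key=len, reverse=True):
            text = text.replace(value, MASK)
        return text
```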
35 changes: 16 additions & 19 deletions sdk/api-reference/openhands.sdk.event.mdx
@@ -12,8 +12,9 @@ Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)
 #### Properties

 - `action`: Action | None
+- `critic_result`: CriticResult | None
 - `llm_response_id`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `reasoning_content`: str | None
 - `responses_reasoning_item`: ReasoningItemModel | None
@@ -47,7 +48,7 @@ represents an error produced by the agent/scaffold, not model output.
 #### Properties

 - `error`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `visualize`: Text
@@ -68,7 +69,7 @@ This action indicates a condensation of the conversation history is happening.

 - `forgotten_event_ids`: list[str]
 - `llm_response_id`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `summary`: str | None
@@ -86,7 +87,7 @@ This action is used to request a condensation of the conversation history.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `visualize`: Text
@@ -112,7 +113,7 @@ This event represents a summary generated by a condenser.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `summary`: str
@@ -138,7 +139,7 @@ to ensure compatibility with websocket transmission.
 #### Properties

 - `key`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `value`: Any
@@ -194,7 +195,7 @@ instead of writing it to a file inside the Docker container.

 - `filename`: str
 - `log_data`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `model_name`: str
 - `source`: Literal['agent', 'user', 'environment']
@@ -208,11 +209,8 @@ Base class for events that can be converted to LLM messages.

 #### Properties

-- `id`: EventID
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
-- `source`: SourceType
-- `timestamp`: str

 #### Methods

@@ -234,8 +232,8 @@ This is originally the “MessageAction”, but it suppose not to be tool call.
 #### Properties

 - `activated_skills`: list[str]
+- `critic_result`: CriticResult | None
 - `extended_content`: list[TextContent]
-- `id`: EventID
 - `llm_message`: Message
 - `llm_response_id`: str | None
 - `model_config`: ClassVar[ConfigDict] = (configuration object)
@@ -245,7 +243,6 @@ This is originally the “MessageAction”, but it suppose not to be tool call.
 - `source`: Literal['agent', 'user', 'environment']
 - `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock]
 Return the Anthropic thinking blocks from the LLM message.
-- `timestamp`: str
 - `visualize`: Text
 Return Rich Text representation of this message event.

@@ -264,7 +261,7 @@ Examples include tool execution, error, user reject.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `tool_call_id`: str
@@ -277,7 +274,7 @@ Bases: [`ObservationBaseEvent`](#class-observationbaseevent)
 #### Properties

 - `action_id`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `observation`: Observation
 - `visualize`: Text
@@ -296,7 +293,7 @@ Event indicating that the agent execution was paused by user request.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `visualize`: Text
@@ -310,7 +307,7 @@ System prompt added by the agent.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `source`: Literal['agent', 'user', 'environment']
 - `system_prompt`: TextContent
@@ -331,7 +328,7 @@ Event from VLLM representing token IDs used in LLM interaction.

 #### Properties

-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `prompt_token_ids`: list[int]
 - `response_token_ids`: list[int]
@@ -346,7 +343,7 @@ Observation when user rejects an action in confirmation mode.
 #### Properties

 - `action_id`: str
-- `model_config`: ClassVar[ConfigDict] = (configuration object)
+- `model_config`: = (configuration object)
 Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
 - `rejection_reason`: str
 - `visualize`: Text
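This file's diff adds `critic_result: CriticResult | None` to two event classes, so code that reads critic output has to skip events the critic did not score. A minimal sketch with hypothetical stand-ins: the `score` field on `CriticResult`, the `Event` class, the helper names, and the 0.5 default threshold are all assumptions, not the SDK's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CriticResult:
    """Hypothetical stand-in; assumes one `score` field in [0.0, 1.0]."""
    score: float


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an event carrying an optional critic result."""
    id: str
    critic_result: Optional[CriticResult] = None


def critic_scores(events: Sequence[Event]) -> List[Tuple[str, float]]:
    """(event id, score) pairs for events the critic evaluated, in order."""
    return [(e.id, e.critic_result.score) for e in events if e.critic_result is not None]


def flag_below(events: Sequence[Event], threshold: float = 0.5) -> List[str]:
    """Ids of scored events whose predicted success falls below `threshold`."""
    return [event_id for event_id, score in critic_scores(events) if score < threshold]
```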