Conversation

@xingyaoww
Contributor

Summary

This PR adds comprehensive documentation for the new experimental Critic feature in the OpenHands SDK.

What's Added

A new guide at sdk/guides/critic.mdx that covers:

Core Concepts

  • What is a Critic? - Explanation of the LLM-based evaluation system
  • When to Use Critics - Use cases for quality monitoring, early intervention, and performance analysis
  • Evaluation Modes - Two modes: finish_and_message (default) and all_actions
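As a rough illustration of how the two evaluation modes might differ, the sketch below filters a list of events by mode. Only the mode names `finish_and_message` and `all_actions` come from this PR; the `should_evaluate` helper, the event kind strings, and the assumption that `all_actions` also covers messages are hypothetical and not SDK API.

```python
from typing import Literal

Mode = Literal["finish_and_message", "all_actions"]


def should_evaluate(event_kind: str, mode: Mode = "finish_and_message") -> bool:
    """Hypothetical helper: decide whether a critic would score this event.

    Assumes three illustrative event kinds: "action", "finish", "message".
    """
    if mode == "all_actions":
        # Assumed behavior: score every action, plus finish and message events.
        return event_kind in {"action", "finish", "message"}
    # Default mode: score only the finish action and agent messages.
    return event_kind in {"finish", "message"}


events = ["action", "action", "message", "finish"]
default_scored = [e for e in events if should_evaluate(e)]
all_scored = [e for e in events if should_evaluate(e, "all_actions")]
```

Under these assumptions the default mode scores two of the four events, while `all_actions` scores all four, which is where the extra latency and cost would come from.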

Implementation Guide

  • Setting Up APIBasedCritic - Complete example with configuration
  • Configuration Options - All parameters explained (server_url, api_key, model_name, mode)
  • Understanding Results - How to interpret CriticResult scores and feedback
  • Visualizing Results - Color-coded output in the conversation visualizer
  • Programmatic Access - How to access critic results in callbacks
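To make the results, visualization, and callback items above more concrete, here is a minimal sketch. `DemoCriticResult` is a hypothetical stand-in for the SDK's `CriticResult` (its real fields may differ), and the color cutoffs and the `on_result` callback are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DemoCriticResult:
    """Hypothetical stand-in for CriticResult; the real fields may differ."""

    score: float  # assumed: predicted likelihood of success in [0, 1]
    message: Optional[str] = None  # assumed: optional textual feedback


def color_for(score: float) -> str:
    """Map a score to a display color; the cutoffs are illustrative only."""
    if score >= 0.7:
        return "green"
    if score >= 0.4:
        return "yellow"
    return "red"


collected: List[DemoCriticResult] = []


def on_result(result: DemoCriticResult) -> None:
    """Hypothetical callback: record each result for later analysis."""
    collected.append(result)


on_result(DemoCriticResult(score=0.85, message="Task looks complete."))
on_result(DemoCriticResult(score=0.2))
colors = [color_for(r.score) for r in collected]
```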

Technical Details

  • How It Works - Step-by-step evaluation flow
  • Chat Template Format - Qwen3-4B-Instruct-2507 template explanation
  • Security - API key handling with SecretStr
  • Performance Considerations - Latency, cost, and parallelization details

Advanced Usage

  • Custom Critic Implementations - Extending CriticBase with custom logic
  • Built-in Critics - PassCritic, AgentFinishedCritic, EmptyPatchCritic
  • Troubleshooting - Common issues and solutions

Example Code

Includes the full example from examples/01_standalone_sdk/34_critic_model_example.py with:

  • Auto-configuration for All-Hands LLM proxy
  • Manual configuration fallback
  • Running instructions

⚠️ Experimental Status

The guide includes prominent warnings that this feature is:

  • Highly experimental and subject to change
  • Not recommended for production without thorough testing
  • Subject to API and behavior changes based on feedback

Related PR

This documentation corresponds to OpenHands/software-agent-sdk#1269 which implements the Critic feature.

Preview

The guide follows the same structure and style as existing SDK guides, including:

  • Clear warnings about experimental status
  • Code examples with syntax highlighting
  • Step-by-step instructions
  • Troubleshooting section
  • Links to related guides

Checklist

  • Added comprehensive guide for Critic feature
  • Included clear experimental warnings
  • Provided complete code examples
  • Added troubleshooting section
  • Documented all configuration options
  • Linked to example code in repository
  • Followed existing documentation style and format

This guide documents the experimental API-based Critic feature for
real-time evaluation of agent actions and messages using an external LLM.

Key topics covered:
- Overview of what critics are and when to use them
- Two evaluation modes: finish_and_message and all_actions
- Configuration and setup with APIBasedCritic
- Understanding and visualizing critic results
- Technical details including chat template format
- Custom critic implementations
- Built-in critic types
- Troubleshooting common issues

The guide includes clear warnings that this is an experimental feature
subject to change and not recommended for production use without
thorough testing.

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove 'When to Use Critics' section
- Remove implementation details (evaluation modes, how it works, technical details)
- Remove custom critic implementations and other critic types
- Simplify troubleshooting section
- Add note that critic is hosted by OpenHands LLM Provider and is free
- Rename 'Setting Up' to 'Quick Start' with emphasis on auto-configuration
- Explain what a critic is and what it provides
- Add example use case: triggering agent to reflect/fix when critic indicates poor performance
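The reflect-or-fix use case in the last bullet could look roughly like the sketch below. It is not the guide's example code: the `follow_up_prompt` helper, the 0.5 threshold, and the prompt wording are all assumptions, and it assumes the critic returns a score between 0 and 1.

```python
from typing import Optional


def follow_up_prompt(score: float, threshold: float = 0.5) -> Optional[str]:
    """Hypothetical helper: return a reflection prompt when the score is low.

    Returns None when the score meets the (assumed) threshold, meaning no
    follow-up message needs to be sent to the agent.
    """
    if score >= threshold:
        return None
    return (
        f"The critic scored the last result {score:.2f}, below {threshold:.2f}. "
        "Please review your work, identify what went wrong, and fix it."
    )


low = follow_up_prompt(0.3)
high = follow_up_prompt(0.9)
```

A caller would send `low` back to the agent as a new user message and do nothing when the helper returns `None`.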
@xingyaoww
Contributor Author

@OpenHands can we link this docs page in docs.json?

@openhands-ai

openhands-ai bot commented Jan 20, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

xingyaoww and others added 2 commits January 20, 2026 13:21
Link the sdk/guides/critic.mdx page to the Agent Features section in docs.json navigation.
@openhands-ai

openhands-ai bot commented Jan 20, 2026

I've linked the critic docs page to docs.json by adding "sdk/guides/critic" to the "Agent Features" group in the SDK navigation.

Changes made:

  • Added "sdk/guides/critic" to the "Agent Features" section in docs.json (line 271)

The change has been committed and pushed to the xw/critic-model branch. The critic guide will now appear in the SDK documentation navigation under SDK → Guides → Agent Features → Critic.
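For readers unfamiliar with this kind of navigation edit, the following sketch shows the general idea on an in-memory structure. The nesting of `docs` (groups with `group` and `pages` keys), the placeholder page `sdk/guides/example-page`, and the `add_page` helper are assumptions for illustration; the real docs.json layout may differ.

```python
from typing import Any


def add_page(nav: Any, group_name: str, page: str) -> bool:
    """Find the first group named `group_name` and add `page` if missing.

    Returns True if the group was found, False otherwise.
    """
    if isinstance(nav, dict):
        if nav.get("group") == group_name and isinstance(nav.get("pages"), list):
            if page not in nav["pages"]:
                nav["pages"].append(page)
            return True
        return any(add_page(value, group_name, page) for value in nav.values())
    if isinstance(nav, list):
        return any(add_page(item, group_name, page) for item in nav)
    return False


# Assumed, simplified navigation tree; not the actual docs.json contents.
docs = {
    "navigation": {
        "tabs": [
            {
                "tab": "SDK",
                "groups": [
                    {
                        "group": "Guides",
                        "pages": [
                            {
                                "group": "Agent Features",
                                "pages": ["sdk/guides/example-page"],
                            }
                        ],
                    }
                ],
            }
        ]
    }
}

found = add_page(docs, "Agent Features", "sdk/guides/critic")
agent_features = docs["navigation"]["tabs"][0]["groups"][0]["pages"][0]["pages"]
```

Pages are referenced without the `.mdx` extension, matching the `"sdk/guides/critic"` entry described above.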

@xingyaoww
Contributor Author

@OpenHands can we create a new page at https://docs.openhands.dev/openhands/usage/cli/critic and put it under the CLI's Extensions group in docs.json?

It should link to the current SDK page and explain that the experimental critic feature is enabled for people who are already using the OpenHands LLM Provider (https://docs.openhands.dev/openhands/usage/llms/openhands-llms).

It is currently free during the public beta phase.

Experimental Critic Feature Enabled
We've detected that you're using the OpenHands LLM provider. An experimental critic feature is now active (free) to predict task success. We may collect your feedback rating. You can disable this in settings.

@openhands-ai

openhands-ai bot commented Jan 20, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

- Create new page at openhands/usage/cli/critic.mdx explaining the experimental critic feature
- Add page to CLI Extensions group in docs.json
- Link to SDK critic guide for detailed information
- Note that feature is free during public beta phase

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Jan 20, 2026

I've created the new CLI critic page and added it to the documentation navigation. Here's a summary of the changes:

Changes Made

1. Created new page: openhands/usage/cli/critic.mdx

The page includes:

  • ✅ Warning about experimental status
  • ✅ Link to the OpenHands LLM Provider page (/openhands/usage/llms/openhands-llms)
  • ✅ The exact message provided about the experimental critic feature being enabled
  • ✅ Note that it's free during public beta phase
  • ✅ Brief explanation of what the critic does
  • ✅ Link to the SDK Critic Guide (/sdk/guides/critic) for detailed information

2. Updated docs.json

  • ✅ Added "openhands/usage/cli/critic" to the CLI "Extensions" group

The page will be accessible at https://docs.openhands.dev/openhands/usage/cli/critic and will appear in the CLI documentation under the Extensions section alongside MCP Servers.

The changes have been committed and pushed to the xw/critic-model branch.

- Add critic-cli-output.png showing critic score in CLI
- Add critic-cli-settings.png showing CLI settings panel
- Add critic-sdk-visualizer.png showing SDK visualizer output
- Update CLI critic docs to embed screenshots
- Update SDK critic guide to embed visualizer screenshot

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww marked this pull request as ready for review January 20, 2026 19:02
@xingyaoww xingyaoww requested a review from mamoodi as a code owner January 20, 2026 19:02
@xingyaoww xingyaoww requested a review from jpelletier1 January 20, 2026 19:02
Move screenshots to local folders next to their respective mdx files
and use relative paths (./screenshots/) instead of absolute paths,
following the pattern used in local-llms.mdx.

Co-authored-by: openhands <openhands@all-hands.dev>
Contributor

@jpelletier1 jpelletier1 left a comment

LGTM!

</Warning>

<Note>
The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py)
Collaborator

@OpenHands run sync lock workflow and update PR

I'm on it! enyst can track my progress at all-hands.dev

Final summary (since last update)

What I checked

  • Request addressed: ran the “sync/lock” equivalent locally (code-block sync + API doc generation) and updated the existing PR branch.
  • Instructions followed: worked on branch xw/critic-model, committed changes, pushed to the same branch, and did not comment on the PR.
  • Branch is up to date / pushed: HEAD equals origin/xw/critic-model (no unpushed commits).
  • Conciseness review: changes are the direct output of running the repo’s sync steps; no unrelated manual edits.

Changes made

  1. Synced MDX code blocks from OpenHands/software-agent-sdk

    • Updated:
      • sdk/guides/critic.mdx
      • sdk/guides/hooks.mdx
      • sdk/guides/plugins.mdx
      • sdk/guides/skill.mdx
  2. Regenerated SDK API reference pages

    • Updated:
      • sdk/api-reference/openhands.sdk.agent.mdx
      • sdk/api-reference/openhands.sdk.conversation.mdx
      • sdk/api-reference/openhands.sdk.event.mdx
      • sdk/api-reference/openhands.sdk.llm.mdx
      • sdk/api-reference/openhands.sdk.tool.mdx
      • sdk/api-reference/openhands.sdk.workspace.mdx

Git details

  • Commit on PR branch: f9575fa docs: sync code blocks and regenerate API reference
  • Push status: already pushed to origin/xw/critic-model (no further action needed).

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Jan 21, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • .github/workflows/sync-docs-code-blocks.yml
    • .github/workflows/sync-agent-sdk-openapi.yml

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #263 at branch `xw/critic-model`

Feel free to include any additional details that might help me get this PR into a better state.

@xingyaoww xingyaoww merged commit f5516da into main Jan 21, 2026
1 of 2 checks passed
@xingyaoww xingyaoww deleted the xw/critic-model branch January 21, 2026 21:52

5 participants