Add SDK Guide for Critic Feature (Experimental) #263
Merged
18 commits
- 44db99f (openhands-agent): Add SDK guide for Critic feature (experimental)
- e190585 (openhands-agent): Simplify critic documentation: focus on user usage
- 97437c7 (openhands-agent): Add 'What is a Critic?' section explaining use cases
- 79110c4 (openhands-agent): Add reference to SWE-Bench blog post and mention forthcoming technica…
- 5372457 (openhands-agent): Change 'evaluation model' to 'evaluator' in critic description
- 9e77c4c (openhands-agent): Update critic score visualization example to match actual output
- 26a9826 (xingyaoww): Apply suggestion from @xingyaoww
- 450766e (openhands-agent): Rename example file from 34_critic_model_example.py to 34_critic_exam…
- 86ea0cb (xingyaoww): Merge branch 'main' into xw/critic-model
- 9cef6bc (openhands-agent): Add critic guide to docs.json navigation
- c7b74a6 (openhands-agent): Add CLI critic page for OpenHands LLM Provider users
- a3d7ad4 (openhands-agent): Add screenshots for critic feature documentation
- 0e67fdf (xingyaoww): Apply suggestion from @xingyaoww
- a60c2b8 (xingyaoww): Apply suggestion from @xingyaoww
- fe76ba3 (xingyaoww): Apply suggestion from @xingyaoww
- f273030 (openhands-agent): Fix screenshot paths using relative paths
- f9575fa (enyst): docs: sync code blocks and regenerate API reference
- 38300ec (openhands-agent): Add critic demo video to CLI documentation
@@ -0,0 +1,41 @@
---
title: Critic (Experimental)
description: Automatic task success prediction for OpenHands LLM Provider users
---

<Warning>
**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing.
</Warning>

## Overview

If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real time.

For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic).

## What is the Critic?

The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides:

- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success
- **Real-time feedback**: Scores computed during agent execution, not just at completion

<video
  controls
  className="w-full aspect-video"
  src="/openhands/usage/cli/critic-demo.mp4"
></video>

## Pricing

The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users.

## Disabling the Critic

If you prefer not to use the critic feature, you can disable it in your settings.