feat: update tool parameter check grader to improve accuracy#127
jc200808 wants to merge 4 commits into agentscope-ai:main
Code Review
The pull request updates the ToolParameterCheckGrader to refine the evaluation logic for tool parameters, focusing on grounding them in tool definitions and handling optional parameters more explicitly. However, there is a significant discrepancy as the English prompt was updated with the new logic while the Chinese prompt remains unchanged, which will lead to inconsistent grading behavior. Additionally, the phrasing 'matching tool of tool definitions' is somewhat awkward and has been simplified in the suggestions to 'matching tool definition' for better clarity.
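For readers who want the new rubric in concrete terms, the structural part of it (required parameters present, types grounded in the tool definition, null accepted as a placeholder for an unspecified optional parameter) can be approximated deterministically. The sketch below is not the grader's implementation, which is a prompt for an LLM judge; the function name `check_parameters`, the error strings, and the JSON-Schema-style layout of `tool_def` are all assumptions made for illustration. It does not attempt to check whether values match the user query.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping from JSON Schema type names to Python types (illustrative).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_parameters(tool_def: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call passes.

    Hypothetical helper: assumes tool_def["parameters"] is a JSON-Schema-like
    object with "properties" and "required" keys.
    """
    schema = tool_def.get("parameters", {})
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    problems: list[str] = []

    # Rubric 1: every required parameter must be present with a real value.
    for name in sorted(required):
        if arguments.get(name) is None:
            problems.append(f"missing required parameter: {name}")

    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"parameter not in tool definition: {name}")
            continue
        if value is None:
            # Rubric 5: a null placeholder for an optional parameter is fine.
            # A null required parameter was already reported above.
            continue
        type_name = properties[name].get("type", "")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # Unknown or missing type: nothing to ground against.
        # Rubric 3: bool is a subclass of int in Python, so reject it
        # explicitly unless the schema really asks for a boolean.
        is_stray_bool = isinstance(value, bool) and bool not in expected
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")

    return problems


# Usage with a made-up tool definition.
weather_tool = {
    "name": "get_forecast",
    "parameters": {
        "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["city"],
    },
}
print(check_parameters(weather_tool, {"city": "Paris", "days": None}))
```

Under these assumptions, a call that passes `days=None` is treated as a complete extraction, which is the behavior rubric 5 describes.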
+ """You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions. This includes checking if parameters are accurate and complete.

  <Rubrics>
- 1. All required parameters are present and extracted from the query
- 2. Parameter values match exactly what was specified in the query
- 3. All parameters are grounded in the query (no fabricated values)
- 4. Parameter data types and formats are correct
- 5. Optional parameters are used appropriately when specified in the query
- 6. All parameters mentioned in the query are captured
- 7. Parameters are correctly mapped without confusion
+ 1. All required parameters are present and grounded in the matching tool of tool definitions
+ 2. All required parameter values are extracted from the query
+ 3. All parameter data types and formats are grounded in the matching tool of tool definitions
+ 4. Optional parameter values are used appropriately when present in the query
+ 5. It is a completion extraction if a optional parameter value is not present in query and use null/none or equivalent value as a placeholder
+ 6. Avoid checking tool selection accuracy
  </Rubrics>

  <Steps>
- 1. Verify parameter completeness: Check if all parameters mentioned in the query are extracted
- 2. Verify parameter accuracy: Ensure parameter values match the query exactly
- 3. Detect hallucinations: Identify any parameters not present in the query
- 4. Check data types: Verify parameters use correct data types and formats
- 5. Assess overall correctness: Determine if the tool call is executable with correct parameters
+ 1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
+ 2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
+ 3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool of tool definitions
The English prompt has been updated with new evaluation logic, but the Chinese prompt (TOOL_PARAMETER_CHECK_PROMPT_ZH) starting at line 71 has not been updated to match. This inconsistency will cause the grader to behave differently depending on the language setting. Please synchronize the Chinese prompt. Additionally, the phrasing 'matching tool of tool definitions' has been simplified to 'matching tool definition', and minor grammatical errors in rubric 5 have been corrected.
Suggested change:

"""You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition. This includes checking if parameters are accurate and complete.

<Rubrics>
1. All required parameters are present and grounded in the matching tool definition
2. All required parameter values are extracted from the query
3. All parameter data types and formats are grounded in the matching tool definition
4. Optional parameter values are used appropriately when present in the query
5. Consider it a complete extraction if an optional parameter value is not present in the query and uses null/none or an equivalent value as a placeholder
6. Avoid checking tool selection accuracy
</Rubrics>

<Steps>
1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool definition
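One way to keep the English and Chinese prompts from drifting apart again would be a small structural test. The sketch below is only a suggestion: `section_item_counts` and `prompts_in_sync` are hypothetical helpers, the stub prompts are placeholders rather than the real `TOOL_PARAMETER_CHECK_PROMPT_ZH` text, and the Chinese tag names and item counts are invented for the example. It assumes both prompts wrap their numbered lists in paired tags such as `<Rubrics>...</Rubrics>`.

```python
import re

# Matches <Tag>...</Tag> sections; the tag name may be non-ASCII.
_SECTION = re.compile(r"<([^/<>\s]+)>(.*?)</\1>", re.DOTALL)
# Matches the start of a numbered item such as "1." at the beginning of a line.
_ITEM = re.compile(r"^\s*\d+\.", re.MULTILINE)


def section_item_counts(prompt: str) -> list[int]:
    """Count numbered items in each tagged section, in document order."""
    return [len(_ITEM.findall(body)) for _tag, body in _SECTION.findall(prompt)]


def prompts_in_sync(prompt_a: str, prompt_b: str) -> bool:
    """True when both prompts have the same number of items per section."""
    return section_item_counts(prompt_a) == section_item_counts(prompt_b)


def stub_prompt(tags: list[str], counts: list[int]) -> str:
    """Build a placeholder prompt; stands in for the real prompt constants."""
    parts = []
    for tag, n in zip(tags, counts):
        items = "\n".join(f"{i}. placeholder" for i in range(1, n + 1))
        parts.append(f"<{tag}>\n{items}\n</{tag}>")
    return "\n\n".join(parts)


# Illustrative stand-ins: an updated English prompt and a stale Chinese one.
en_stub = stub_prompt(["Rubrics", "Steps"], [6, 3])
zh_stub = stub_prompt(["评分标准", "评估步骤"], [7, 5])  # hypothetical tag names

print(section_item_counts(en_stub), section_item_counts(zh_stub))
print(prompts_in_sync(en_stub, zh_stub))
```

A count comparison cannot tell whether the translated wording matches, so it would complement a manual review of the Chinese prompt rather than replace it.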
  Tool Parameter Check Grader

- Evaluates whether the generated tool call extracts completely correct parameters from the user query.
+ Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions
The phrase 'matching tool of tool definitions' is a bit awkward. 'matching tool definition' is more concise and natural.
Suggested change:

- Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions
+ Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition

  Evaluates whether the generated tool call extracts completely correct parameters
- from the user query.
+ from the user query and the matching tool of tool definitions.
OpenJudge Version
[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]
Description
[Please describe the background, purpose, changes made, and how to test this PR]
Checklist
Please check the following items before code is ready to be reviewed.
pre-commit run --all-files command