feat: update tool parameter check grader to improve accuracy #127

Open
jc200808 wants to merge 4 commits into agentscope-ai:main from jc200808:feat-dev-260223

Conversation

@jc200808
Collaborator

OpenJudge Version

[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]

Description

[Please describe the background, purpose, changes made, and how to test this PR]

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with pre-commit run --all-files command
  • All tests are passing
  • Docstrings are in Google style
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@gemini-code-assist
Contributor

Summary of Changes

Hello @jc200808, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the ToolParameterCheckGrader by integrating tool definitions into its parameter validation process. This change aims to provide a more robust and accurate evaluation of generated tool calls, ensuring that parameters are not only correctly extracted from user queries but also align with the defined structure and types of the tools themselves. The update refines the grading criteria to better reflect the completeness and accuracy of parameter extraction in the context of available tool specifications.

Highlights

  • Grader Logic Refinement: The ToolParameterCheckGrader has been updated to improve the accuracy of tool parameter evaluation by explicitly incorporating the 'matching tool of tool definitions' into its assessment criteria.
  • Prompt Update: The English prompt (TOOL_PARAMETER_CHECK_PROMPT_EN) for the grader was significantly revised. The new rubrics and steps now emphasize grounding parameters in tool definitions, verifying required parameter extraction from the query, and ensuring data types align with definitions, while explicitly avoiding checks on tool selection accuracy.
  • Documentation Clarification: The docstrings and comments within the tool_parameter_check.py file were updated to reflect the enhanced evaluation scope, clearly stating that the grader now considers tool definitions in addition to user queries.
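
To make the revised criteria concrete, here is a minimal Python sketch of a deterministic check in the same spirit. It is not the grader's implementation: the grader evaluates through an LLM prompt, and this sketch cannot judge whether a value was actually extracted from the user query. It assumes a hypothetical tool definition in JSON Schema style (`parameters.properties`, `parameters.required`); the names `check_parameters`, `_JSON_TYPES`, and `weather_tool` are illustrative only.

```python
from typing import Any

# Assumed mapping from JSON Schema type names to Python types.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_parameters(tool_definition: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call passes this sketch."""
    schema = tool_definition.get("parameters", {})
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    problems: list[str] = []

    # Rubric 1: every required parameter must be present with a non-null value.
    for name in sorted(required):
        if arguments.get(name) is None:
            problems.append(f"missing required parameter: {name}")

    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"parameter not in tool definition: {name}")
            continue
        if value is None:
            # Rubric 5: a null placeholder for an optional parameter is accepted.
            # A null required parameter was already reported above.
            continue
        expected = properties[name].get("type")
        allowed = _JSON_TYPES.get(expected)
        if allowed is None:
            continue  # Unknown or unspecified type: nothing to compare against.
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, allowed):
            # Rubric 3: data types must follow the tool definition.
            problems.append(f"wrong type for {name}: expected {expected}")
    return problems


# Hypothetical tool definition used only for illustration.
weather_tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
            "unit": {"type": "string"},
        },
        "required": ["city"],
    },
}

print(check_parameters(weather_tool, {"city": "Paris", "unit": None}))
```

Note that the sketch deliberately never checks whether `get_weather` was the right tool to call, mirroring the rubric that tool selection accuracy is out of scope.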

Activity

  • No review comments or activity have been recorded yet for this pull request.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request updates the ToolParameterCheckGrader to refine the evaluation logic for tool parameters, focusing on grounding them in tool definitions and handling optional parameters more explicitly. However, there is a significant discrepancy as the English prompt was updated with the new logic while the Chinese prompt remains unchanged, which will lead to inconsistent grading behavior. Additionally, the phrasing 'matching tool of tool definitions' is somewhat awkward and has been simplified in the suggestions to 'matching tool definition' for better clarity.

Comment on lines 25 to 39
+"""You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions. This includes checking if parameters are accurate and complete.

 <Rubrics>
-1. All required parameters are present and extracted from the query
-2. Parameter values match exactly what was specified in the query
-3. All parameters are grounded in the query (no fabricated values)
-4. Parameter data types and formats are correct
-5. Optional parameters are used appropriately when specified in the query
-6. All parameters mentioned in the query are captured
-7. Parameters are correctly mapped without confusion
+1. All required parameters are present and grounded in the matching tool of tool definitions
+2. All required parameter values are extracted from the query
+3. All parameter data types and formats are grounded in the matching tool of tool definitions
+4. Optional parameter values are used appropriately when present in the query
+5. It is a completion extraction if a optional parameter value is not present in query and use null/none or equivalent value as a placeholder
+6. Avoid checking tool selection accuracy
 </Rubrics>

 <Steps>
-1. Verify parameter completeness: Check if all parameters mentioned in the query are extracted
-2. Verify parameter accuracy: Ensure parameter values match the query exactly
-3. Detect hallucinations: Identify any parameters not present in the query
-4. Check data types: Verify parameters use correct data types and formats
-5. Assess overall correctness: Determine if the tool call is executable with correct parameters
+1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
+2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
+3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool of tool definitions

high

The English prompt has been updated with new evaluation logic, but the Chinese prompt (TOOL_PARAMETER_CHECK_PROMPT_ZH) starting at line 71 has not been updated to match. This inconsistency will cause the grader to behave differently depending on the language setting. Please synchronize the Chinese prompt. Additionally, the phrasing 'matching tool of tool definitions' has been simplified to 'matching tool definition', and minor grammatical errors in rubric 5 have been corrected.
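
One way to catch this kind of drift early is a small consistency test over the two prompt strings. The sketch below is only a heuristic under stated assumptions: it assumes both prompts wrap their numbered lists in the same tag names (`Rubrics` and `Steps`, which may not hold for the Chinese prompt), and the helper names `count_numbered_items` and `prompts_in_sync` are hypothetical. It compares item counts only, so it would not detect two prompts with the same number of items but different meaning.

```python
import re


def count_numbered_items(prompt: str, tag: str) -> int:
    """Count lines of the form '1. ...' inside <tag>...</tag>; 0 if the tag is absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, prompt, flags=re.DOTALL)
    if match is None:
        return 0
    return len(re.findall(r"^[ \t]*\d+\.\s", match.group(1), flags=re.MULTILINE))


def prompts_in_sync(
    prompt_en: str,
    prompt_zh: str,
    tag_pairs: tuple[tuple[str, str], ...] = (("Rubrics", "Rubrics"), ("Steps", "Steps")),
) -> bool:
    """Return True when each paired section has the same number of numbered items."""
    return all(
        count_numbered_items(prompt_en, en_tag) == count_numbered_items(prompt_zh, zh_tag)
        for en_tag, zh_tag in tag_pairs
    )


# Placeholder prompts for illustration; these are not the real prompt texts.
updated_prompt = "<Rubrics>\n1. a\n2. b\n</Rubrics>\n<Steps>\n1. x\n</Steps>"
stale_prompt = "<Rubrics>\n1. a\n2. b\n3. c\n</Rubrics>\n<Steps>\n1. x\n2. y\n</Steps>"

print(prompts_in_sync(updated_prompt, stale_prompt))
```

If the Chinese prompt uses localized tag names, the `tag_pairs` argument can map each English tag to its counterpart.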

Suggested change
"""You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition. This includes checking if parameters are accurate and complete.

<Rubrics>
1. All required parameters are present and grounded in the matching tool definition
2. All required parameter values are extracted from the query
3. All parameter data types and formats are grounded in the matching tool definition
4. Optional parameter values are used appropriately when present in the query
5. Consider it a complete extraction if an optional parameter value is not present in the query and uses null/none or an equivalent value as a placeholder
6. Avoid checking tool selection accuracy
</Rubrics>

<Steps>
1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool definition

Tool Parameter Check Grader

-Evaluates whether the generated tool call extracts completely correct parameters from the user query.
+Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions

medium

The phrase 'matching tool of tool definitions' is a bit awkward. 'matching tool definition' is more concise and natural.

Suggested change
-Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions
+Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition


 Evaluates whether the generated tool call extracts completely correct parameters
-from the user query.
+from the user query and the matching tool of tool definitions.

medium

Simplifying the phrasing to match the updated prompt and module docstring.

Suggested change
-from the user query and the matching tool of tool definitions.
+from the user query and the matching tool definition.
