feat: update tool parameter check grader to improve accuracy by jc200808 · Pull Request #127 · agentscope-ai/OpenJudge

jc200808 · 2026-02-23T21:59:43Z

OpenJudge Version

[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]

Description

[Please describe the background, purpose, changes made, and how to test this PR]

Checklist

Please check the following items before code is ready to be reviewed.

Code has been formatted with pre-commit run --all-files command
All tests are passing
Docstrings are in Google style
Related documentation has been updated (e.g. links, examples, etc.)
Code is ready for review

gemini-code-assist · 2026-02-23T21:59:55Z

Summary of Changes

Hello @jc200808, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the ToolParameterCheckGrader by integrating tool definitions into its parameter validation process. This change aims to provide a more robust and accurate evaluation of generated tool calls, ensuring that parameters are not only correctly extracted from user queries but also align with the defined structure and types of the tools themselves. The update refines the grading criteria to better reflect the completeness and accuracy of parameter extraction in the context of available tool specifications.

Highlights

Grader Logic Refinement: The ToolParameterCheckGrader has been updated to improve the accuracy of tool parameter evaluation by explicitly incorporating the 'matching tool of tool definitions' into its assessment criteria.
Prompt Update: The English prompt (TOOL_PARAMETER_CHECK_PROMPT_EN) for the grader was significantly revised. The new rubrics and steps now emphasize grounding parameters in tool definitions, verifying required parameter extraction from the query, and ensuring data types align with definitions, while explicitly avoiding checks on tool selection accuracy.
Documentation Clarification: The docstrings and comments within the tool_parameter_check.py file were updated to reflect the enhanced evaluation scope, clearly stating that the grader now considers tool definitions in addition to user queries.

Activity

No review comments or activity have been recorded yet for this pull request.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The pull request updates the ToolParameterCheckGrader to refine the evaluation logic for tool parameters, focusing on grounding them in tool definitions and handling optional parameters more explicitly. However, there is a significant discrepancy as the English prompt was updated with the new logic while the Chinese prompt remains unchanged, which will lead to inconsistent grading behavior. Additionally, the phrasing 'matching tool of tool definitions' is somewhat awkward and has been simplified in the suggestions to 'matching tool definition' for better clarity.

gemini-code-assist · 2026-02-23T22:01:29Z

openjudge/graders/agent/tool/tool_parameter_check.py

+    """You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions. This includes checking if parameters are accurate and complete.

 <Rubrics>
-1. All required parameters are present and extracted from the query
-2. Parameter values match exactly what was specified in the query
-3. All parameters are grounded in the query (no fabricated values)
-4. Parameter data types and formats are correct
-5. Optional parameters are used appropriately when specified in the query
-6. All parameters mentioned in the query are captured
-7. Parameters are correctly mapped without confusion
+1. All required parameters are present and grounded in the matching tool of tool definitions
+2. All required parameter values are extracted from the query
+3. All parameter data types and formats are grounded in the matching tool of tool definitions
+4. Optional parameter values are used appropriately when present in the query
+5. It is a completion extraction if a optional parameter value is not present in query and use null/none or equivalent value as a placeholder
+6. Avoid checking tool selection accuracy
 </Rubrics>

 <Steps>
-1. Verify parameter completeness: Check if all parameters mentioned in the query are extracted
-2. Verify parameter accuracy: Ensure parameter values match the query exactly
-3. Detect hallucinations: Identify any parameters not present in the query
-4. Check data types: Verify parameters use correct data types and formats
-5. Assess overall correctness: Determine if the tool call is executable with correct parameters
+1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
+2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
+3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool of tool definitions


The English prompt has been updated with new evaluation logic, but the Chinese prompt (TOOL_PARAMETER_CHECK_PROMPT_ZH) starting at line 71 has not been updated to match. This inconsistency will cause the grader to behave differently depending on the language setting. Please synchronize the Chinese prompt. Additionally, the phrasing 'matching tool of tool definitions' has been simplified to 'matching tool definition', and minor grammatical errors in rubric 5 have been corrected.
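The concern here is that the two language variants drift apart. One inexpensive guard is a unit test that compares the structure of both prompts. The sketch below is a heuristic and is not part of OpenJudge. It assumes both prompts are plain strings that keep the same `<Rubrics>` and `<Steps>` tags and number their items as `1.`, `2.`, and so on. `count_items` and `prompts_in_sync` are hypothetical names.

```python
import re


def count_items(prompt: str, tag: str) -> int:
    """Count numbered lines ("1. ...") inside the first <tag>...</tag> block."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", prompt, flags=re.DOTALL)
    if match is None:
        return 0  # tag missing or unclosed
    return len(re.findall(r"^[ \t]*\d+\.", match.group(1), flags=re.MULTILINE))


def prompts_in_sync(prompt_en: str, prompt_zh: str) -> bool:
    """Heuristic: both language variants have the same number of rubrics and steps."""
    return all(
        count_items(prompt_en, tag) == count_items(prompt_zh, tag)
        for tag in ("Rubrics", "Steps")
    )
```

In a test, one could pass `TOOL_PARAMETER_CHECK_PROMPT_EN` and `TOOL_PARAMETER_CHECK_PROMPT_ZH` from `openjudge/graders/agent/tool/tool_parameter_check.py`, assuming they are importable module-level strings. Equal counts do not prove that the wording matches. The check only catches structural gaps, such as a rubric that was added or removed in one language but not the other.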

Suggested change

"""You are an expert in analyzing tool calls. Your task is to evaluate whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition. This includes checking if parameters are accurate and complete.

<Rubrics>
1. All required parameters are present and grounded in the matching tool definition
2. All required parameter values are extracted from the query
3. All parameter data types and formats are grounded in the matching tool definition
4. Optional parameter values are used appropriately when present in the query
5. Consider it a complete extraction if an optional parameter value is not present in the query and uses null/none or an equivalent value as a placeholder
6. Avoid checking tool selection accuracy
</Rubrics>

<Steps>
1. Verify parameter completeness: Check if all required parameter values present in the query are extracted
2. Verify parameter accuracy: Ensure parameter values match the query exactly if present in the query
3. Check data types: Ensure the data types and formats of all parameter values are grounded in the matching tool definition

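Rubric 6 and the phrase "matching tool definition" suggest that the judge should compare the call against the definition of whichever tool was called, rather than grade the choice of tool. The sketch below shows one way a caller might pair them. The dict keys (`name`), the XML-style tags, and both function names are assumptions for illustration and do not reflect OpenJudge's actual input format.

```python
import json
from typing import Any, Optional


def find_matching_definition(
    tool_call: dict[str, Any], tool_definitions: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the definition whose name equals the called tool's name, else None."""
    name = tool_call.get("name")
    for definition in tool_definitions:
        if definition.get("name") == name:
            return definition
    return None  # no match: parameters cannot be grounded, but selection is not judged here


def build_judge_input(
    query: str, tool_call: dict[str, Any], tool_definitions: list[dict[str, Any]]
) -> str:
    """Assemble the query, the matching definition, and the call into one text block."""
    definition = find_matching_definition(tool_call, tool_definitions)
    return "\n".join(
        [
            f"<query>{query}</query>",
            # json.dumps(None) renders as "null" when no definition matches.
            f"<tool_definition>{json.dumps(definition, ensure_ascii=False)}</tool_definition>",
            f"<tool_call>{json.dumps(tool_call, ensure_ascii=False)}</tool_call>",
        ]
    )
```

Passing only the matching definition, instead of the full list, keeps the judge focused on parameters. This is consistent with the instruction to avoid checking tool selection accuracy.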
gemini-code-assist · 2026-02-23T22:01:29Z

openjudge/graders/agent/tool/tool_parameter_check.py

 Tool Parameter Check Grader

-Evaluates whether the generated tool call extracts completely correct parameters from the user query.
+Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions


The phrase 'matching tool of tool definitions' is a bit awkward. 'matching tool definition' is more concise and natural.

Suggested change

Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool of tool definitions

Evaluates whether the generated tool call extracts completely correct parameters from the user query and the matching tool definition

gemini-code-assist · 2026-02-23T22:01:29Z

openjudge/graders/agent/tool/tool_parameter_check.py


    Evaluates whether the generated tool call extracts completely correct parameters
-    from the user query.
+    from the user query and the matching tool of tool definitions.


Simplifying the phrasing to match the updated prompt and module docstring.

Suggested change

from the user query and the matching tool of tool definitions.

from the user query and the matching tool definition.

feat: update tool parameter check grader to improve accuracy

8a4f75c

jc200808 requested review from XieLipeng0830 and ployts February 23, 2026 21:59

gemini-code-assist bot reviewed Feb 23, 2026


jc200808 added 3 commits February 23, 2026 14:03

feat: update tool parameter check grader to improve accuracy

2f104b7

feat: update tool parameter check grader to improve accuracy

1664128

feat: update tool parameter check grader to improve accuracy

17ea69f

jc200808 wants to merge 4 commits into agentscope-ai:main from jc200808:feat-dev-260223


1 participant
