
Conversation

@chr6192 chr6192 commented Jan 11, 2026

OpenJudge Version

0.2.0

Description

Add OpenJudge integration guide for VERL with async reward evaluation.

Checklist

Please check the following items before the code is ready for review.

  • [✓] Code has been formatted with the `pre-commit run --all-files` command
  • [✓] All tests are passing
  • [✓] Docstrings are in Google style
  • [✓] Related documentation has been updated (e.g. links, examples, etc.)
  • [✓] Code is ready for review

@chr6192 chr6192 self-assigned this Jan 11, 2026
@chr6192 chr6192 added the documentation Improvements or additions to documentation label Jan 11, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @chr6192, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new documentation file that serves as a detailed guide for integrating the OpenJudge reward system into VERL training. The primary goal is to enhance the flexibility and performance of reward computation by leveraging OpenJudge's modular grading capabilities and asynchronous evaluation, which is particularly beneficial for incorporating LLM-based assessments without creating performance bottlenecks.

Highlights

  • New OpenJudge Integration Guide: A comprehensive guide has been added detailing how to integrate the OpenJudge reward system with VERL (Volcano Engine Reinforcement Learning) training.
  • Modular and Multi-Dimensional Rewards: The integration enables the use of modular, multi-dimensional reward computation, combining both rule-based and LLM-as-Judge graders for more sophisticated evaluation.
  • Asynchronous Reward Evaluation: The guide emphasizes high-performance asynchronous concurrency for reward evaluation, crucial for speeding up LLM-as-Judge processes within the VERL training loop.
  • Detailed Architecture and Implementation: It outlines a three-layer architecture (Graders, RewardFunction, RewardManager) and provides step-by-step instructions, code examples, and configuration parameters for seamless integration; a minimal sketch of this layering follows below.
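
To make the highlighted layering and async evaluation concrete, here is a minimal, illustrative sketch. Apart from the layer names (Graders, RewardFunction, RewardManager) and `GradingRunner.arun_multiple_datasets` quoted in the review below, every class, method, and signature here is an assumption for illustration, not the guide's or OpenJudge's actual API:

```python
# Illustrative sketch only: names and signatures below are assumptions.
import asyncio


class ExactMatchGrader:
    """Layer 1 (Graders): a cheap rule-based check."""

    async def grade(self, prompt: str, response: str, reference: str) -> float:
        return 1.0 if response.strip() == reference.strip() else 0.0


class LLMJudgeGrader:
    """Layer 1 (Graders): an LLM-as-Judge check; the judge-model call is stubbed."""

    async def grade(self, prompt: str, response: str, reference: str) -> float:
        await asyncio.sleep(0)  # stand-in for an async request to the judge model
        return 0.5


class RewardFunction:
    """Layer 2: fans one sample out to all graders concurrently and combines scores."""

    def __init__(self, graders, weights):
        self.graders, self.weights = graders, weights

    async def score(self, prompt, response, reference) -> float:
        per_grader = await asyncio.gather(
            *(g.grade(prompt, response, reference) for g in self.graders)
        )
        return sum(w * s for w, s in zip(self.weights, per_grader))


class RewardManager:
    """Layer 3: the synchronous entry point the trainer calls once per batch."""

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn

    def __call__(self, batch):
        # batch: iterable of (prompt, response, reference) tuples
        async def _run():
            return await asyncio.gather(
                *(self.reward_fn.score(p, r, ref) for p, r, ref in batch)
            )

        return asyncio.run(_run())
```

In the actual guide, layer 2 is presumably where OpenJudge's `GradingRunner` and its `arun_multiple_datasets` call slot in, as the snippets quoted in the review below suggest.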


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a comprehensive integration guide for using OpenJudge with VERL. The documentation is well-structured, detailed, and provides clear examples for users to follow. I've identified a few minor issues in the code snippets, such as missing imports and inconsistent naming, which could prevent the examples from running correctly. Addressing these will improve the clarity and usability of this excellent guide.

```python
results = await runner.arun_multiple_datasets(datasets)

# 4. Parse and aggregate
return self._aggregate_scores(results)
```

medium

This example calls self._aggregate_scores(results), but the more detailed implementation in Section 4.2 uses _parse_and_aggregate. Using a consistent method name across examples will reduce confusion for readers.

Suggested change:

```diff
- return self._aggregate_scores(results)
+ return self._parse_and_aggregate(results)
```
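
A single helper used by both examples would keep them aligned. As a hypothetical sketch (the shape assumed for `results` — one dict of per-grader scores per sample — is an assumption, not OpenJudge's documented return type):

```python
def _parse_and_aggregate(self, results):
    """Average per-grader scores into one reward per sample.

    Assumes each entry of `results` is a dict with a "grader_results" list of
    {"score": float} items; adapt to the actual GradingRunner output.
    """
    rewards = []
    for sample_result in results:
        scores = [g["score"] for g in sample_result.get("grader_results", [])]
        rewards.append(sum(scores) / len(scores) if scores else 0.0)
    return rewards
```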

Comment on lines +546 to +552
```python
try:
    runner_results = await runner.arun_multiple_datasets(datasets)
except Exception as e:
    logger.error(f"Grading failed: {e}")
    # Return default scores on error
    return self._create_error_results(prompt_to_samples, error=str(e))
```


medium

The logger object is used here for error logging, but it's not imported in the code example. This will cause a NameError when an exception occurs. Please add from loguru import logger at the top of the MyRewardFunction code snippet to ensure the example is runnable.
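
Concretely, the top of that snippet would gain a single line:

```python
from loguru import logger
```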

```bash
# Define paths
REWARD_FUN="${PATH_TO_DR}/reward/openjudge_reward_function.py"
```

medium

The placeholder ${PATH_TO_DR} is used here but is not defined or explained, which could be confusing. Using a more descriptive placeholder like ${PATH_TO_YOUR_PROJECT} would improve clarity. It's also helpful to add a comment explaining that this path needs to be replaced.

Suggested change:

```diff
- REWARD_FUN="${PATH_TO_DR}/reward/openjudge_reward_function.py"
+ REWARD_FUN="${PATH_TO_YOUR_PROJECT}/reward/openjudge_reward_function.py"
```

Comment on lines +819 to +864
```python
def _dump_generations(self, messages, inputs, outputs, scores, reward_extra_infos_dict, dump_path, all_details=None):
    """
    Dump training samples with details.

    Extracts details from reward_extra_info["details"] if not provided separately.
    """
    os.makedirs(dump_path, exist_ok=True)
    filename = os.path.join(dump_path, f"{self.global_steps}.jsonl")

    # Extract details from dict if not provided
    if all_details is None and "details" in reward_extra_infos_dict:
        all_details = reward_extra_infos_dict["details"]
        # Remove from dict to avoid duplication
        reward_extra_infos_dict = {
            k: v for k, v in reward_extra_infos_dict.items()
            if k != "details"
        }

    # Prepare data
    n = len(inputs)
    base_data = {
        "messages": messages,
        "input": inputs,
        "output": outputs,
        "score": scores,
        "step": [self.global_steps] * n,
    }

    # Add reward_extra_info fields
    for k, v in reward_extra_infos_dict.items():
        if len(v) == n:
            base_data[k] = v

    # Add details if available
    if all_details is not None and len(all_details) == n:
        base_data["details"] = all_details

    # Write JSONL
    lines = []
    for i in range(n):
        entry = {k: v[i] for k, v in base_data.items()}
        lines.append(json.dumps(entry, ensure_ascii=False))

    with open(filename, "w") as f:
        f.write("\n".join(lines) + "\n")
```

medium

The _dump_generations method in this example uses the os and json modules, but they are not imported. This will lead to a NameError. To make the example correct, please add import os and import json at the beginning of the MyRayPPOTrainer code snippet.
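
For example, the imports at the top of that snippet would include:

```python
import json
import os
```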

A: Create fresh `GradingRunner` per training step (see [Section 5.1](#critical-event-loop-management))

**Q: Getting zero scores?**
A: Enable debug logging: `logger.add(sys.stderr, level="DEBUG")`

medium

The troubleshooting advice suggests using logger.add(sys.stderr, level="DEBUG"), but this will fail without the necessary imports for logger and sys. To make this guidance more helpful, please include the required imports in the example.

A clearer example would be:

```python
# Add these imports at the top of your script
import sys
from loguru import logger

# Then enable debug logging
logger.add(sys.stderr, level="DEBUG")
```
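
On the first FAQ answer above (create a fresh `GradingRunner` per training step), a minimal sketch of the idea; `make_runner` is a hypothetical factory, and only `arun_multiple_datasets` is taken from the guide's quoted snippets:

```python
import asyncio


def compute_step_rewards(datasets, make_runner):
    """Build a fresh GradingRunner for this step and run it on a fresh event loop."""
    runner = make_runner()  # a new runner each step avoids stale event-loop state
    return asyncio.run(runner.arun_multiple_datasets(datasets))
```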
