
Conversation

@chr6192 chr6192 commented Jan 11, 2026

OpenJudge Version

0.2.0

Description

Add OpenJudge integration guide for VERL with async reward evaluation.

Checklist

Please check the following items before the code is ready for review.

  • [✓] Code has been formatted with the `pre-commit run --all-files` command
  • [✓] All tests are passing
  • [✓] Docstrings are in Google style
  • [✓] Related documentation has been updated (e.g. links, examples, etc.)
  • [✓] Code is ready for review

@chr6192 chr6192 self-assigned this Jan 11, 2026
@chr6192 chr6192 added the documentation Improvements or additions to documentation label Jan 11, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @chr6192, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new documentation file that serves as a detailed guide for integrating the OpenJudge reward system into VERL training. The primary goal is to enhance the flexibility and performance of reward computation by leveraging OpenJudge's modular grading capabilities and asynchronous evaluation, which is particularly beneficial for incorporating LLM-based assessments without creating performance bottlenecks.

Highlights

  • New OpenJudge Integration Guide: A comprehensive guide has been added detailing how to integrate the OpenJudge reward system with VERL (Volcano Engine Reinforcement Learning) training.
  • Modular and Multi-Dimensional Rewards: The integration enables the use of modular, multi-dimensional reward computation, combining both rule-based and LLM-as-Judge graders for more sophisticated evaluation.
  • Asynchronous Reward Evaluation: The guide emphasizes high-performance asynchronous concurrency for reward evaluation, crucial for speeding up LLM-as-Judge processes within the VERL training loop.
  • Detailed Architecture and Implementation: It outlines a three-layer architecture (Graders, RewardFunction, RewardManager) and provides step-by-step instructions, code examples, and configuration parameters for seamless integration; a minimal sketch of this layering follows below.
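
To make the highlighted layering and async evaluation concrete, here is a minimal, illustrative sketch. Apart from the layer names (Graders, RewardFunction, RewardManager) and `GradingRunner.arun_multiple_datasets` quoted in the review below, every class, method, and signature here is an assumption for illustration, not the guide's or OpenJudge's actual API:

```python
# Illustrative sketch only: names and signatures below are assumptions.
import asyncio


class ExactMatchGrader:
    """Layer 1 (Graders): a cheap rule-based check."""

    async def grade(self, prompt: str, response: str, reference: str) -> float:
        return 1.0 if response.strip() == reference.strip() else 0.0


class LLMJudgeGrader:
    """Layer 1 (Graders): an LLM-as-Judge check; the judge-model call is stubbed."""

    async def grade(self, prompt: str, response: str, reference: str) -> float:
        await asyncio.sleep(0)  # stand-in for an async request to the judge model
        return 0.5


class RewardFunction:
    """Layer 2: fans one sample out to all graders concurrently and combines scores."""

    def __init__(self, graders, weights):
        self.graders, self.weights = graders, weights

    async def score(self, prompt, response, reference) -> float:
        per_grader = await asyncio.gather(
            *(g.grade(prompt, response, reference) for g in self.graders)
        )
        return sum(w * s for w, s in zip(self.weights, per_grader))


class RewardManager:
    """Layer 3: the synchronous entry point the trainer calls once per batch."""

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn

    def __call__(self, batch):
        # batch: iterable of (prompt, response, reference) tuples
        async def _run():
            return await asyncio.gather(
                *(self.reward_fn.score(p, r, ref) for p, r, ref in batch)
            )

        return asyncio.run(_run())
```

In the actual guide, layer 2 is presumably where OpenJudge's `GradingRunner` and its `arun_multiple_datasets` call slot in, as the snippets quoted in the review below suggest.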


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a comprehensive integration guide for using OpenJudge with VERL. The documentation is well-structured, detailed, and provides clear examples for users to follow. I've identified a few minor issues in the code snippets, such as missing imports and inconsistent naming, which could prevent the examples from running correctly. Addressing these will improve the clarity and usability of this excellent guide.

```python
results = await runner.arun_multiple_datasets(datasets)

# 4. Parse and aggregate
return self._aggregate_scores(results)
```

medium

This example calls self._aggregate_scores(results), but the more detailed implementation in Section 4.2 uses _parse_and_aggregate. Using a consistent method name across examples will reduce confusion for readers.

Suggested change:

```diff
- return self._aggregate_scores(results)
+ return self._parse_and_aggregate(results)
```
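
A single helper used by both examples would keep them aligned. As a hypothetical sketch (the shape assumed for `results` — one dict of per-grader scores per sample — is an assumption, not OpenJudge's documented return type):

```python
def _parse_and_aggregate(self, results):
    """Average per-grader scores into one reward per sample.

    Assumes each entry of `results` is a dict with a "grader_results" list of
    {"score": float} items; adapt to the actual GradingRunner output.
    """
    rewards = []
    for sample_result in results:
        scores = [g["score"] for g in sample_result.get("grader_results", [])]
        rewards.append(sum(scores) / len(scores) if scores else 0.0)
    return rewards
```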

Comment on lines +546 to +552
```python
try:
    runner_results = await runner.arun_multiple_datasets(datasets)
except Exception as e:
    logger.error(f"Grading failed: {e}")
    # Return default scores on error
    return self._create_error_results(prompt_to_samples, error=str(e))
```


medium

The logger object is used here for error logging, but it's not imported in the code example. This will cause a NameError when an exception occurs. Please add from loguru import logger at the top of the MyRewardFunction code snippet to ensure the example is runnable.
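
Concretely, the top of that snippet would gain a single line:

```python
from loguru import logger
```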

```bash
# Define paths
REWARD_FUN="${PATH_TO_DR}/reward/openjudge_reward_function.py"
```

medium

The placeholder ${PATH_TO_DR} is used here but is not defined or explained, which could be confusing. Using a more descriptive placeholder like ${PATH_TO_YOUR_PROJECT} would improve clarity. It's also helpful to add a comment explaining that this path needs to be replaced.

Suggested change:

```diff
- REWARD_FUN="${PATH_TO_DR}/reward/openjudge_reward_function.py"
+ REWARD_FUN="${PATH_TO_YOUR_PROJECT}/reward/openjudge_reward_function.py"
```

Comment on lines +819 to +864
```python
def _dump_generations(self, messages, inputs, outputs, scores, reward_extra_infos_dict, dump_path, all_details=None):
    """
    Dump training samples with details.

    Extracts details from reward_extra_info["details"] if not provided separately.
    """
    os.makedirs(dump_path, exist_ok=True)
    filename = os.path.join(dump_path, f"{self.global_steps}.jsonl")

    # Extract details from dict if not provided
    if all_details is None and "details" in reward_extra_infos_dict:
        all_details = reward_extra_infos_dict["details"]
        # Remove from dict to avoid duplication
        reward_extra_infos_dict = {
            k: v for k, v in reward_extra_infos_dict.items()
            if k != "details"
        }

    # Prepare data
    n = len(inputs)
    base_data = {
        "messages": messages,
        "input": inputs,
        "output": outputs,
        "score": scores,
        "step": [self.global_steps] * n,
    }

    # Add reward_extra_info fields
    for k, v in reward_extra_infos_dict.items():
        if len(v) == n:
            base_data[k] = v

    # Add details if available
    if all_details is not None and len(all_details) == n:
        base_data["details"] = all_details

    # Write JSONL
    lines = []
    for i in range(n):
        entry = {k: v[i] for k, v in base_data.items()}
        lines.append(json.dumps(entry, ensure_ascii=False))

    with open(filename, "w") as f:
        f.write("\n".join(lines) + "\n")
```

medium

The _dump_generations method in this example uses the os and json modules, but they are not imported. This will lead to a NameError. To make the example correct, please add import os and import json at the beginning of the MyRayPPOTrainer code snippet.
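
For example, the imports at the top of that snippet would include:

```python
import json
import os
```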

A: Create fresh `GradingRunner` per training step (see [Section 5.1](#critical-event-loop-management))

**Q: Getting zero scores?**
A: Enable debug logging: `logger.add(sys.stderr, level="DEBUG")`

medium

The troubleshooting advice suggests using logger.add(sys.stderr, level="DEBUG"), but this will fail without the necessary imports for logger and sys. To make this guidance more helpful, please include the required imports in the example.

A clearer example would be:

```python
# Add these imports at the top of your script
import sys
from loguru import logger

# Then enable debug logging
logger.add(sys.stderr, level="DEBUG")
```
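
On the first FAQ answer above (create a fresh `GradingRunner` per training step), a minimal sketch of the idea; `make_runner` is a hypothetical factory, and only `arun_multiple_datasets` is taken from the guide's quoted snippets:

```python
import asyncio


def compute_step_rewards(datasets, make_runner):
    """Build a fresh GradingRunner for this step and run it on a fresh event loop."""
    runner = make_runner()  # a new runner each step avoids stale event-loop state
    return asyncio.run(runner.arun_multiple_datasets(datasets))
```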
