@Jiaxuan-Sun

New advantage_calculator.py Module

Overview

A new module implementing various advantage calculation methods for reinforcement learning algorithms.

Abstract Base Class

  • AdvantageCalculator - Defines a unified interface for all advantage calculators

Implementations

  • GAECalculator - Generalized Advantage Estimation (GAE)
  • CPGDCalculator - Clipped Policy Gradient Optimization with Policy Drift (CPGD)
  • REINFORCECalculator - REINFORCE algorithm
  • RLOOCalculator - REINFORCE Leave-One-Out (RLOO)
  • REINFORCEBaselineCalculator - REINFORCE with baseline
  • GroupNormCalculator - Group Normalization (supports "grpo" alias)
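To make the difference between the two group-based estimators concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function names `rloo_advantages` and `group_norm_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration, and each function operates on the scalar rewards of the samples drawn for a single prompt.

```python
from statistics import mean, pstdev
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out baseline: compare each sample with the mean of the others.

    Hypothetical helper, not the PR's RLOOCalculator.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("a leave-one-out baseline needs at least two samples")
    total = sum(rewards)
    # Baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


def group_norm_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style normalization: subtract the group mean, divide by the group std.

    Hypothetical helper; the population std is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # A degenerate group (all rewards equal) yields all-zero advantages.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In both cases a group whose rewards are all identical carries no learning signal, which is presumably why a group-based calculator might want to filter such groups.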

Factory Function

  • get_advantage_calculator - Factory function for instance creation
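The structure described above (one abstract interface, several implementations, a factory) could be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual API: the `compute` method, its signature, the `_REGISTRY` dictionary, and the `"group_norm"` key are hypothetical. Only the class names, the factory name, and the "grpo" alias come from the description.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev
from typing import Dict, List, Type


class AdvantageCalculator(ABC):
    """Unified interface; the `compute` signature is an assumption of this sketch."""

    @abstractmethod
    def compute(self, rewards: List[float]) -> List[float]:
        """Return one advantage per reward."""


class GroupNormCalculator(AdvantageCalculator):
    """Toy stand-in that normalizes the rewards of a single group."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.eps = eps

    def compute(self, rewards: List[float]) -> List[float]:
        mu, sigma = mean(rewards), pstdev(rewards)
        return [(r - mu) / (sigma + self.eps) for r in rewards]


# Hypothetical registry: several names may map to the same class (alias support).
_REGISTRY: Dict[str, Type[AdvantageCalculator]] = {
    "group_norm": GroupNormCalculator,
    "grpo": GroupNormCalculator,
}


def get_advantage_calculator(name: str, **kwargs) -> AdvantageCalculator:
    """Look up a calculator class by name and instantiate it."""
    try:
        cls = _REGISTRY[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown advantage calculator {name!r}; known: {known}") from None
    return cls(**kwargs)
```

A registry of this kind keeps the alias handling in one place, so callers never need to know which names share a class.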


class REINFORCECalculator(DefaultAdvantageCalculator):
"""
Standard REINFORCE calculator.
Collaborator

I'm not sure if this advantage calculation corresponds to REINFORCE or REINFORCE++. Could you please clarify?

Default calculator with no reward preprocessing.
Used by methods like GAE and CPGD that don't require reward preprocessing.
"""
Collaborator

It looks like REINFORCECalculator also uses this base class. Please update the overview accordingly.

"""
Group normalization calculator.
Normalizes rewards within each group and optionally filters degenerate cases.
Collaborator

Please add the corresponding paper reference for each implemented AdvantageCalculator.

)
)
# Prepare helper functions for calculator
get_gae_fn = (
Collaborator

get_cumulative_returns and get_advantages_and_returns should also be moved into their respective AdvantageCalculator classes.
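For illustration, moving a GAE helper into its calculator could look roughly like the sketch below. It does not reproduce the PR's `get_advantages_and_returns`: the `compute` method, the `gamma` and `lam` parameters, and the single-trajectory list inputs are assumptions, and the value after the final step is taken to be zero.

```python
from typing import List, Tuple


class GAECalculator:
    """Hypothetical sketch: GAE logic owned by the calculator instead of a free helper."""

    def __init__(self, gamma: float = 1.0, lam: float = 0.95) -> None:
        self.gamma = gamma
        self.lam = lam

    def compute(
        self, rewards: List[float], values: List[float]
    ) -> Tuple[List[float], List[float]]:
        """Return (advantages, returns) for one trajectory."""
        if len(rewards) != len(values):
            raise ValueError("rewards and values must have the same length")
        advantages = [0.0] * len(rewards)
        last_gae = 0.0
        # Walk backwards so each step can reuse the accumulated future advantage.
        for t in reversed(range(len(rewards))):
            next_value = values[t + 1] if t + 1 < len(values) else 0.0
            delta = rewards[t] + self.gamma * next_value - values[t]
            last_gae = delta + self.gamma * self.lam * last_gae
            advantages[t] = last_gae
        # Returns are the value targets: advantage plus the value estimate.
        returns = [a + v for a, v in zip(advantages, values)]
        return advantages, returns
```

With this layout the trainer only calls `compute`, and the recursion is no longer passed in as a helper function.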

# CPGD Utility Functions
# ============================================================================

def _get_cpgd_advantages_returns(
Collaborator

_get_cpgd_advantages_returns should also be moved into its corresponding AdvantageCalculator class.

@puyuan1996 added the "refactor" label (Cleanup, formatting, or restructuring of existing code.) on Jan 4, 2026