refactor(sunjx): refactor advantage calculation logic #16

Jiaxuan-Sun · 2025-12-31T11:55:24Z

New advantage_calculator.py Module

Overview

A new module implementing various advantage calculation methods for reinforcement learning algorithms.

Abstract Base Class

AdvantageCalculator - Defines a unified interface for all advantage calculators
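The following is a minimal sketch of what such a unified interface could look like. It assumes a per-trajectory API over plain Python lists. The method names preprocess_rewards and compute and the (advantages, returns) return shape are hypothetical and do not come from the PR. DefaultAdvantageCalculator mirrors the "no reward preprocessing" base class discussed in review. The REINFORCECalculator body is only an illustrative undiscounted return-to-go estimator.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class AdvantageCalculator(ABC):
    """Unified interface for advantage calculators (hypothetical method names)."""

    @abstractmethod
    def preprocess_rewards(self, rewards: List[float]) -> List[float]:
        """Optionally transform raw rewards before advantages are computed."""

    @abstractmethod
    def compute(self, rewards: List[float]) -> Tuple[List[float], List[float]]:
        """Return (advantages, returns) for one trajectory's per-step rewards."""


class DefaultAdvantageCalculator(AdvantageCalculator):
    """Base for calculators that need no reward preprocessing; compute() stays abstract."""

    def preprocess_rewards(self, rewards: List[float]) -> List[float]:
        return list(rewards)  # identity: rewards pass through unchanged


class REINFORCECalculator(DefaultAdvantageCalculator):
    """Illustrative only: advantage = undiscounted return-to-go, no baseline."""

    def compute(self, rewards: List[float]) -> Tuple[List[float], List[float]]:
        rewards = self.preprocess_rewards(rewards)
        returns: List[float] = []
        running = 0.0
        for r in reversed(rewards):  # accumulate from the last step backwards
            running += r
            returns.append(running)
        returns.reverse()
        return list(returns), returns
```

Because compute() is still abstract on DefaultAdvantageCalculator, only concrete subclasses such as REINFORCECalculator can be instantiated.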

Implementations

GAECalculator - Generalized Advantage Estimation (GAE)
CPGDCalculator - Conservative Policy Gradient with Decay (CPGD)
REINFORCECalculator - REINFORCE algorithm
RLOOCalculator - REINFORCE Leave-One-Out (RLOO)
REINFORCEBaselineCalculator - REINFORCE with baseline
GroupNormCalculator - Group Normalization (supports "grpo" alias)
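Several methods in this list compute a per-sample advantage from one scalar reward per response. The sketch below shows two of them, assuming k sampled responses per prompt. RLOO uses the mean of the other k-1 rewards as the baseline. For REINFORCE with baseline, the sketch assumes the baseline is the group mean, which may not be the baseline the PR actually uses. Both function names are hypothetical.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """REINFORCE Leave-One-Out: baseline for sample i is the mean of the other k-1 rewards."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def mean_baseline_advantages(rewards: List[float]) -> List[float]:
    """REINFORCE with a group-mean baseline (assumed baseline choice)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Four responses to one prompt: two correct (reward 1), two incorrect (reward 0).
group = [1.0, 0.0, 0.0, 1.0]
print(rloo_advantages(group))           # each correct sample gets 1 - 1/3
print(mean_baseline_advantages(group))  # [0.5, -0.5, -0.5, 0.5]
```

Algebraically, the RLOO advantage equals k/(k-1) times the group-mean one, so within a group the two differ only by a constant scale.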

Factory Function

get_advantage_calculator - Factory function for instance creation
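A registry-based factory is one way to implement this, including the "grpo" alias for GroupNormCalculator. Several details are assumptions: every string key other than "grpo", the case-insensitive lookup, and the **config pass-through. The calculator classes here are empty stubs that stand in for the real ones.

```python
from typing import Any, Dict, Type


class AdvantageCalculator:
    """Stub base class; the real one defines the advantage interface."""

    def __init__(self, **config: Any) -> None:
        self.config = config


# Empty stubs standing in for the calculators listed in the PR description.
class GAECalculator(AdvantageCalculator):
    pass


class CPGDCalculator(AdvantageCalculator):
    pass


class REINFORCECalculator(AdvantageCalculator):
    pass


class RLOOCalculator(AdvantageCalculator):
    pass


class REINFORCEBaselineCalculator(AdvantageCalculator):
    pass


class GroupNormCalculator(AdvantageCalculator):
    pass


# Hypothetical method names; "grpo" is an alias for group normalization.
_CALCULATORS: Dict[str, Type[AdvantageCalculator]] = {
    "gae": GAECalculator,
    "cpgd": CPGDCalculator,
    "reinforce": REINFORCECalculator,
    "rloo": RLOOCalculator,
    "reinforce_baseline": REINFORCEBaselineCalculator,
    "group_norm": GroupNormCalculator,
    "grpo": GroupNormCalculator,
}


def get_advantage_calculator(name: str, **config: Any) -> AdvantageCalculator:
    """Return a calculator instance for the given method name (case-insensitive)."""
    try:
        cls = _CALCULATORS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown advantage estimator {name!r}; expected one of {sorted(_CALCULATORS)}"
        ) from None
    return cls(**config)


calc = get_advantage_calculator("GRPO", gamma=1.0)
print(type(calc).__name__, calc.config)  # GroupNormCalculator {'gamma': 1.0}
```

Keeping the mapping in a single dict makes an alias a one-line addition. Unknown names also fail loudly instead of silently falling back to a default.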

puyuan1996 · 2026-01-04T09:24:31Z

lightrft/trainer/advantage_calculator.py

+
+class REINFORCECalculator(DefaultAdvantageCalculator):
+    """
+    Standard REINFORCE calculator.


I'm not sure if this advantage calculation corresponds to REINFORCE or REINFORCE++. Could you please clarify?
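This is the distinction as it is usually drawn. Plain REINFORCE uses the return directly as the advantage. REINFORCE++ folds a token-level KL penalty into the reward and then normalizes advantages across the whole batch, not per prompt group. The sketch below covers only the normalization step and omits the KL shaping. It assumes one scalar return per sample, and both function names are hypothetical.

```python
import statistics
from typing import List


def reinforce_advantages(returns: List[float]) -> List[float]:
    """Plain REINFORCE: the return itself is the advantage (no baseline, no normalization)."""
    return list(returns)


def reinforce_pp_advantages(returns: List[float], eps: float = 1e-8) -> List[float]:
    """REINFORCE++-style: whiten returns over the whole batch, pooled across all prompts."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return [(r - mean) / (std + eps) for r in returns]


batch = [2.0, 0.0, 1.0, 1.0]  # returns from several prompts, pooled into one batch
print(reinforce_advantages(batch))     # [2.0, 0.0, 1.0, 1.0]
print(reinforce_pp_advantages(batch))  # roughly [1.41, -1.41, 0.0, 0.0]
```

A calculator that returns advantages like the first function is closer to plain REINFORCE. One that whitens across the batch, as in the second, is closer to REINFORCE++.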

puyuan1996 · 2026-01-04T09:30:24Z

lightrft/trainer/advantage_calculator.py

+    Default calculator with no reward preprocessing.
+
+    Used by methods like GAE and CPGD that don't require reward preprocessing.
+    """


It looks like REINFORCECalculator also uses this base class. Please update the overview accordingly.

puyuan1996 · 2026-01-04T09:35:56Z

lightrft/trainer/advantage_calculator.py

+    """
+    Group normalization calculator.
+
+    Normalizes rewards within each group and optionally filters degenerate cases.


Please add the corresponding paper reference for each implemented AdvantageCalculator.
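The docstring above can be read as the GRPO-style rule A_i = (r_i - mean(group)) / (std(group) + eps). Below is a minimal sketch of that rule. It assumes rewards arrive as a flat list ordered group by group, and it takes "degenerate" to mean a group whose rewards are all equal (zero std), which gives no relative signal. Several choices are assumptions: the function name, the eps value, and the use of population std (implementations differ on biased vs. unbiased std).

```python
import statistics
from typing import List, Tuple


def group_normalize(
    rewards: List[float],
    group_size: int,
    eps: float = 1e-6,
    filter_degenerate: bool = True,
) -> Tuple[List[float], List[bool]]:
    """Return (advantages, keep_mask) with rewards normalized within each group."""
    if group_size < 1 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")
    advantages: List[float] = []
    keep: List[bool] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)  # population std; some implementations use unbiased std
        degenerate = std < eps  # all rewards (nearly) equal: no relative signal
        for r in group:
            advantages.append(0.0 if degenerate else (r - mean) / (std + eps))
            keep.append(not (filter_degenerate and degenerate))
    return advantages, keep


# Two prompts, four samples each; the second group is all-correct, hence degenerate.
adv, keep = group_normalize([1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0], group_size=4)
print(keep)  # [True, True, True, True, False, False, False, False]
```

Zeroing a degenerate group's advantages already removes its policy-gradient contribution. The mask additionally lets the caller drop those samples from the batch.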

puyuan1996 · 2026-01-04T09:36:40Z

lightrft/trainer/fast_exp_maker.py

-                    )
-                )
+            # Prepare helper functions for calculator
+            get_gae_fn = (


get_cumulative_returns and get_advantages_and_returns should also be moved into their respective AdvantageCalculator classes.
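One way to fold these helpers into the classes is sketched below for a single trajectory, using plain Python lists. The method names follow the helpers mentioned above. The signatures are assumptions, including per-step values and rewards as inputs and gamma and lambd as constructor arguments. The versions in fast_exp_maker.py may instead take batched tensors and masks.

```python
from typing import List, Tuple


class GAECalculator:
    """GAE as a method on the calculator (hypothetical signature)."""

    def __init__(self, gamma: float = 1.0, lambd: float = 0.95) -> None:
        self.gamma = gamma
        self.lambd = lambd

    def get_advantages_and_returns(
        self, values: List[float], rewards: List[float]
    ) -> Tuple[List[float], List[float]]:
        if len(values) != len(rewards):
            raise ValueError("values and rewards must have the same length")
        advantages = [0.0] * len(rewards)
        last_gae = 0.0
        for t in reversed(range(len(rewards))):
            # The value after the final step is treated as 0 (terminal state).
            next_value = values[t + 1] if t + 1 < len(values) else 0.0
            delta = rewards[t] + self.gamma * next_value - values[t]  # TD error
            last_gae = delta + self.gamma * self.lambd * last_gae
            advantages[t] = last_gae
        returns = [a + v for a, v in zip(advantages, values)]  # targets for the critic
        return advantages, returns


class REINFORCECalculator:
    """Discounted return-to-go as a method on the calculator (hypothetical signature)."""

    def __init__(self, gamma: float = 1.0) -> None:
        self.gamma = gamma

    def get_cumulative_returns(self, rewards: List[float]) -> List[float]:
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + self.gamma * running
            returns[t] = running
        return returns


print(REINFORCECalculator(gamma=0.5).get_cumulative_returns([1.0, 1.0, 1.0]))  # [1.75, 1.5, 1.0]
```

With lambd=1 and a terminal value of 0, GAE reduces to the Monte Carlo return minus the value baseline. Both methods therefore share the same backward-recursion shape.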

puyuan1996 · 2026-01-04T09:37:22Z

lightrft/trainer/advantage_calculator.py

+# CPGD Utility Functions
+# ============================================================================
+
+def _get_cpgd_advantages_returns(


_get_cpgd_advantages_returns should also be moved into its corresponding AdvantageCalculator class.


refactor(sunjx): refactor advantage calculation logic

a03dbe4

puyuan1996 requested changes Jan 4, 2026


puyuan1996 added the refactor label (Cleanup, formatting, or restructuring of existing code) Jan 4, 2026

puyuan1996 mentioned this pull request Jan 5, 2026

Roadmap for LightRFT v0.1.1 #19 (Open)

refactor(sunjx): move functions to the corresponding AdvantageCalculator class

22441fb
