refactor(sunjx): refactor advantage calculation logic #16
base: main
class REINFORCECalculator(DefaultAdvantageCalculator):
    """
    Standard REINFORCE calculator.
I'm not sure if this advantage calculation corresponds to REINFORCE or REINFORCE++. Could you please clarify?
    Default calculator with no reward preprocessing.
    Used by methods like GAE and CPGD that don't require reward preprocessing.
    """
It looks like REINFORCECalculator also uses this base class. Please update the overview accordingly.
    """
    Group normalization calculator.
    Normalizes rewards within each group and optionally filters degenerate cases.
Please add the corresponding paper reference for each implemented AdvantageCalculator.
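For readers following this thread, here is a minimal sketch of what "normalizes rewards within each group and optionally filters degenerate cases" could look like. It is not the PR's implementation: the function name `group_normalize`, its parameters, and the choice to treat zero-variance groups as degenerate are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def group_normalize(
    rewards: Sequence[float],
    group_ids: Sequence[int],
    eps: float = 1e-8,
) -> Tuple[List[float], List[bool]]:
    """Hypothetical helper: z-score rewards within each group.

    Returns the normalized rewards and a keep-mask that is False for
    samples in degenerate groups (assumed here to mean zero variance).
    """
    # Collect the sample indices that belong to each group.
    groups: Dict[int, List[int]] = {}
    for idx, gid in enumerate(group_ids):
        groups.setdefault(gid, []).append(idx)

    normalized = [0.0] * len(rewards)
    keep = [True] * len(rewards)
    for idxs in groups.values():
        values = [rewards[i] for i in idxs]
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        degenerate = sigma < eps  # every reward in the group is identical
        for i in idxs:
            normalized[i] = (rewards[i] - mu) / (sigma + eps)
            keep[i] = not degenerate
    return normalized, keep


normalized, keep = group_normalize([1.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
```

In this example the second group has identical rewards, so its normalized values are zero and the mask marks both samples for filtering. Whether filtered samples are dropped or merely zeroed is a design choice the caller would have to make.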
)
)
# Prepare helper functions for calculator
get_gae_fn = (
get_cumulative_returns and get_advantages_and_returns should also be moved into their respective AdvantageCalculator classes.
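One way to read this suggestion is that each calculator owns its return computation as a method rather than receiving it as an injected helper. The sketch below illustrates that shape only; the class name `ReturnsCalculatorSketch`, the method name `cumulative_returns`, and the per-trajectory list interface are hypothetical and do not reflect the signatures of the functions named above.

```python
from typing import List, Sequence


class ReturnsCalculatorSketch:
    """Hypothetical calculator that keeps its return helper as a method."""

    def __init__(self, gamma: float = 1.0) -> None:
        self.gamma = gamma  # discount factor

    def cumulative_returns(self, rewards: Sequence[float]) -> List[float]:
        # Walk the trajectory backwards so that each step's return is
        # its own reward plus the discounted return of the next step.
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + self.gamma * running
            returns[t] = running
        return returns


calc = ReturnsCalculatorSketch(gamma=0.5)
returns = calc.cumulative_returns([0.0, 0.0, 1.0])  # [0.25, 0.5, 1.0]
```

Keeping the helper on the class means the trainer no longer needs to build and pass function handles, at the cost of some duplication if several calculators share the same return logic.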
# CPGD Utility Functions
# ============================================================================

def _get_cpgd_advantages_returns(
_get_cpgd_advantages_returns should also be moved into its respective AdvantageCalculator class.
New advantage_calculator.py Module
Overview
A new module implementing various advantage calculation methods for reinforcement learning algorithms.
Abstract Base Class
Implementations
Factory Function
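The body of the description is not shown here, so the following is only a rough sketch of how an abstract base class, concrete implementations, and a factory function commonly fit together. `DefaultAdvantageCalculator` is named in the diff above; the method `preprocess_rewards`, the class `MeanBaselineCalculator`, the registry, and the factory name `get_advantage_calculator` are assumptions for illustration, not the module's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type


class AdvantageCalculator(ABC):
    """Abstract base class: subclasses decide how rewards are preprocessed."""

    @abstractmethod
    def preprocess_rewards(self, rewards: Sequence[float]) -> List[float]:
        """Return rewards transformed for the advantage estimator."""


class DefaultAdvantageCalculator(AdvantageCalculator):
    """No reward preprocessing: rewards pass through unchanged."""

    def preprocess_rewards(self, rewards: Sequence[float]) -> List[float]:
        return list(rewards)


class MeanBaselineCalculator(DefaultAdvantageCalculator):
    """Hypothetical implementation: subtract the batch mean as a baseline."""

    def preprocess_rewards(self, rewards: Sequence[float]) -> List[float]:
        baseline = sum(rewards) / len(rewards)
        return [r - baseline for r in rewards]


# Hypothetical registry mapping config names to calculator classes.
_REGISTRY: Dict[str, Type[AdvantageCalculator]] = {
    "default": DefaultAdvantageCalculator,
    "mean_baseline": MeanBaselineCalculator,
}


def get_advantage_calculator(name: str) -> AdvantageCalculator:
    """Factory: build a calculator from its configured name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown calculator {name!r}; expected one of: {known}") from None


calculator = get_advantage_calculator("mean_baseline")
centered = calculator.preprocess_rewards([1.0, 3.0])  # [-1.0, 1.0]
```

A registry keeps the factory free of if/elif chains, and adding a new method then requires only a new subclass and one registry entry.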