Issue Description:
There appears to be a bug in the NormalizeRewardsByEnv class related to operator precedence in the z-score normalization calculation.
Current Code:
normed = ((r-r.mean())/r.std()+1e-8) if self.z_score else r-r.mean()
Issue:
The current implementation adds the epsilon value (1e-8) to the entire z-score calculation instead of adding it to the standard deviation to prevent division by zero.
Expected Fix:
normed = ((r-r.mean())/(r.std()+1e-8)) if self.z_score else r-r.mean()
Explanation:
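The epsilon should be added to r.std() before the division, so that a zero or near-zero standard deviation (for example, when every reward in the batch is identical) does not cause a division by zero or numerical instability. As currently written, division binds tighter than addition, so the epsilon only shifts each normalized value by 1e-8 and gives no such protection.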
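To illustrate the difference, here is a minimal standalone sketch. It does not use the NormalizeRewardsByEnv class: the helper names zscore_buggy and zscore_fixed are hypothetical, and plain Python lists with the standard-library statistics module stand in for whatever array or tensor type r actually is. With plain floats the zero-std case raises ZeroDivisionError; with NumPy arrays or PyTorch tensors I would expect the same expression to yield nan values rather than raise.

```python
import statistics

EPS = 1e-8  # same epsilon as in the issue


def zscore_buggy(r):
    """Current precedence: (r - mean) / std + eps."""
    mean = statistics.fmean(r)
    std = statistics.pstdev(r)
    # eps is added to the quotient, so std == 0 is still divided by
    return [(x - mean) / std + EPS for x in r]


def zscore_fixed(r):
    """Proposed fix: (r - mean) / (std + eps)."""
    mean = statistics.fmean(r)
    std = statistics.pstdev(r)
    # eps is added to the denominator, so it can never be exactly zero
    return [(x - mean) / (std + EPS) for x in r]


# All rewards identical, so the standard deviation is exactly zero.
constant_rewards = [1.0, 1.0, 1.0, 1.0]

fixed = zscore_fixed(constant_rewards)

try:
    zscore_buggy(constant_rewards)
    buggy_failed = False
except ZeroDivisionError:
    buggy_failed = True

# With a non-degenerate batch the two versions differ only negligibly.
varied_rewards = [1.0, 2.0, 3.0]
max_diff = max(
    abs(a - b)
    for a, b in zip(zscore_buggy(varied_rewards), zscore_fixed(varied_rewards))
)
```

In this sketch the fixed version maps the constant batch to all zeros, while the current version fails on it; for ordinary batches the two agree to within roughly 1e-7, so the fix should not change behavior outside the degenerate case.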