Issue on reward normalization calculation #29

**Issue Description:**

There appears to be a bug in the `NormalizeRewardsByEnv` class related to operator precedence in the z-score normalization calculation.

**Current Code:**
```python
normed = ((r-r.mean())/r.std()+1e-8) if self.z_score else r-r.mean()
```

**Issue:**
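Because division binds more tightly than addition, the current expression is evaluated as `((r-r.mean())/r.std()) + 1e-8`. The epsilon (`1e-8`) is therefore added to the result of the division rather than to the standard deviation, so it does nothing to prevent division by zero and only shifts every normalized reward by a small constant.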

**Expected Fix:**
```python
normed = ((r-r.mean())/(r.std()+1e-8)) if self.z_score else r-r.mean()
```

**Explanation:**
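The epsilon should be added to `r.std()` before the division so that the denominator stays strictly positive when the standard deviation is zero or very close to zero (for example, when every reward in a group is identical). As written, those cases still divide by zero or by a tiny value, which typically yields `nan`/`inf` values or very large, numerically unstable outputs.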
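
To make the difference concrete, here is a minimal standalone sketch that reproduces both expressions. It is not the repository's code: `z_score_buggy` and `z_score_fixed` are hypothetical helper names, plain Python lists stand in for the reward tensor `r`, and `statistics.pstdev` (population standard deviation) is assumed in place of `r.std()`, whose exact definition depends on the array library in use.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # same epsilon as in the issue


def z_score_buggy(rewards):
    """Mirrors the current expression: epsilon is added to the quotient."""
    mean, std = fmean(rewards), pstdev(rewards)
    # Parsed as ((x - mean) / std) + EPS, so std == 0 is still divided by.
    # Plain Python floats raise ZeroDivisionError here; array libraries
    # would typically return nan/inf instead.
    return [(x - mean) / std + EPS for x in rewards]


def z_score_fixed(rewards):
    """Mirrors the proposed fix: epsilon is added to the denominator."""
    mean, std = fmean(rewards), pstdev(rewards)
    return [(x - mean) / (std + EPS) for x in rewards]


if __name__ == "__main__":
    constant = [1.0, 1.0, 1.0]  # zero standard deviation
    print(z_score_fixed(constant))
    try:
        z_score_buggy(constant)
    except ZeroDivisionError as exc:
        print("buggy version failed:", exc)
```

With identical rewards the fixed version returns zeros, while the buggy version still divides by a zero standard deviation. On inputs with non-zero spread the two versions differ only by the constant `1e-8` offset, which is likely why the bug is easy to miss.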
