Bug Fix PPO / Optax chain by wittlsn · Pull Request #75 · araffin/sbx

wittlsn · 2025-08-22T06:31:26Z

Description

PPO currently assumes the optimizer is an optax.chain with two elements.
When using a single-transform optimizer (e.g. optax.adam), learning crashes with an IndexError at sbx/ppo/ppo.py:262.

Fixes #77

Motivation and Context

This fix allows PPO to be used with a wider range of Optax optimizers.
Currently, the assumption about optax.chain length unnecessarily restricts optimizer choices.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

araffin · 2025-08-27T08:49:14Z

Hello,
could you please give a minimal example to reproduce the error?

wittlsn · 2025-08-27T11:58:57Z

Hello,

I added a minimal example that uses a custom implementation of the PPOPolicy.
The learning fails if an Optax chain with fewer than two elements is used.

araffin · 2025-08-27T13:17:36Z

Sorry, I meant adding it to the description of this PR, not as a test. This should have been done in an issue before creating the PR, in order to understand and discuss the problem (see contributing guide).

wittlsn · 2025-08-27T13:43:38Z

I’ve now created the issue and linked it to this PR.
Sorry for not following the proper workflow earlier.

relative opt state

f754781

added minimal example

43a29f3

wittlsn wants to merge 2 commits into araffin:master from wittlsn:ppo_opt_state
