
[simple_fsdp] Support SimpleFSDP DSv3 + EP full/selective activation checkpointing.#2406

Open
IvanKobzarev wants to merge 1 commit into main from sfsdp_ds_ac

Conversation


@IvanKobzarev IvanKobzarev commented Feb 20, 2026

Previously, SimpleFSDP DSv3 required activation checkpointing to be disabled.
Enabling it resulted in illegal memory accesses, because the number of tokens per expert recomputed in the backward pass could differ from the number used in the forward pass.
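
To make the failure mode concrete, here is a toy illustration in plain Python. It is only a sketch under assumptions: lists stand in for GPU buffers, the routing counts are made up, and the helper names (`expert_offsets`, `read_expert_slice`) are hypothetical rather than taken from the MoE implementation. It shows how indexing a buffer sized from the forward counts with different recomputed counts runs past the end of the buffer.

```python
# Toy model only: plain Python lists stand in for GPU buffers, and the
# routing counts below are made up for illustration.

def expert_offsets(tokens_per_expert):
    """Return each expert's start offset in a packed buffer, plus the total size."""
    offsets, total = [], 0
    for count in tokens_per_expert:
        offsets.append(total)
        total += count
    return offsets, total


def read_expert_slice(buffer, tokens_per_expert, expert):
    """Read one expert's tokens from a packed buffer using the given counts."""
    offsets, _ = expert_offsets(tokens_per_expert)
    start = offsets[expert]
    # Index element by element so an out-of-range access raises IndexError
    # (Python slicing would silently truncate instead).
    return [buffer[start + i] for i in range(tokens_per_expert[expert])]


# Forward pass: 6 tokens routed to 3 experts; the buffer is sized from these counts.
forward_counts = [2, 3, 1]
_, total = expert_offsets(forward_counts)
saved_buffer = list(range(total))

# Consistent case: backward uses the same counts as forward.
ok = read_expert_slice(saved_buffer, forward_counts, expert=2)

# Inconsistent case: recomputation yields different counts for the same buffer.
recomputed_counts = [2, 3, 2]
try:
    read_expert_slice(saved_buffer, recomputed_counts, expert=2)
    mismatch_detected = False
except IndexError:
    mismatch_detected = True
```

In Python the mismatch surfaces as an `IndexError`; a GPU kernel has no such bounds check, which is consistent with the illegal memory accesses described above.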

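This PR supports full and selective activation checkpointing through two changes:

  1. Marking all_to_all as must_save, so that its outputs are saved rather than recomputed. For full activation checkpointing this is set via graph_pass; for selective activation checkpointing it is added explicitly.

  2. Setting torch._dynamo.config.skip_fwd_side_effects_in_bwd_under_checkpoint = True to bypass Dynamo's mutation checks (the MoE implementation currently mutates a field).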

meta-cla bot added the CLA Signed label Feb 20, 2026
@aditvenk
Contributor

cc @yiming0416


Labels

ciflow/8gpu, CLA Signed


2 participants