
Do not shard unused parameters #1773

Open

kctezcan wants to merge 1 commit into ecmwf:develop from MeteoSwiss:ktezcan/dev/iss1750_load_sharding

Conversation

@kctezcan
Contributor

@kctezcan kctezcan commented Feb 2, 2026

Description

See #1750

Issue Number

Closes #1750

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Contributor

@shmh40 shmh40 left a comment


Tested and doesn't cause problems for me.

    # maybe_sharded_sd[param_name.replace("module.", "")] = nn.Parameter(sharded_tensor)
    maybe_sharded_sd[param_name] = torch.nn.Parameter(sharded_tensor)
    if sharded_meta_param is None:
        logger.info(f"Sharding meta parameters is None for: {param_name}")
Collaborator


Is it correct that sharded_meta_param is None means that this is a parameter in the checkpoint that is not present in the current model?

        sharded_meta_param.placements,
    )
    # maybe_sharded_sd[param_name.replace("module.", "")] = nn.Parameter(sharded_tensor)
    maybe_sharded_sd[param_name] = torch.nn.Parameter(sharded_tensor)
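The logic under discussion can be sketched as follows. This is a minimal, torch-free reconstruction, not the actual PR diff: the function name `shard_state_dict`, the `shard_fn` callback, and the choice to skip (rather than keep unsharded) a parameter missing from the model are all assumptions made for illustration; in the real code the sharding is done with DTensor placements.

```python
import logging

logger = logging.getLogger(__name__)

def shard_state_dict(checkpoint_sd, meta_sd, shard_fn):
    """Shard checkpoint tensors onto the model's layout, skipping
    parameters present in the checkpoint but absent from the current
    model (hypothetical sketch of the PR's intent).

    checkpoint_sd: mapping of param name -> loaded tensor
    meta_sd: mapping of param name -> meta parameter (carries placements)
    shard_fn: callable(tensor, meta_param) -> sharded tensor
    """
    maybe_sharded_sd = {}
    for param_name, tensor in checkpoint_sd.items():
        sharded_meta_param = meta_sd.get(param_name)
        if sharded_meta_param is None:
            # The parameter exists in the checkpoint but not in the (used
            # part of the) current model, so there is no placement to shard
            # against; log and skip instead of failing.
            logger.info("Sharded meta parameter is None for: %s", param_name)
            continue
        maybe_sharded_sd[param_name] = shard_fn(tensor, sharded_meta_param)
    return maybe_sharded_sd
```

Under this reading, `sharded_meta_param is None` does mark exactly the case the reviewer asks about: a checkpoint parameter with no counterpart in the current model.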
Collaborator


Can we please remove the line below.

@github-project-automation github-project-automation bot moved this to In Progress in WeatherGen-dev Feb 6, 2026
@clessig
Collaborator

clessig commented Feb 11, 2026

@kctezcan : can you address the comments so that we can merge this


Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Part of network cannot be sharded during loading if not used

3 participants