Estimate redistribute_cost by _TransformInfo #230
Open
mori360 wants to merge 3 commits into meta-pytorch:main from
fmassa
reviewed
Nov 6, 2025
    if current == target:
        continue
    num_devices_on_mesh_dim = mesh_topo.mesh_dim_devices[i]
    for transform_info in transform_infos:
Contributor
In general, I think what we want here is the minimal redistribution cost over all possible input/output orderings.
This ensures that we don't have to enlarge the search space for AutoParallel during optimization: we can focus only on the shardings (without order) and optimize the ordering afterwards.
Does that make sense?
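A minimal sketch of that idea, assuming a cost function that takes an explicit source and destination ordering of the mesh dims. The names `min_redistribute_cost` and `toy_cost` are hypothetical and are not part of AutoParallel or PyTorch; the toy cost only stands in for a real estimate.

```python
from itertools import permutations


def min_redistribute_cost(mesh_ndim, cost_fn):
    """Return (cost, src_order, dst_order) with the lowest cost.

    cost_fn is an assumed callable: (src_order, dst_order) -> float, where
    each order is a tuple giving the order in which mesh dims shard the tensor.
    """
    orders = list(permutations(range(mesh_ndim)))
    return min(
        (cost_fn(src, dst), src, dst)
        for src in orders
        for dst in orders
    )


def toy_cost(src_order, dst_order):
    # Placeholder cost: a fixed base cost plus a penalty for every position
    # where the two orderings disagree. A real estimate would go here.
    return 1.0 + sum(a != b for a, b in zip(src_order, dst_order))


best_cost, best_src, best_dst = min_redistribute_cost(2, toy_cost)
```

The search is factorial in the number of mesh dims on each side, which is small for typical meshes, and it keeps ordering out of the sharding search itself.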
Estimate redistribute_cost by _TransformInfo rather than comparing source and destination state
Here are the changes:
a. S(0)S(0) -> S(0)R needs 1 allgather
b. S(0)S(0) -> RS(0) needs 2 allgathers, which would be missed if we only looked at the S(0) -> R change
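The two cases can be sketched as explicit step lists. This is a toy model, not the real `_TransformInfo`: the `Step` class, the 1024-byte tensor, the (2, 4) mesh, and the step order for case b (gather the inner mesh dim, gather the outer one, then re-shard locally) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one redistribution step on a single mesh dim."""
    mesh_dim: int
    src: str
    dst: str
    bytes_after: int  # per-device tensor bytes once this step completes


def is_allgather(step):
    # Going from sharded to replicated on a mesh dim requires an allgather.
    return step.src.startswith("S") and step.dst == "R"


def summarize(steps):
    """Return (number of allgathers, per-allgather output bytes)."""
    gathers = [s for s in steps if is_allgather(s)]
    return len(gathers), [s.bytes_after for s in gathers]


TOTAL = 1024   # assumed full tensor size in bytes
MESH = (2, 4)  # assumed 2-D mesh shape

# Case a: S(0)S(0) -> S(0)R, unshard mesh dim 1 only.
case_a = [Step(1, "S(0)", "R", TOTAL // MESH[0])]

# Case b: S(0)S(0) -> RS(0), unshard mesh dim 1, then mesh dim 0,
# then re-shard on mesh dim 1 (a local chunk, no communication).
case_b = [
    Step(1, "S(0)", "R", TOTAL // MESH[0]),
    Step(0, "S(0)", "R", TOTAL),
    Step(1, "R", "S(0)", TOTAL // MESH[1]),
]

counts_a = summarize(case_a)
counts_b = summarize(case_b)
```

Under these assumptions case a has one allgather and case b has two, and the two allgathers in case b produce outputs of different sizes, so they cannot share one communication volume.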
comm_bytes_gb is based on the tensor shape and the number of shards. In case 2.b, comm_bytes_gb differs between the 2 allgathers.
TODO:
Some compute_cost functions use comm_bytes_gb; need to verify whether they return the expected cost.