Add Naive Training Moe Example Code on Single GPU or Multi GPUs by xhx1022 · Pull Request #10 · intelligent-machine-learning/atorch

xhx1022 · 2025-08-03T09:59:33Z

Added a new training example function demonstrating how to train the MoE model on a single GPU or on multiple GPUs using dummy data. #9

skydoorkai

Is this example running with 4 GPUs?
Then the title Single-GPU Training is not correct.
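One way to settle whether a run is single-GPU or multi-GPU is to have the example report its own launch configuration. The following is a minimal sketch, not code from this PR: it assumes the script is started with `torchrun`, which exports `WORLD_SIZE` to every worker, and the helper name `describe_launch` is hypothetical.

```python
import os


def describe_launch(env=None):
    """Return (world_size, label) for the current launch environment.

    Hypothetical helper for illustration; not part of the PR.
    """
    env = os.environ if env is None else env
    # torchrun exports WORLD_SIZE for each worker process. A plain
    # `python script.py` run usually leaves it unset, so default to 1.
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size == 1:
        label = "single-GPU"
    else:
        label = f"multi-GPU ({world_size} processes)"
    return world_size, label


if __name__ == "__main__":
    print(describe_launch()[1])
```

Printing this label at startup makes it easy to check that the README and the PR title match how the example is actually run.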

skydoorkai · 2025-08-07T05:03:44Z

examples/moe_dualpipe/dualpipe/__init__.py

@@ -0,0 +1,17 @@
+__version__ = "1.0.0"


These are dualpipe source files; they do not need to be included.
In the README.md, explain how to clone the dualpipe code and set up PYTHONPATH.
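As a rough illustration of the PYTHONPATH step, the sketch below does the in-process equivalent of `export PYTHONPATH=<checkout>`: it prepends a local checkout directory to `sys.path` and checks whether a module can then be found. The helper names and the directory layout are assumptions made for this example; the PR does not specify where dualpipe should be cloned.

```python
import importlib.util
import sys
from pathlib import Path


def add_checkout_to_path(checkout_dir):
    """Prepend a local checkout directory to sys.path (idempotent).

    Hypothetical helper; the checkout location is chosen by the user.
    """
    resolved = str(Path(checkout_dir).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None
```

A README could then tell users to clone the upstream dualpipe repository into a directory of their choice and either export `PYTHONPATH` in the shell or call something like `add_checkout_to_path("path/to/checkout")` before importing `dualpipe`.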

skydoorkai · 2025-08-07T05:04:27Z

examples/moe_dualpipe/examples/moe_train_basic.py

+
+    def apply_load_balancing_loss(self, router_probs, tokens_per_expert):
+        if self.moe_aux_loss_coeff > 0 and self.training:
+            # 计算每个专家的负载 (English: compute the load of each expert)


Use English for comments.
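For readers unfamiliar with the load-balancing term that `apply_load_balancing_loss` refers to, here is a framework-free sketch of one common formulation, the Switch Transformer style auxiliary loss `coeff * E * sum_i(f_i * P_i)`. The body of the PR's method is not visible in this excerpt, so this is an assumption about the general technique, not a reproduction of the PR code; the function and argument names are hypothetical and top-1 routing is assumed.

```python
def load_balancing_loss(router_probs, expert_index, num_experts, coeff):
    """Switch-style auxiliary loss: coeff * E * sum_i(f_i * P_i).

    router_probs: list of per-token probability rows (one value per expert).
    expert_index: expert chosen for each token (top-1 routing assumed).
    """
    num_tokens = len(router_probs)
    # f_i: fraction of tokens dispatched to expert i.
    tokens_per_expert = [0] * num_experts
    for idx in expert_index:
        tokens_per_expert[idx] += 1
    fraction = [count / num_tokens for count in tokens_per_expert]
    # P_i: mean router probability assigned to expert i.
    mean_prob = [
        sum(row[i] for row in router_probs) / num_tokens
        for i in range(num_experts)
    ]
    return coeff * num_experts * sum(f * p for f, p in zip(fraction, mean_prob))
```

With this formulation, perfectly balanced routing gives a loss of `coeff`, and routing every token to a single expert with full confidence gives `coeff * num_experts`, so minimizing the term pushes the router toward an even load.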

skydoorkai · 2025-08-07T05:09:45Z

examples/moe_dualpipe/examples/moe_train_basic.py

+        self.moe_z_loss_coeff = z_loss_coeff
+        self.initializer_range = 0.02
+
+class MoEAuxLossAutoScaler(torch.autograd.Function):


If this MoE model definition code is copied or modified from another repository, add comments stating the original source.
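The quoted diff stores a `moe_z_loss_coeff`, which suggests the model applies a router z-loss. The PR's implementation is not shown here, so the following is only a sketch of the usual definition (the mean over tokens of the squared log-sum-exp of the router logits, scaled by the coefficient), written in plain Python so the arithmetic is visible. The function name is hypothetical.

```python
import math


def router_z_loss(logits, coeff):
    """Return coeff * mean over tokens of (logsumexp(logits))**2.

    logits: list of per-token rows of raw router logits.
    """
    total = 0.0
    for row in logits:
        # Subtract the row maximum before exponentiating for stability.
        row_max = max(row)
        lse = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += lse * lse
    return coeff * total / len(logits)
```

The term penalizes large router logits, which is why it is typically paired with a small coefficient.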

implement a training code to train it on single-GPU with dummy data

c14d3e7

xhx1022 requested review from adamantboy, hxdtest, nash635 and skydoorkai as code owners August 3, 2025 09:59

skydoorkai reviewed Aug 7, 2025

implement a training code to train it on multi-GPU with dummy data

a45f990

xhx1022 changed the title ~~Add Single-GPU Training Moe Example Code~~ Add Naive Training Moe Example Code on Single GPU or Multi GPUs Aug 17, 2025

xhx1022 closed this by deleting the head repository Oct 10, 2025

xhx1022 wants to merge 2 commits into intelligent-machine-learning:main from xhx1022:moe
