Add `CoupledTrainStepper` by jpdunc23 · Pull Request #809 · ai2cm/ace

jpdunc23 · 2026-02-09T22:41:45Z

Adds `train_stepper: CoupledTrainStepperConfig` to the coupled training config, which configures and builds a `CoupledTrainStepper` implementing `TrainStepperABC`.

WARNING: This is a breaking change for existing coupled training configs.

Changes:

- Component stepper `loss: StepLossConfig` and `loss_contributions: LossContributionsConfig` are now configured via the `ocean: ComponentTrainingConfig` and `atmosphere: ComponentTrainingConfig` attributes of `CoupledTrainStepperConfig`.
- `CoupledStepper` no longer implements `TrainStepperABC`.
- Removed the public `loss_obj` and `effective_loss_scaling` properties from `fme.ace.stepper.Stepper` and added a new public method, `build_loss`.
- Tests added
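
The restructured config described in the list above can be pictured with the sketch below. These dataclasses are simplified, hypothetical stand-ins rather than the actual fme classes: only the class names and the `loss`, `loss_contributions`, `ocean`, and `atmosphere` attributes come from the PR description, and the field defaults (`type="MSE"`, `n_steps`, `weight`) are assumptions for illustration.

```python
import dataclasses


@dataclasses.dataclass
class StepLossConfig:
    # Stand-in for the real StepLossConfig; only `type` is modeled here.
    type: str = "MSE"


@dataclasses.dataclass
class LossContributionsConfig:
    # Stand-in; these defaults are assumptions, not the library's values.
    n_steps: float = float("inf")
    weight: float = 1.0


@dataclasses.dataclass
class ComponentTrainingConfig:
    # Per-component training options, grouped under the train stepper config.
    loss: StepLossConfig = dataclasses.field(default_factory=StepLossConfig)
    loss_contributions: LossContributionsConfig = dataclasses.field(
        default_factory=LossContributionsConfig
    )


@dataclasses.dataclass
class CoupledTrainStepperConfig:
    ocean: ComponentTrainingConfig = dataclasses.field(
        default_factory=ComponentTrainingConfig
    )
    atmosphere: ComponentTrainingConfig = dataclasses.field(
        default_factory=ComponentTrainingConfig
    )


# Example: default ocean settings, no loss contribution from the atmosphere.
train_stepper = CoupledTrainStepperConfig(
    atmosphere=ComponentTrainingConfig(
        loss_contributions=LossContributionsConfig(n_steps=0)
    ),
)
```

Existing coupled configs that set `loss_contributions` under each component would need to move those entries under `train_stepper`, which is presumably why the change is flagged as breaking.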

jpdunc23 · 2026-02-09T22:52:32Z

fme/coupled/loss.py

    ) -> StepLossABC:
        if self.n_steps == 0 or self.weight == 0.0:
-            return NullLossContributions()
+            return NullLossContributions(loss_obj)


This preserves the existing behavior where we used a component stepper's effective_loss_scaling to compute mse_fractional_components metrics even if the stepper had no loss contribution in coupled training.
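
A minimal sketch of the pattern this comment describes, assuming a null object that contributes zero loss while still exposing the wrapped loss object's scaling. `ToyStepLoss` and `build` are hypothetical; only `NullLossContributions(loss_obj)`, `n_steps`, `weight`, and `effective_loss_scaling` appear in the PR, and plain floats stand in for tensors.

```python
class ToyStepLoss:
    """Hypothetical stand-in for a built step loss with a scaling attribute."""

    def __init__(self, scaling):
        self.effective_loss_scaling = scaling

    def __call__(self, prediction, target):
        return (prediction - target) ** 2


class NullLossContributions:
    """Contributes zero loss but keeps the wrapped loss object's scaling."""

    def __init__(self, loss_obj):
        self._loss = loss_obj

    @property
    def effective_loss_scaling(self):
        # Still available for metrics even though no loss is contributed.
        return self._loss.effective_loss_scaling

    def __call__(self, prediction, target):
        return 0.0


def build(loss_obj, n_steps, weight):
    # Simplified: the real builder wraps the loss rather than returning it.
    if n_steps == 0 or weight == 0.0:
        return NullLossContributions(loss_obj)
    return loss_obj


null_loss = build(ToyStepLoss({"sst": 2.0}), n_steps=0, weight=1.0)
```

Here `null_loss(1.0, 3.0)` evaluates to `0.0`, while `null_loss.effective_loss_scaling` still returns the scaling of the wrapped loss.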

mcgibbon

Just some nits (nits are optional), I don't need to re-review them. LGTM

mcgibbon · 2026-02-10T17:44:40Z

fme/coupled/test_loss.py


+    @property
+    def effective_loss_scaling(self):
+        raise NotImplementedError


Suggested change

-        raise NotImplementedError
+        raise NotImplementedError()
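
For context, both spellings behave the same at runtime, since Python instantiates an exception class with no arguments when the class itself is raised, so the suggestion is a matter of style. The helper names in this short sketch are hypothetical.

```python
def raise_class():
    raise NotImplementedError  # Python instantiates the class with no arguments


def raise_instance():
    raise NotImplementedError()  # explicit instance, same result


def caught(fn):
    """Return the type and args of the NotImplementedError raised by fn."""
    try:
        fn()
    except NotImplementedError as err:
        return type(err), err.args
```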

mcgibbon · 2026-02-10T17:46:53Z

fme/coupled/test_loss.py

    atmos_loss_config = LossContributionsConfig()
    atmosphere_loss = atmos_loss_config.build(
-        loss_obj=lambda *_, **__: torch.tensor(5.25),
+        loss_obj=Mock(spec=StepLoss, side_effect=lambda *_, **__: torch.tensor(5.25)),


nit: The three lines changed use three different ways to specify the loss side-effect - via mae_loss, via a lambda function returning a constant, and via a return_value instead of a side_effect. You could consider using return_value for this one to reduce that down to 2 ways, at least.
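
To illustrate the two mocking styles mentioned in this nit, here is a sketch using `unittest.mock`. The `StepLoss` class is a hypothetical stand-in for the real one, and a plain float replaces `torch.tensor(5.25)` so the example has no third-party dependencies.

```python
from unittest.mock import Mock


class StepLoss:
    """Hypothetical stand-in for the real StepLoss; only callability matters."""

    def __call__(self, prediction, target):
        raise NotImplementedError()


# side_effect: a callable invoked with the call's arguments on every call.
loss_via_side_effect = Mock(spec=StepLoss, side_effect=lambda *_, **__: 5.25)

# return_value: a fixed value returned on every call, with no lambda needed.
loss_via_return_value = Mock(spec=StepLoss, return_value=5.25)

result_a = loss_via_side_effect(1.0, 2.0)
result_b = loss_via_return_value(1.0, 2.0)
```

For a constant result, `return_value` states the intent more directly; `side_effect` is mainly useful when the result should depend on the arguments or when the call should raise.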

mcgibbon · 2026-02-10T17:48:37Z

fme/coupled/test_stepper.py

        n_samples=3,
    )
-    output = coupler.train_on_batch(
+    train_stepper_config = CoupledTrainStepperConfig(


nit: Avoid the 3x copy-paste of this process by making a get_train_stepper_and_batch helper that does it and calls get_stepper_and_batch internally.

Good idea, but I'll defer this cleanup to #814 since the way in which the train stepper is built is going to change.

mcgibbon · 2026-02-10T18:48:20Z

fme/coupled/stepper.py

+
+
+@dataclasses.dataclass
+class CoupledTrainStepperConfig:


Do you have an example of the updated training config committed somewhere I could check out? It would be nice to have a baseline config for coupled training; with one, I could see the changes to the baseline in this PR.

Update: Ah I see test_train.py mostly fits this purpose, good. Still, could be nice to have a baseline in the future.

Agreed, I will work on a new PR to add the baseline.

mcgibbon · 2026-02-10T18:52:23Z

fme/coupled/test_train.py

-    loss_contributions:
-      n_steps: {loss_atmos_n_steps}
    stepper:
      loss:


Question: Why is `loss: type: MSE` in both `atmosphere: stepper:` and `train_stepper: atmosphere:`? I am guessing it's because we haven't updated the ACE configs yet and it's required in the config, in which case that's fine, but I thought I should ask to be sure.

That's right, although including it in the yaml here isn't strictly necessary since there is a default value on StepperConfig. I'll remove it so it's a bit clearer here.

mcgibbon · 2026-02-10T18:55:00Z

fme/coupled/train/train.py

        atmosphere_normalize=stepper.atmosphere.normalizer.normalize,
-        ocean_loss_scaling=stepper.ocean.effective_loss_scaling,
-        atmosphere_loss_scaling=stepper.atmosphere.effective_loss_scaling,
+        ocean_loss_scaling=stepper.effective_loss_scaling.ocean,


nit: pass loss_scaling: stepper.loss_scaling instead of two arguments containing the parts
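
The nit can be sketched as follows, assuming the coupled stepper exposes one object with `ocean` and `atmosphere` parts. `CoupledLossScaling` and both builder functions are hypothetical names for illustration, not the actual fme API.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class CoupledLossScaling:
    """Hypothetical container pairing the per-component loss scalings."""

    ocean: dict
    atmosphere: dict


def build_aggregator_two_args(ocean_loss_scaling, atmosphere_loss_scaling):
    # Current style: the caller unpacks the parts at the call site.
    return {"ocean": ocean_loss_scaling, "atmosphere": atmosphere_loss_scaling}


def build_aggregator_one_arg(loss_scaling):
    # Suggested style: pass the whole object and unpack it inside.
    return {"ocean": loss_scaling.ocean, "atmosphere": loss_scaling.atmosphere}


scaling = CoupledLossScaling(ocean={"sst": 2.0}, atmosphere={"t2m": 0.5})
two_args = build_aggregator_two_args(scaling.ocean, scaling.atmosphere)
one_arg = build_aggregator_one_arg(scaling)
```

Both calls produce the same result; the single-argument form keeps the call site shorter and avoids passing mismatched parts.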

Add CoupledTrainStepper

f2b832b

jpdunc23 commented Feb 9, 2026

jpdunc23 added 5 commits February 9, 2026 15:02

Bit of cleanup

9ffb135

Reduce diff slightly

e962de2

Fix test issues

a138e3e

Remove parameter_init from ComponentTrainingConfig

1c35113

Merge branch 'main' of github.com:ai2cm/ace into refactor/coupled-train-stepper

dbcad3f

jpdunc23 marked this pull request as ready for review February 10, 2026 16:53

Merge branch 'main' into refactor/coupled-train-stepper

a899ec8

mcgibbon approved these changes Feb 10, 2026

jpdunc23 added 2 commits February 10, 2026 13:25

Address review comments

108d4a8

Merge branch 'main' of github.com:ai2cm/ace into refactor/coupled-train-stepper

5c85341

jpdunc23 enabled auto-merge (squash) February 10, 2026 21:27

Merge branch 'main' into refactor/coupled-train-stepper

40f0907

jpdunc23 merged commit aea4317 into main Feb 11, 2026
7 checks passed

jpdunc23 deleted the refactor/coupled-train-stepper branch February 11, 2026 07:56

jpdunc23 mentioned this pull request Feb 12, 2026

Move parameter_init to train stepper configs #814

Open

1 task

jpdunc23 merged 10 commits into main from refactor/coupled-train-stepper
