Separate regression tests for frozen and latest checkpoints by mcgibbon · Pull Request #830 · ai2cm/ace

mcgibbon · 2026-02-12T20:46:32Z

This PR defines separate regression tests for "frozen" checkpoints from specific commits (which shouldn't be updated) and "latest" checkpoints ensuring ongoing backwards compatibility (which should be updated).
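A minimal sketch of the two update policies follows. It assumes that reference state dicts are pickled to one file per selector name, and that "latest" artifacts are rewritten when a hypothetical REGENERATE_LATEST_CHECKPOINTS environment variable is set. Neither the file layout nor the variable comes from this PR.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable

# Hypothetical opt-in flag for rewriting "latest" artifacts; not from the PR.
REGENERATE_ENV_VAR = "REGENERATE_LATEST_CHECKPOINTS"


def resolve_reference_state(
    name: str,
    kind: str,
    artifact_dir: Path,
    build_state: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Return the reference state dict that a regression test compares against."""
    path = artifact_dir / f"{name}_state_dict.pkl"
    if kind == "frozen":
        # Frozen artifacts come from specific commits and are never rewritten.
        if not path.exists():
            raise FileNotFoundError(f"frozen artifact {path} is missing")
    elif kind == "latest":
        # Latest artifacts are written on first use or when explicitly regenerated.
        if not path.exists() or os.environ.get(REGENERATE_ENV_VAR) == "1":
            with path.open("wb") as f:
                pickle.dump(build_state(), f)
    else:
        raise ValueError(f"unknown checkpoint kind: {kind!r}")
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = resolve_reference_state("Toy", "latest", Path(tmp), lambda: {"w": [1.0]})
        print(state)
```

The asymmetry is the point of the split. A missing or changed frozen artifact is always an error. A latest artifact is expected to be refreshed deliberately whenever the current builder changes.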

jpdunc23

I'm still concerned about silently allowing new parameters with frozen checkpoints, though it may warrant a discussion at a tech sync, and I'm willing to punt for now if you prefer.

Other than that, I have one blocking comment about updating the existing artifact.

jpdunc23 · 2026-02-12T20:53:04Z

fme/core/registry/module.py

        n_out_channels: int,
        dataset_info: DatasetInfo,
-    ) -> nn.Module:
+    ) -> Module:


jpdunc23 · 2026-02-12T21:05:57Z

fme/core/registry/test_module_registry.py

+
+
+LATEST_BUILDERS = {
+    "NoiseConditionedSFNO": get_noise_conditioned_sfno_module,


Should NoiseConditionedSFNO_state_dict.pt also be updated?

It was already a "latest" version.

jpdunc23 · 2026-02-12T21:33:26Z

fme/core/registry/test_module_registry.py

+def test_frozen_module_backwards_compatibility(selector_name: str):
+    """
+    Backwards compatibility for frozen releases from specific commits.
+    """
+    set_seed(0)
+    module = FROZEN_BUILDERS[selector_name]()
+    loaded_state_dict = load_state(selector_name)
+    module.load_state(loaded_state_dict)


I'm still a bit concerned that if we place no limits on new keys then we could inadvertently introduce changes in behavior that lead to regressions in inference skill with FROZEN_BUILDERS checkpoints. I wonder if we should also raise an error here with a message along the lines of:

"New module parameters {new_keys} found that were not present in the "frozen" checkpoint. New module parameters may be added but should be enabled by adding a new config parameter that, by default, does not add the new parameters when building the module."

We could also save and reload the module config dict together with the artifact to verify that the config builds the same architecture. Of course there are a million other ways to change the module code that could lead to inference regressions, but I don't see why we should allow arbitrary new parameters that weren't present when the checkpoint was saved if there is a way to avoid it.
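One way to save the config with the artifact and check that it still builds the same architecture is sketched below. ToyConfig and build_param_shapes are hypothetical stand-ins for the real module config and builder. Shapes are stored as lists so they survive a JSON round trip.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a module config."""

    embed_dim: int = 8
    num_layers: int = 2


def build_param_shapes(config: ToyConfig) -> dict[str, list[int]]:
    """Stand-in for building the module and collecting parameter shapes, e.g.
    {name: list(p.shape) for name, p in module.state_dict().items()}."""
    shapes: dict[str, list[int]] = {}
    for i in range(config.num_layers):
        shapes[f"layers.{i}.weight"] = [config.embed_dim, config.embed_dim]
        shapes[f"layers.{i}.bias"] = [config.embed_dim]
    return shapes


def save_artifact(path: str, config: ToyConfig) -> None:
    """Save the config (via asdict) next to the architecture it builds."""
    artifact = {"config": asdict(config), "shapes": build_param_shapes(config)}
    with open(path, "w") as f:
        json.dump(artifact, f)


def check_artifact_architecture(path: str) -> None:
    """Rebuild from the saved config and compare against the saved shapes."""
    with open(path) as f:
        artifact = json.load(f)
    rebuilt = build_param_shapes(ToyConfig(**artifact["config"]))
    if rebuilt != artifact["shapes"]:
        raise AssertionError(
            "saved config no longer builds the architecture stored with the artifact"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_artifact.json")
        save_artifact(path, ToyConfig())
        check_artifact_architecture(path)  # passes while the builder is unchanged
```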

That type of thing is supposed to be covered by the "produces the same result" test(s). If you're concerned about new parameters that don't affect the initial prediction but do affect later ones after the first gradient update, I should add a second stage to those tests that takes another step before checking that the outputs are identical.
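The following toy, stdlib-only sketch (not the repository's test code) shows why a second stage can matter. A hypothetical new gate parameter is zero-initialized, so the first prediction is unchanged. However, it receives a nonzero gradient, so it changes the prediction after one SGD step.

```python
def predict(params: dict[str, float], x: float) -> float:
    """Old model: w * x. New model adds gate * x**2, with gate initialized to 0."""
    return params["w"] * x + params.get("gate", 0.0) * x * x


def sgd_step(
    params: dict[str, float], x: float, target: float, lr: float = 0.1
) -> dict[str, float]:
    """One gradient-descent step on the squared error (predict - target) ** 2."""
    dloss_dy = 2.0 * (predict(params, x) - target)
    grads = {"w": dloss_dy * x}
    if "gate" in params:
        grads["gate"] = dloss_dy * x * x
    return {name: value - lr * grads[name] for name, value in params.items()}


def two_stage_predictions(
    params: dict[str, float], x: float, target: float
) -> tuple[float, float]:
    """Prediction before and after a single training step."""
    first = predict(params, x)
    second = predict(sgd_step(params, x, target), x)
    return first, second


old_first, old_second = two_stage_predictions({"w": 0.5}, x=2.0, target=3.0)
new_first, new_second = two_stage_predictions({"w": 0.5, "gate": 0.0}, x=2.0, target=3.0)
# The first-stage comparison passes; only the second stage exposes the new parameter.
```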

I see now what you're saying; I hadn't understood it before. In practice, what I have here won't catch any of the cases I care about updating the regression tests for, because we always add new weights in a way that sets them to None by default (and a None weight doesn't get registered in the state dict). What we really need is to remember to update or write a new test when we add features that define new weights.
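A stdlib sketch of that pattern follows, assuming a hypothetical FeatureConfig flag. Like a PyTorch parameter set to None, the disabled weight is left out of state_dict(). A key comparison against an old checkpoint therefore only exercises the default path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureConfig:
    """Hypothetical config; the new feature is off by default."""

    use_extra_bias: bool = False


class ToyModule:
    """Stand-in for an nn.Module whose optional weight may be None."""

    def __init__(self, config: FeatureConfig, size: int = 4):
        self.weight = [0.0] * size
        # Like a PyTorch parameter registered as None, a disabled optional
        # weight is kept as None and is therefore left out of state_dict().
        self.extra_bias: Optional[list[float]] = (
            [0.0] * size if config.use_extra_bias else None
        )

    def state_dict(self) -> dict[str, list[float]]:
        params = {"weight": self.weight, "extra_bias": self.extra_bias}
        return {name: value for name, value in params.items() if value is not None}


frozen_keys = {"weight"}  # keys saved before the feature existed
default_keys = set(ToyModule(FeatureConfig()).state_dict())
enabled_keys = set(ToyModule(FeatureConfig(use_extra_bias=True)).state_dict())
```

With the default config the keys match the old checkpoint, so a frozen test keeps passing. The enabled path is only covered if someone writes a test for it.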

What I actually want to do is test that the config has no new keys, forcing the user to build a new latest checkpoint when new config keys are added, and save the config (converted with asdict) alongside the checkpoint. I'll look into adding that, as well as your suggestion to verify that the config builds the same architecture.
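Below is a sketch of the proposed config check, assuming the saved config is the asdict() output of a dataclass. SelectorConfig and check_no_new_config_keys are hypothetical names.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any


@dataclass
class SelectorConfig:
    """Hypothetical module config; not the real fme config class."""

    embed_dim: int = 8
    use_extra_bias: bool = False


def check_no_new_config_keys(config_cls: type, saved_config: dict[str, Any]) -> None:
    """Fail if the config class gained fields since the latest checkpoint was saved."""
    new_keys = sorted({f.name for f in fields(config_cls)} - set(saved_config))
    if new_keys:
        raise AssertionError(
            f"New config keys {new_keys} are not in the saved config; "
            "regenerate the latest checkpoint and save its config with it."
        )


saved = asdict(SelectorConfig())  # what would be stored next to the checkpoint
check_no_new_config_keys(SelectorConfig, saved)  # passes: no new keys
```

Only new keys are checked here. A removed key would already fail when the saved dict is passed back to the dataclass constructor, because it would be an unexpected keyword argument.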

I looked at it more, and because we load model states with strict=True (the default), this will already error if the built model keys differ from the checkpoint. I can see how my other (wrong) test implied this wasn't the case, though.

Ah, good point. Maybe we should add a test that confirms this behavior for Module.load_state (and maybe also other parts of Module). But that's a preexisting issue.
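The explicit new-key check suggested earlier in this thread is sketched below. It would give a more targeted error message than strict loading does. check_no_new_parameters is a hypothetical helper, and the key sets would presumably come from the built module's state dict and the loaded checkpoint.

```python
from collections.abc import Iterable


def check_no_new_parameters(
    module_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> None:
    """Raise if the built module defines parameters the frozen checkpoint lacks."""
    new_keys = sorted(set(module_keys) - set(checkpoint_keys))
    if new_keys:
        raise ValueError(
            f"New module parameters {new_keys} found that were not present in the "
            '"frozen" checkpoint. New module parameters may be added but should be '
            "enabled by adding a new config parameter that, by default, does not add "
            "the new parameters when building the module."
        )


# In a test this might be called as follows (the exact key sources are assumptions):
# check_no_new_parameters(module.state_dict().keys(), loaded_state_dict.keys())
```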


jpdunc23

One unused argument, otherwise just minor comments and LGTM.

jpdunc23 · 2026-02-18T23:09:27Z

fme/core/registry/test_module_registry.py

+
+
+def load_or_cache_state(
+    selector_name: str, module: Module, module_config: ModuleConfig | None = None


The module_config argument to load_or_cache_state appears to be unused.

jpdunc23 · 2026-02-18T23:18:18Z

fme/core/registry/test_module_registry.py

+    img_shape = (9, 18)
+    n_in_channels = 5
+    n_out_channels = 6
+    all_labels = {"a", "b"}
+    timestep = datetime.timedelta(hours=6)
+    device = fme.get_device()
+    horizontal_coordinate = LatLonCoordinates(
+        lat=torch.zeros(img_shape[0], device=device),
+        lon=torch.zeros(img_shape[1], device=device),
+    )
+    vertical_coordinate = HybridSigmaPressureCoordinate(
+        ak=torch.arange(7, device=device), bk=torch.arange(7, device=device)
+    )
+    dataset_info = DatasetInfo(
+        horizontal_coordinates=horizontal_coordinate,
+        vertical_coordinate=vertical_coordinate,
+        timestep=timestep,
+        all_labels=all_labels,
+    )


Consider adding a new shared helper (or reusing an existing one) to create the DatasetInfo, so that get_dbc2925_ncsfno_module and get_noise_conditioned_sfno_module can both use it.

I think it makes sense to repeat the select = ModuleSelector(...) blocks, even if they're currently identical.


mcgibbon added 11 commits February 12, 2026 17:24

revert change to weight shape with 1 group (a63196f)

fix using load state dict pre hook (d4f0014)

remove redundant check (8449cc2)

avoid using unnecessary private version of method (f6a2e01)

name the pre-load hook (b399e71)

add regression test, fix code (7cfa7db)

Merge branch 'main' into fix/hide_group_dim (5b7caed)

remove assertion (5e6736e)

add frozen and latest tests (e0b87ae)

add separate tests for frozen and latest checkpoints (19af595)

Merge branch 'main' into feature/legacy_regression_tests (5c6f633)

mcgibbon marked this pull request as ready for review February 12, 2026 20:49

mcgibbon requested a review from jpdunc23 February 12, 2026 20:49

jpdunc23 reviewed Feb 12, 2026


mcgibbon added 3 commits February 18, 2026 16:57

Merge branch 'main' into feature/legacy_regression_tests (6f7a4bd)

update handling of latest checkpoint update reqs (c9696e0)

Merge branch 'feature/legacy_regression_tests' of github.com:ai2cm/ace into feature/legacy_regression_tests (562698b)

jpdunc23 approved these changes Feb 18, 2026


mcgibbon wants to merge 14 commits into main from feature/legacy_regression_tests
