Add support for inverting transformation operations (mapping back to structure of source datasets) #78

leifdenby · 2025-05-12T12:07:08Z

Describe your changes

This PR adds support for taking a zarr dataset produced with mllam-data-prep and inverting the transformations applied to create it. This is implemented in mllam_data_prep.recreate_inputs(ds: xr.Dataset), which returns a dictionary of datasets, Dict[str, xr.Dataset], where each key is the name of an input defined in the config that was used to create the dataset.

The main motivation for this functionality is to be able to take a forecast produced with neural-lam as a zarr dataset and invert it back to the original data structure (individual variables on a regular grid), so that it can be used in downstream applications.
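To make the idea of inverting back to a regular grid concrete, here is a minimal, library-free sketch for a single field. It is not the code from this PR, which operates on xr.Dataset objects; the function names stack_grid and unstack_grid are hypothetical and only illustrate the round trip. Values on a regular (x, y) grid are flattened into a single grid-index dimension, and the stored (x, y) pairs are then used to rebuild the 2D layout.

```python
def stack_grid(field, xs, ys):
    """Flatten field[i][j] (located at xs[i], ys[j]) into a 1D list.

    Returns the flat values together with the (x, y) pair of every
    grid point, which is the information needed to undo the stacking.
    """
    values, index = [], []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            values.append(field[i][j])
            index.append((x, y))
    return values, index


def unstack_grid(values, index):
    """Rebuild the 2D field from flat values and their (x, y) pairs.

    The order of the flat points does not matter; points missing from
    the index are filled with None.
    """
    xs = sorted({x for x, _ in index})
    ys = sorted({y for _, y in index})
    lookup = dict(zip(index, values))
    field = [[lookup.get((x, y)) for y in ys] for x in xs]
    return field, xs, ys


# Hypothetical 2 x 3 grid used only for illustration
xs = [0.0, 1.0]
ys = [10.0, 20.0, 30.0]
field = [[1, 2, 3], [4, 5, 6]]

values, index = stack_grid(field, xs, ys)
restored, restored_xs, restored_ys = unstack_grid(values, index)
```

The inversion only works because the coordinate pairs are kept alongside the flattened values. If they were dropped during stacking, the original grid could not be recovered from the flat values alone.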

The example for this PR inverts the DANRA training dataset included with mllam-data-prep, selecting only the danra_height_levels input to write to zarr.

Added parse to support parsing of the format string used for creating the coordinate values during stacking of variables and coordinates.
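The sketch below illustrates what inverting a format string means: given the format used to build a stacked coordinate value, recover the fields that went into it. It assumes that parse above refers to the third-party parse package; to stay dependency-free, this sketch uses only the standard library (re and string.Formatter) and is not the implementation in this PR. The format string "{var_name}_{altitude}m" and the helper names format_to_regex and invert_format are hypothetical.

```python
import re
from string import Formatter


def format_to_regex(fmt):
    """Turn a str.format-style template with named fields into a regex.

    Literal text is escaped and every named field becomes a named,
    non-greedy capture group. Format specs and conversions are ignored,
    and each field name may appear only once.
    """
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(fmt):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append(f"(?P<{field}>.+?)")
    return re.compile("^" + "".join(parts) + "$")


def invert_format(fmt, value):
    """Recover the field values (as strings) that produced `value`."""
    match = format_to_regex(fmt).match(value)
    if match is None:
        raise ValueError(f"{value!r} does not match format {fmt!r}")
    return match.groupdict()


# Hypothetical format string used only for illustration
fmt = "{var_name}_{altitude}m"
name = fmt.format(var_name="u", altitude=100)
fields = invert_format(fmt, name)
```

All recovered values are strings, so numeric fields such as altitude need to be converted back explicitly. A format without a separator between two fields can be ambiguous, because there is then no unique way to split the text between them.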

Issue Link

#85 should be merged first: it introduces better handling of the MultiIndex that results from stacking coordinates, which makes inverting the operation much simpler.

Type of change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
I have performed a self-review of my code
For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
I have updated the documentation to cover introduced code changes
I have added tests that prove my fix is effective or that my feature works
I have given the PR a name that clearly describes the change, written in imperative form (context).
I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

the code is readable
the code is well tested
the code is documented (including return types and parameters)
the code is easy to maintain

Author checklist after completed review

I have added a line to the CHANGELOG describing this change, in a section
reflecting type of change (add section where missing):
- added: when you have added new functionality
- changed: when default behaviour of the code has been changed
- fixes: when your contribution fixes a bug

Checklist for assignee

PR is up to date with the base branch
the tests pass
author has added an entry to the changelog (and designated the change as added, changed or fixed)
Once the PR is ready to be merged, squash commits and merge the PR.


leifdenby added 3 commits December 3, 2024 07:48

start work on inverse operations

a7f7324

Merge branch 'main' of https://github.com/mllam/mllam-data-prep into …

cd5b170

…feat/inverse-ops

first fully working implementation!

dfe0df1

ealerskans mentioned this pull request May 15, 2025

Invert neural-lam predictions from grid-index to regular grid ealerskans/mlwm-deployment#2

Open

leifdenby added 13 commits September 16, 2025 12:50

Merge branch 'main' of https://github.com/mllam/mllam-data-prep into …

5cd77b5

…feat/inverse-ops

skip missing output targets in dataset to invert

a246f28

actually select specific inputs

83c0ecf

use cf_xarray.encoding for MultiIndex instead of dropping

d3cc028

encode MultiIndex in create_dataset

42aea1f

use pd.MultiIndex not xr.MultiIndex

da1cc24

use cf_xarray decode

12a9a9f

fix typo

6c185cf

final fixes to MultiIndex decode+unstack

4adc28d

show default args in cli

46ef6db

set attrs for inverted dataset

d3a6221

use utc time

5979423

remove debug statement

23b2bd1

leifdenby mentioned this pull request Nov 25, 2025

Switch to using CF compliant coord stacking (rather than xr.Dataset.reset_index()) #85

Open

20 tasks

leifdenby added this to the v0.8.0 (proposed) milestone Nov 25, 2025
