@leifdenby leifdenby commented May 12, 2025

Describe your changes

This PR adds support for taking a zarr dataset produced with mllam-data-prep and inverting the transformations applied to create it. This is implemented in mllam_data_prep.recreate_inputs(ds: xr.Dataset), which returns a dictionary of datasets, Dict[str, xr.Dataset], where each key is the name of an input defined in the config that was used to create the dataset.

The main motivation for implementing this functionality is to be able to easily take a forecast produced with neural-lam as a zarr dataset and invert it back to the original data structure (individual variables on a regular grid) so that it can be used in downstream applications.

Below is an example that inverts the DANRA training dataset included with mllam-data-prep, selecting just the danra_height_levels input to write to zarr:
[Image: Screenshot 2025-05-12 at 14 04 47]
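
In code, that workflow might look roughly like the sketch below. It assumes recreate_inputs is importable from the top-level mllam_data_prep package as described above; the helper name recreate_single_input and the paths in the usage note are hypothetical placeholders, and this is not necessarily the exact code shown in the screenshot.

```python
def recreate_single_input(zarr_path: str, input_name: str, out_path: str) -> None:
    """Invert a mllam-data-prep dataset and write one recreated input to zarr.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    # Imported lazily so the sketch can be read (and imported) without
    # the heavier dependencies being installed.
    import xarray as xr
    import mllam_data_prep as mdp

    # Open the dataset that mllam-data-prep (or neural-lam) produced.
    ds = xr.open_zarr(zarr_path)

    # recreate_inputs returns Dict[str, xr.Dataset], keyed by the input
    # names defined in the config used to create the dataset.
    inputs = mdp.recreate_inputs(ds)

    # Pick a single input by name and write it back out as zarr.
    inputs[input_name].to_zarr(out_path)
```

A call such as recreate_single_input("example.danra.zarr", "danra_height_levels", "danra_height_levels.zarr") would then write only that input, assuming those (placeholder) paths exist.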

Added parse as a dependency to support parsing the format string used to create the coordinate values during stacking of variables and coordinates.
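
To illustrate why a format string needs to be parsed "in reverse", here is a minimal sketch that uses only the standard-library re module rather than the parse package. The format string "{var_name}{altitude}m" and the helper names format_to_regex and parse_name are assumptions made for illustration; the actual implementation in this PR may differ.

```python
import re
from typing import Dict, Optional


def format_to_regex(name_format: str) -> "re.Pattern[str]":
    """Build a regex with one named group per {field} in the format string."""
    # Splitting on "{field}" yields alternating literal text and field names,
    # e.g. "{var_name}{altitude}m" -> ["", "var_name", "", "altitude", "m"].
    pieces = re.split(r"\{(\w+)\}", name_format)
    pattern = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            # Literal text between fields must match exactly.
            pattern += re.escape(piece)
        elif piece == "var_name":
            # Assumption: variable names consist of letters and underscores.
            pattern += r"(?P<var_name>[A-Za-z_]+?)"
        else:
            # Assumption: every other field is a non-negative number.
            pattern += rf"(?P<{piece}>\d+(?:\.\d+)?)"
    return re.compile(f"^{pattern}$")


def parse_name(name_format: str, name: str) -> Optional[Dict[str, str]]:
    """Recover the field values that produced `name`, or None if no match."""
    match = format_to_regex(name_format).match(name)
    return match.groupdict() if match else None


# Stacking creates a coordinate value from a variable name and a level ...
fmt = "{var_name}{altitude}m"
stacked_name = fmt.format(var_name="u", altitude=100)

# ... and inverting the stacking recovers both parts (as strings).
recovered = parse_name(fmt, stacked_name)
```

Note that this simple pattern would be ambiguous for variable names that themselves contain digits, which is one reason a dedicated parsing library is a reasonable choice.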

Issue Link

#85 should be merged first, since it introduces better handling of the MultiIndex that results from stacking coordinates, and using it makes inverting the operation much simpler.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.
