Skip to content

Conversation

@ealerskans
Copy link
Contributor

@ealerskans ealerskans commented Feb 4, 2025

Describe your changes

This PR fixes a bug in what dimensions are added to the output dataset. Currently, the dimensions of the input dataset are all added. However, if one only wants to extract e.g. u and v from the 100 m altitude in the DANRA height levels dataset, the current implementation will add altitude as a dimension meaning that all selected variables will be broadcasted to have altitude as a dimension, even though we're only selecting one variables from one altitude. This is fixed by removing the coordinates assignment to the output dataset when initialising it.

Furthermore, there is a similar bug for the derived variables, in which the input dataset dimensions are used. The change by @joeloskarsson (joeloskarsson@8040378) fixes this by using the expected input variable dimensions instead as target_dims.

No change in dependencies for this fix.

Issue Link

Tries to solve #58.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

Copy link
Contributor

@observingClouds observingClouds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @ealerskans! Could you please add some tests, e.g. that the output variables are the one selected and that there are no NaN values?

- Test that the output variables match the ones specified
  in the config
- Test that the output dataset does not contain any nans
@ealerskans
Copy link
Contributor Author

Thanks for the PR @ealerskans! Could you please add some tests, e.g. that the output variables are the one selected and that there are no NaN values?

@observingClouds I tried to add some tests for what you suggested. I am very new to this so please have a look and let me know if something does not look right or it it would be better to do this in another way :)

Comment on lines 247 to 273
INVALID_PRESSURE_LEVEL_TEST_SECTION = """\
inputs:
danra_pressure_levels:
path: https://object-store.os-api.cci1.ecmwf.int/mllam-testdata/danra_cropped/v0.2.0/pressure_levels.zarr
dims: [time, x, y, pressure]
variables:
z:
pressure:
values: [1000,]
units: hPa
t:
pressure:
values: [800, ]
units: hPa
dim_mapping:
time:
method: rename
dim: time
state_feature:
method: stack_variables_by_var_name
dims: [pressure]
name_format: "{var_name}{pressure}m"
grid_index:
method: stack
dims: [x, y]
target_output_variable: state
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part here is related to issue #61. We agreed that we should be able to support this kind of selection, where we select variable form different levels. But this is currently not supported and would require re-writing of the code. So not sure if this PR should be merged before that has been addressed. I am in process of implementing this and will hopefully create a PR for it later today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am now in doubt if this would even work after that change since that will introduce a check in the reading of the config file and then throw an error if we're trying to make this kind of selection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky question. I would still implement all tests and then add a decorator to the test that will allow it to be skipped in case a NotImplementedError is raised, like here but specific to NotImplementedError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed, I will implement the decorator addition in #65 instead :)

@leifdenby leifdenby added the bug Something isn't working label Feb 10, 2025
@observingClouds
Copy link
Contributor

blocked by #66 / #67

@observingClouds observingClouds added the blocked PR blocked by an other issue/PR label Feb 13, 2025
@leifdenby
Copy link
Member

no longer blocked now that @observingClouds has merged #66 and #67 🥳

@leifdenby leifdenby removed the blocked PR blocked by an other issue/PR label Feb 13, 2025
observingClouds and others added 3 commits February 13, 2025 20:31
Refactor the config modifier to make it more general and make the file more structured
Copy link
Contributor

@observingClouds observingClouds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! This looks good to me now 😉

@observingClouds observingClouds merged commit 144ca10 into mllam:main Feb 14, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue when adding coordinates to the output dataset

3 participants