-
Notifications
You must be signed in to change notification settings - Fork 18
Fix unwanted dimensions in dataset #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unwanted dimensions in dataset #60
Conversation
observingClouds
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @ealerskans! Could you please add some tests, e.g. that the output variables are the one selected and that there are no NaN values?
@observingClouds I tried to add some tests for what you suggested. I am very new to this so please have a look and let me know if something does not look right or it it would be better to do this in another way :) |
tests/test_dataset.py
Outdated
| INVALID_PRESSURE_LEVEL_TEST_SECTION = """\ | ||
| inputs: | ||
| danra_pressure_levels: | ||
| path: https://object-store.os-api.cci1.ecmwf.int/mllam-testdata/danra_cropped/v0.2.0/pressure_levels.zarr | ||
| dims: [time, x, y, pressure] | ||
| variables: | ||
| z: | ||
| pressure: | ||
| values: [1000,] | ||
| units: hPa | ||
| t: | ||
| pressure: | ||
| values: [800, ] | ||
| units: hPa | ||
| dim_mapping: | ||
| time: | ||
| method: rename | ||
| dim: time | ||
| state_feature: | ||
| method: stack_variables_by_var_name | ||
| dims: [pressure] | ||
| name_format: "{var_name}{pressure}m" | ||
| grid_index: | ||
| method: stack | ||
| dims: [x, y] | ||
| target_output_variable: state | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part here is related to issue #61. We agreed that we should be able to support this kind of selection, where we select variable form different levels. But this is currently not supported and would require re-writing of the code. So not sure if this PR should be merged before that has been addressed. I am in process of implementing this and will hopefully create a PR for it later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I am now in doubt if this would even work after that change since that will introduce a check in the reading of the config file and then throw an error if we're trying to make this kind of selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky question. I would still implement all tests and then add a decorator to the test that will allow it to be skipped in case a NotImplementedError is raised, like here but specific to NotImplementedError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed, I will implement the decorator addition in #65 instead :)
|
no longer blocked now that @observingClouds has merged #66 and #67 🥳 |
observingClouds
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! This looks good to me now 😉
Describe your changes
This PR fixes a bug in what dimensions are added to the output dataset. Currently, the dimensions of the input dataset are all added. However, if one only wants to extract e.g.
uandvfrom the 100 m altitude in the DANRA height levels dataset, the current implementation will addaltitudeas a dimension meaning that all selected variables will be broadcasted to havealtitudeas a dimension, even though we're only selecting one variables from one altitude. This is fixed by removing the coordinates assignment to the output dataset when initialising it.Furthermore, there is a similar bug for the derived variables, in which the input dataset dimensions are used. The change by @joeloskarsson (joeloskarsson@8040378) fixes this by using the expected input variable dimensions instead as
target_dims.No change in dependencies for this fix.
Issue Link
Tries to solve #58.
Type of change
Checklist before requesting a review
pullwith--rebaseoption if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee