-
Notifications
You must be signed in to change notification settings - Fork 0
Feat/gribjump #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/gribjump #71
Conversation
|
ci failing due to gdal with ubuntu-latest and python 3.14 - could downgrade for now to circumvent |
hat/extract_timeseries/extractor.py
Outdated
| if list(grid_config["source"].keys())[0] == "gribjump": | ||
| assert index_1d_config is not None | ||
| unique_indices, duplication_indexes = np.unique(df[index_1d_config].values, return_inverse=True) | ||
| grid_config["source"]["gribjump"]["indices"] = unique_indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we force and always use the option fetch_coords_from_fdb=True in .from_source("gribjump",...) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ranges=[(1234, 2345)], will extract all 1D indices from 1234 up until (and not including) 2345, it's not a 2D index. Since we just really want a few indices here I don't think it's suitable for this application? Happy to be corrected if I misunderstood something
I need to read up on the fetch_coords_from_fdb things, but might be useful indeed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, sorry. I misunderstood the ranges meaning, deleted that part of the comment. Cool, I think having the coordinates in the netcdf output would be nice.
Version 0.18.0 introduced improved handling of request-based resources, allowing their usage solely via kwargs. This is a prerequisite to use the "fdb" source without complicating the user configuration or hat's earthkit-data usage.
…bjump) Add basic unit tests for extractor function covering happy paths: - Test index-based and coordinate-based station mapping - Test station filtering functionality - Test output file writing to netCDF - Test validation error when both index and coords provided Also fix bug where invalid config would call create_mask_from_index(None, ...) by adding early validation that raises clear errors. Coverage: 91% for non-gribjump paths. Does not test gribjump source or edge cases (empty inputs, malformed data, etc.). Note: Suppressed NumPy/netCDF4 ABI warning in pytest config (known harmless issue)
Separate gribjump and regular processing to reduce nested decision logic. Streamline functions to return single values instead of tuples where possible. Add early user-friendly validation for station indices and improve naming throughout. - Separate gribjump vs. regular processing paths - Streamline function return values (reduce tuple unpacking) - Add early user-friendly bounds-check for station indices - Rename variables for consistency and remove redundant comments
This new version includes a bugfix that affected the gribjump source and made time_dim_mode="valid_time" not work in the xarray dataset creation.
colonesej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. For the future, we can use better configuration classes (e.g, conflator) to process and validate the input fields. Also, it would be nice to add more configuration control over the outputs (name of dimensions, metadata) and support different formats/types (like zarr, parquet, etc).
Description
This PR contains experimental work to leverage gribjump for timeseries extraction. Close #67. Close #72.
Contributor Declaration
By opening this pull request, I affirm the following: