Skip to content

Conversation

@Oisin-M
Copy link
Collaborator

@Oisin-M Oisin-M commented Oct 16, 2025

Description

This PR contains experimental work to leverage gribjump for timeseries extraction. Close #67. Close #72.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@Oisin-M
Copy link
Collaborator Author

Oisin-M commented Nov 3, 2025

ci failing due to gdal with ubuntu-latest and python 3.14 - could downgrade for now to circumvent

if list(grid_config["source"].keys())[0] == "gribjump":
assert index_1d_config is not None
unique_indices, duplication_indexes = np.unique(df[index_1d_config].values, return_inverse=True)
grid_config["source"]["gribjump"]["indices"] = unique_indices
Copy link
Collaborator

@colonesej colonesej Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we force and always use the option fetch_coords_from_fdb=True in .from_source("gribjump",...) ?

Copy link
Collaborator Author

@Oisin-M Oisin-M Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ranges=[(1234, 2345)], will extract all 1D indices from 1234 up until (and not including) 2345, it's not a 2D index. Since we just really want a few indices here I don't think it's suitable for this application? Happy to be corrected if I misunderstood something

I need to read up on the fetch_coords_from_fdb things, but might be useful indeed!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, sorry. I misunderstood the ranges meaning, deleted that part of the comment. Cool, I think having the coordinates in the netcdf output would be nice.

Version 0.18.0 introduced improved handling of request-based resources,
allowing their usage solely via kwargs. This is a prerequisite to use
the "fdb" source without complicating the user configuration or hat's
earthkit-data usage.
…bjump)

Add basic unit tests for extractor function covering happy paths:
- Test index-based and coordinate-based station mapping
- Test station filtering functionality
- Test output file writing to netCDF
- Test validation error when both index and coords provided

Also fix bug where invalid config would call create_mask_from_index(None, ...)
by adding early validation that raises clear errors.

Coverage: 91% for non-gribjump paths. Does not test gribjump source or edge
cases (empty inputs, malformed data, etc.).

Note: Suppressed NumPy/netCDF4 ABI warning in pytest config (known harmless issue)
Separate gribjump and regular processing to reduce nested decision logic.
Streamline functions to return single values instead of tuples where possible.
Add early user-friendly validation for station indices and improve naming
throughout.

- Separate gribjump vs. regular processing paths
- Streamline function return values (reduce tuple unpacking)
- Add early user-friendly bounds-check for station indices
- Rename variables for consistency and remove redundant comments
This new version includes a bugfix that affected the gribjump
source and made time_dim_mode="valid_time" not work in the
xarray dataset creation.
Copy link
Collaborator

@colonesej colonesej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. For the future, we can use better configuration classes (e.g, conflator) to process and validate the input fields. Also, it would be nice to add more configuration control over the outputs (name of dimensions, metadata) and support different formats/types (like zarr, parquet, etc).

@Oisin-M Oisin-M merged commit 1b564cd into develop Nov 25, 2025
4 checks passed
@Oisin-M Oisin-M deleted the feat/gribjump branch November 25, 2025 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent configs faster extraction using gribjump

4 participants