Data service updates #44

Open
dylanlee wants to merge 3 commits into main from data-service-updates

dylanlee commented Aug 28, 2025

This PR updates the scripts in src/ that handle intermediate files between jobs, query the STAC, and query the hand-index. The following changes were made:

  1. stac_querier.py: add aoi_is_item query functionality.
  2. hand_index_querier.py: improve DuckDB credential setting, simplify the
    catchment data parquet structure, and improve error handling.
  3. data_service.py: add credential refresh. This was necessary because I
    couldn't get fsspec to use IAM credentials directly, so I was obtaining
    credentials from IAM using boto3 and passing them to fsspec, but those
    credentials were then expiring.
  4. Pass through the additional metadata fields gauge, hucs, and
    stac_items from STAC queries.
  5. Add an aoi_is_item flag to allow querying directly by STAC item.
  6. Add an append_file_to_uri() method supporting atomic append operations
    for S3 and local files.
  7. Add a DataServiceException class for more specific error handling, so
    that a pipeline can be reported as failed if there is a data service
    issue.
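
For context on the credential refresh, here is a minimal sketch of the general pattern: cache the temporary credentials and re-fetch them shortly before they expire. This is not the code in data_service.py. `TempCredentials`, `RefreshingCredentials`, the `fetch` callable, and the 300-second margin are all hypothetical names and values chosen for illustration. In practice `fetch` would presumably wrap a boto3 credential lookup, and the `key`/`secret`/`token` dictionary matches the keyword arguments s3fs accepts.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class TempCredentials:
    """Hypothetical container for short-lived AWS credentials."""
    key: str
    secret: str
    token: str
    expires_at: float  # Unix timestamp at which the credentials expire


class RefreshingCredentials:
    """Cache temporary credentials and re-fetch them shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], TempCredentials],
        refresh_margin_s: float = 300.0,  # assumed safety margin
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._cached: Optional[TempCredentials] = None

    def get(self) -> TempCredentials:
        # Refresh on first use, or once we are inside the safety margin.
        stale = (
            self._cached is None
            or self._clock() >= self._cached.expires_at - self._margin
        )
        if stale:
            self._cached = self._fetch()
        return self._cached

    def storage_options(self) -> Dict[str, str]:
        # Keyword names follow s3fs (key / secret / token).
        creds = self.get()
        return {"key": creds.key, "secret": creds.secret, "token": creds.token}
```

A caller would build a fresh filesystem from `storage_options()` before each batch of reads or writes rather than holding one filesystem object for the lifetime of a long job.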

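
To illustrate the atomic append and the dedicated exception, here is a rough sketch for the local-file case only. It is an assumption about the approach, not the PR's append_file_to_uri() implementation: `append_file_atomically` is a hypothetical helper, and the `DataServiceException` below is a stand-in whose real definition may differ. The sketch writes the combined content to a temporary file in the destination directory and swaps it in with `os.replace`, so readers never see a half-written file. It does not guard against two concurrent writers, and S3 would need a different read-modify-write path.

```python
import os
import shutil
import tempfile


class DataServiceException(Exception):
    """Stand-in for the PR's exception class; the real one may differ."""


def append_file_atomically(source_path: str, dest_path: str) -> None:
    """Append source_path's bytes to dest_path via write-temp-then-rename."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    tmp_path = None
    try:
        # The temp file must be on the same filesystem for os.replace
        # to be an atomic rename.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
        with os.fdopen(fd, "wb") as tmp:
            if os.path.exists(dest_path):
                with open(dest_path, "rb") as existing:
                    shutil.copyfileobj(existing, tmp)
            with open(source_path, "rb") as src:
                shutil.copyfileobj(src, tmp)
        os.replace(tmp_path, dest_path)
        tmp_path = None  # ownership transferred; nothing left to clean up
    except OSError as exc:
        # Surface I/O problems as a data-service failure so a caller
        # can mark the whole pipeline as failed.
        raise DataServiceException(
            f"append to {dest_path} failed: {exc}"
        ) from exc
    finally:
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A pipeline entry point could then catch `DataServiceException` separately from other errors and report the job as failed.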
@dylanlee dylanlee force-pushed the data-service-updates branch from d400680 to ce9ab24 Compare September 10, 2025 14:58