Clean up data handling methods in `adf_dataset.py` and fix AMWG table formatting #427

justin-richling · 2025-11-14T19:50:14Z

This PR cleans up and organizes the calls that read files, datasets, and data arrays according to file type, i.e. time series, climo, or regridded climo.

The main changes here are to:

Make the functionality consistent across all file types. (These can probably still be simplified down to a couple of functions.)
Remove the baseline/reference case name from the arguments of these methods. There will only ever be one baseline case, so the methods look it up themselves.
Add a function, update_unit, in adf_dataset.py to gather the new unit info, and call it in load_da. This could also be useful as an external method, similar to get_value_converters.
Update amwg_table.py to use these changes.

Important changes:

The modifications for scale factor, offset, new units, and time bounds adjustment (for time series) are now always applied when data is read through these methods in adf_dataset.py. Skipping these steps is what caused the improperly formatted tables; see "AMWG tables are not getting proper data cleaning" #423.
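As a rough illustration of what applying these modifications at load time means, here is a minimal sketch. It is not the code from adf_dataset.py: the function name `apply_value_converters` and the key names `scale_factor`, `add_offset`, and `new_unit` are assumptions made for this example, and the precipitation defaults are made up. The data argument can be anything that supports `*` and `+`, such as a float, a NumPy array, or an xarray DataArray.

```python
def apply_value_converters(data, units, var_defaults):
    """Return (converted_data, units) from optional scale/offset/unit settings.

    Hypothetical helper for illustration only. Missing keys fall back to
    values that leave the data and units untouched.
    """
    scale_factor = var_defaults.get("scale_factor", 1)
    add_offset = var_defaults.get("add_offset", 0)
    new_unit = var_defaults.get("new_unit", units)
    return data * scale_factor + add_offset, new_unit


# Made-up defaults: convert a precipitation rate from m/s to mm/day.
precip_defaults = {"scale_factor": 86400000, "new_unit": "mm/day"}
value, unit = apply_value_converters(2.0e-8, "m/s", precip_defaults)

# No entries for the variable: the data passes through unchanged.
unchanged, same_unit = apply_value_converters(1.728, "mm/day", {})
```

Because the conversion lives in one place, every caller that loads data this way, tables included, sees the same converted values and units.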

You can now load datasets and data arrays generically anywhere in the ADF by calling load_dataset(fils) and/or load_da(<list of files>, <variable name>, <case name>)

or through the three file types:
  load_reference_timeseries_dataset/load_timeseries_dataset
  load_reference_climo_dataset/load_climo_dataset
  load_reference_regrid_dataset/load_regrid_dataset

load_<file type>_dataset calls load_dataset internally, and
load_<file type>_da calls load_da internally.
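The delegation described above could look roughly like the sketch below. Everything in it is a stand-in: the class name `DataLoaderSketch`, the `files_by_type` lookup, the `is_timeseries` flag name, and the method arguments are assumptions for illustration, and the generic loader only records the request instead of opening files with xarray.

```python
class DataLoaderSketch:
    """Hypothetical stand-in for the loader class in adf_dataset.py."""

    def __init__(self, ref_case, files_by_type):
        self.ref_case = ref_case            # the single baseline case
        self.files_by_type = files_by_type  # (file type, case, variable) -> files
        self.calls = []                     # requests seen by the generic loader

    def load_dataset(self, fils, is_timeseries=False):
        # Generic loader: a real version would open the files here.
        self.calls.append((tuple(fils), is_timeseries))
        return {"files": list(fils), "time_checked": is_timeseries}

    def load_timeseries_dataset(self, case, var):
        fils = self.files_by_type[("timeseries", case, var)]
        return self.load_dataset(fils, is_timeseries=True)

    def load_climo_dataset(self, case, var):
        fils = self.files_by_type[("climo", case, var)]
        return self.load_dataset(fils)

    def load_reference_climo_dataset(self, var):
        # No baseline name argument: the loader already knows its one baseline.
        return self.load_climo_dataset(self.ref_case, var)


loader = DataLoaderSketch(
    ref_case="baseline",
    files_by_type={
        ("timeseries", "test", "TS"): ["test.TS.ts.nc"],
        ("climo", "baseline", "TS"): ["baseline.TS.climo.nc"],
    },
)
ts = loader.load_timeseries_dataset("test", "TS")
ref = loader.load_reference_climo_dataset("TS")
```

Keeping the file-type methods as thin wrappers means a fix made in the generic loader reaches every file type at once.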

load_dataset has a flag for time series files so that it can check the time bounds and ensure proper time alignment. This only matters for older CAM/CESM files; newer files are left unmodified.
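One common form of this kind of time alignment is to replace each timestamp with the midpoint of its averaging interval. The sketch below assumes that approach and uses plain `datetime` values instead of an xarray time coordinate; the function name `center_times_on_bounds` is hypothetical and this is not the check implemented in load_dataset.

```python
from datetime import datetime


def center_times_on_bounds(time_bounds):
    """Return the midpoint of each (start, end) averaging interval."""
    return [start + (end - start) / 2 for start, end in time_bounds]


# Two monthly averaging intervals (made-up example data).
bounds = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),
]

# Older-style stamping: each monthly mean is labeled with the interval end,
# so the January mean appears to belong to February.
stamped_at_end = [end for _, end in bounds]

centered = center_times_on_bounds(bounds)

# Timestamps that already sit at the midpoint come out unchanged.
already_centered = list(centered)
recentered = center_times_on_bounds(bounds)
```

Under this assumption the adjustment is safe to apply unconditionally, which matches the statement that newer files are not modified.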

This change forces the ADF to work out these modifications at the stage where the data array is loaded.

Potential issues:
  Users must verify the scale factor and units themselves when using premade files, i.e. files that were not created by the ADF from history files.
  Remedy: make sure the config YAML file has no values for these arguments if the datasets have already been modified, or use the more generic xarray load via load_dataset from adf_utils.
  This is unlikely, but it is a real issue in edge cases.

Closes #423 and #424

justin-richling added 8 commits November 13, 2025 13:52

Update all scripts for new reference data load

5344f1b

Remove the dependency of the baseline case name since there will only be one baseline

Use adf_dataset method to get time series data

419e68b

This will now grab any scale factors and new units for time series files

Expand on helper functions for data loading

83ff9aa

Now if these methods are called it will apply the scale factors/offset/units and time bounds fixes (for time series)

Update adf_variable_defaults.yaml

59e2814

Make `new_unit` plain text rather than TeX formatted (mostly for AMWG table viewing) and move the TeX version to mpl args for plotting

Update adf_dataset.py

6aa204b

Update adf_dataset.py

a72b781

Update adf_dataset.py

ad7e7ee

Update adf_dataset.py

bd5b5b1

justin-richling added the bug, code clean-up, analysis, and data handling labels Nov 14, 2025
