@justin-richling

This PR cleans up and organizes the calls that read files, datasets, and data arrays according to file type, i.e. time series, climo, or regridded climo.

The main changes here are to:

  1. Make the functionality consistent across all file types. (These can probably still be simplified down to a couple of functions.)
  2. Remove the baseline/ref case name from the arguments of these methods. There will always be only one baseline case, so the methods just look it up themselves.
  3. Add a function, update_unit, in adf_dataset.py to gather the new unit info, and call it in load_da. This could also be useful as an external method, similar to get_value_converters.
  4. Update amwg_table.py to use these changes.
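
As a rough illustration of item 3, the sketch below shows one way a load_da-style function could apply the scale factor, offset, and new units in a single place. It is not the code from adf_dataset.py: SimpleArray is a hypothetical stand-in for an xarray DataArray, apply_defaults is a made-up helper, the body of update_unit is a guess, and the config keys scale_factor, add_offset, and new_unit are assumed.

```python
from dataclasses import dataclass


@dataclass
class SimpleArray:
    """Hypothetical stand-in for an xarray.DataArray: values plus units."""
    values: list
    units: str


def update_unit(var_defaults, current_units):
    """Return the configured new unit, or the existing units if none is set."""
    return var_defaults.get("new_unit") or current_units


def apply_defaults(da, var_defaults):
    """Apply scale factor, offset, and new units in one step (illustrative)."""
    scale = var_defaults.get("scale_factor", 1)
    offset = var_defaults.get("add_offset", 0)
    scaled = [v * scale + offset for v in da.values]
    return SimpleArray(scaled, update_unit(var_defaults, da.units))


# Assumed defaults for a surface pressure variable: Pa -> hPa
defaults = {"scale_factor": 0.01, "add_offset": 0, "new_unit": "hPa"}
ps = SimpleArray([101325.0, 100000.0], "Pa")
converted = apply_defaults(ps, defaults)
print(converted.units, converted.values)

# With no configured values, the data and units pass through unchanged
unchanged = apply_defaults(ps, {})
print(unchanged.units, unchanged.values)
```

Keeping the conversion inside the loader means every caller, including the table code, gets the same cleaned data without repeating the logic.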

Important changes:

  1. The modifications for scale factor, offset, new units, and time bounds adjustment (for time series) are now always applied when data is loaded through these methods in adf_dataset.py. Their absence was what caused the tables to be improperly formatted; see "AMWG tables are not getting proper data cleaning" (#423).

Datasets and data arrays can now be loaded generically anywhere in the ADF by calling load_dataset(fils) and/or load_da(<list of files>, <variable name>, <case name>)

or through the methods for the three file types:
  load_reference_timeseries_dataset/load_timeseries_dataset
  load_reference_climo_dataset/load_climo_dataset
  load_reference_regrid_dataset/load_regrid_dataset

  load_<file type>_dataset calls load_dataset internally, and
  load_<file type>_da calls load_da internally
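
The delegation described above can be sketched roughly as follows. This is a toy class, not the ADF implementation: DataLoaderSketch, its files_by_case mapping, the baseline_name attribute, and the dictionary return values are all hypothetical, and only the method names load_dataset, load_da, load_climo_da, and load_reference_climo_da follow the naming pattern in this PR.

```python
class DataLoaderSketch:
    """Toy loader: file-type wrappers locate files, generic methods do the work."""

    def __init__(self, files_by_case, baseline_name):
        # Hypothetical lookup: (file type, case name) -> list of file paths
        self.files_by_case = files_by_case
        # The single baseline case is stored once, so callers never pass it
        self.baseline_name = baseline_name

    # Generic entry points
    def load_dataset(self, fils):
        # A real loader would open the files; here we only echo them back
        return {"files": list(fils)}

    def load_da(self, fils, variablename, case):
        ds = self.load_dataset(fils)
        return {"variable": variablename, "case": case, **ds}

    # File-type wrappers that delegate to the generic methods
    def load_climo_da(self, case, variablename):
        fils = self.files_by_case[("climo", case)]
        return self.load_da(fils, variablename, case)

    def load_reference_climo_da(self, variablename):
        # No baseline argument: the method looks the name up itself
        return self.load_climo_da(self.baseline_name, variablename)


loader = DataLoaderSketch(
    {
        ("climo", "test_case"): ["test_TS_climo.nc"],
        ("climo", "baseline_case"): ["base_TS_climo.nc"],
    },
    baseline_name="baseline_case",
)
test_da = loader.load_climo_da("test_case", "TS")
ref_da = loader.load_reference_climo_da("TS")
print(test_da)
print(ref_da)
```

Because every wrapper ends in the same generic call, any cleaning added to load_da applies to all three file types at once.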

load_dataset has a flag for time series files so that it can check the time bounds and ensure proper time alignment. This is only an issue with older CAM/CESM files; nothing is modified when newer files are used.
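
One possible form of such a time bounds check is sketched below with plain datetime objects. It assumes the misalignment comes from timestamps that sit at the end of the averaging interval, and that the fix is to move them to the interval midpoint. The function names fix_time_axis and midpoint_times are hypothetical, and the actual check in the ADF may differ.

```python
from datetime import datetime


def midpoint_times(time_bounds):
    """Return the midpoint of each (start, end) averaging interval."""
    return [start + (end - start) / 2 for start, end in time_bounds]


def fix_time_axis(times, time_bounds):
    """Move end-of-interval timestamps to interval midpoints.

    Timestamps that already fall before the end of their interval are
    returned unchanged, so well-aligned files are not modified.
    """
    needs_fix = any(t >= end for t, (_, end) in zip(times, time_bounds))
    return midpoint_times(time_bounds) if needs_fix else list(times)


bounds = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),
]

# Assumed old-style stamps: each monthly mean is labeled with the interval end
old_times = [datetime(2000, 2, 1), datetime(2000, 3, 1)]
fixed = fix_time_axis(old_times, bounds)

# Stamps already inside their intervals are left alone
new_times = [datetime(2000, 1, 16, 12), datetime(2000, 2, 15, 12)]
untouched = fix_time_axis(new_times, bounds)

print(fixed)
print(untouched)
```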

This change forces the ADF to work out these modifications at the data array loading stage

Potential issues:
  Users must check the scale factor and units themselves when using any premade files, i.e. files that were not created by the ADF starting from the history file level.
  Remedy: If the datasets have already been modified, the user just has to make sure the config yaml file has no values for these arguments, or use the even more generic xarray load via load_dataset from adf_utils.
  This is not likely, but it is a definite issue for edge-case scenarios.

Closes #423 and #424

Remove the dependency on the baseline case name since there will only be one baseline
This will now grab any scale factors and new units for time series files
Calling these methods now applies the scale factor/offset/units and time bounds fixes (for time series)
Make `new_unit` plain text instead of TeX formatted (mostly for AMWG table viewing) and move the TeX formatting to the mpl args for plotting
@justin-richling added the following labels on Nov 14, 2025:
  bug: Something isn't working
  code clean-up: Made code simpler and/or easier to read.
  analysis: Related to data analysis and statistics
  data handling: Related to handling of data, i.e. file i/o, data workflow, etc.