@drlyamzin drlyamzin commented Jan 29, 2025

This pull request adds changes to the original NVlabs/FourCastNet repository so that it is possible to run precipitation inference.

  • The Dockerfile has been modified to exclude missing directories from the build process.
  • Paths and other parameters in config/AFNO.yaml have been changed; the ERA5 source and required parameters have been added.
  • A script to download data statistics has been added, and the corresponding data has been uploaded to the bucket.
  • A data processing utility in parallel_copy_small_set.py that converts ERA5 data to the correct format has been added.
  • The data processing script data_for_inference.py has been added.
  • The inference script inference_precip.py has been modified to save predictions and ground truth. The number of initial conditions has been changed.

The added code follows Recursive standards, but the existing repository needs refactoring. The CI pipeline is not set up.

Checked the work against all of the ticket requirements

  • CHANGELOG updated
  • Formatter has been run against all changed files
  • All changed files have been added correctly
  • Appropriate unit tests and integration tests have been added
  • All tests are passing
  • Ad hoc testing has been done
  • Affected docs have been updated
  • Tickets created for any TODOs
  • All licenses have been checked and confirmed for commercial use

@MrCsabaToth

I ran into two issues in parallel_copy_small_set.py on the main branch (I'm using h5py 3.12.1):

  1. KeyError: "Unable to synchronously open object (object 'fields' doesn't exist)". To get past it, I had to add: if 'fields' not in fdest: fdest.create_dataset('fields', (Nimgtot, len(varslist), fsrc.shape[-2], fsrc.shape[-1]), dtype=fsrc.dtype)
  2. However, that fix might be wrong, because I then run into an indexing issue at fdest['fields'][idx:idx+batch, channel_idx, :, :] = ims, which raises IndexError: Index (1) out of range for (0-0)
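
One possible reading of the IndexError in item 2, offered as an assumption rather than a diagnosis: the range (0-0) appears to describe an axis of length 1, which is what the channel axis would be if 'fields' was created with len(varslist) == 1 while channel_idx is 1. The sketch below is plain Python and does not touch h5py; check_write_fits is a hypothetical helper, not part of the repository. It validates a planned write against the dataset shape before the write happens.

```python
def check_write_fits(dataset_shape, idx, batch, channel_idx):
    """List the problems with writing dataset[idx:idx+batch, channel_idx, :, :].

    dataset_shape is assumed to be (time, channel, lat, lon).
    An empty list means the indices fit inside the dataset.
    """
    n_time, n_channels = dataset_shape[0], dataset_shape[1]
    problems = []
    if not 0 <= channel_idx < n_channels:
        problems.append(f"channel_idx {channel_idx} outside 0..{n_channels - 1}")
    if idx < 0 or idx + batch > n_time:
        problems.append(f"time slice {idx}:{idx + batch} exceeds {n_time} time steps")
    return problems


# Dataset created with a single channel, then written at channel 1.
print(check_write_fits((4, 1, 721, 1440), idx=0, batch=4, channel_idx=1))
# Dataset created with 20 channels: the same write fits.
print(check_write_fits((4, 20, 721, 1440), idx=0, batch=4, channel_idx=1))
```

If the first case matches what the script does, sizing the channel axis by the total number of variables rather than len(varslist) for a single call may be the fix, but that depends on how the script is invoked.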

@MrCsabaToth

In download_data.sh I get

BucketNotFoundException: 404 gs://borealis-models bucket does not exist.
CommandException: 1 file/object could not be transferred.

MrCsabaToth commented Feb 16, 2025

When I try the writetofile_simplest function, I get TypeError: Can't broadcast (52, 721, 1440) -> (4, 721, 1440) on the first write of u10.
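
The shapes in this error suggest that the source array holds 52 time steps while the destination selection covers only 4, for example because 'fields' was created with 4 time steps. That is an inference from the message alone, not a confirmed cause. A minimal sketch in plain Python (selected_shape and shapes_match are hypothetical helpers, and non-negative slice bounds are assumed) that computes the shape a time slice actually selects and compares it with the source:

```python
def selected_shape(dataset_shape, start, stop):
    """Shape selected by dataset[start:stop, channel_idx, :, :].

    dataset_shape is assumed to be (time, channel, lat, lon); the integer
    channel index drops the channel axis. Slice bounds are assumed to be
    non-negative and are clamped to the dataset extent.
    """
    n_time = dataset_shape[0]
    lo = min(start, n_time)
    hi = min(stop, n_time)
    return (max(0, hi - lo),) + tuple(dataset_shape[2:])


def shapes_match(source_shape, dataset_shape, start, stop):
    """True when the source array has exactly the shape of the selection."""
    return tuple(source_shape) == selected_shape(dataset_shape, start, stop)


src = (52, 721, 1440)
# Destination has only 4 time steps: the slice 0:52 is clamped to 4 rows.
print(shapes_match(src, (4, 20, 721, 1440), 0, 52))
# Destination has 52 time steps: the shapes agree.
print(shapes_match(src, (52, 20, 721, 1440), 0, 52))
```

Running a check like this before the write turns a late broadcast error into an early, explicit shape comparison.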

nataraj2 commented Jul 1, 2025

Re: fields does not exist while running parallel_copy_small_set.py

Following this link, first run the script below to create an empty HDF5 dataset (set time_steps to the number of time steps in the NetCDF4 file downloaded from ERA5):

import h5py

# Number of time steps in the downloaded ERA5 NetCDF4 file
time_steps = 1

with h5py.File('filename.h5', 'w') as f:
    # Shape is (time, channel, lat, lon); 'f' is 32-bit float
    f.create_dataset('fields', shape=(time_steps, 20, 721, 1440), dtype='f')

Then have parallel_copy_small_set.py populate the generated HDF5 file.
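
As a variation on the script above, h5py's require_dataset can create 'fields' when it is missing and reject an existing dataset whose shape differs, which may surface shape mismatches before any data is copied. This is only a sketch: ensure_fields is a hypothetical helper, and its default sizes simply repeat the (20, 721, 1440) values from the script above.

```python
import h5py


def ensure_fields(h5file, time_steps, channels=20, height=721, width=1440, dtype="f4"):
    """Return the 'fields' dataset, creating it if it does not exist yet.

    require_dataset returns the existing dataset when its shape matches,
    creates it when it is missing, and raises TypeError when an existing
    dataset has a different shape.
    """
    shape = (time_steps, channels, height, width)
    return h5file.require_dataset("fields", shape=shape, dtype=dtype)
```

For the full grid, this could be called as ensure_fields(f, time_steps=52) on a file opened with h5py.File('filename.h5', 'a'), where 52 stands in for the number of time steps in the source file.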
