Skip to content

Conversation

@maxwhitemet
Copy link
Contributor

@maxwhitemet maxwhitemet commented Jan 8, 2026

Addresses #953.

This PR adds the plugin, CLI, and tests for stochastic noise generation.

The acceptance tests show the impact, with data available here.
image

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@cpelley
Copy link
Contributor

cpelley commented Jan 12, 2026

@maxwhitemet, could you merge/rebase master into your branch. The failing "CI Tests / Test-Coverage (pull_request)" should hopefully be addressed by the now merged #2273

image

There does appear to be a new issue potentially with the improver documentation building now though... (different issue)

@cpelley
Copy link
Contributor

cpelley commented Jan 13, 2026

Ignore the test coverage failure. Fixed in #2282

Copy link
Contributor

@gavinevans gavinevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maxwhitemet 👍

I've added some comments below.

- geopandas
- lightgbm>=4.0.0
- numba
- pysteps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to say, we shouldn't be including pysteps in this environment. If this has been done to avoid unit test failures, then we just need to handle the potential lack of availability of pysteps in the environment by skipping the unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

- geopandas
- lightgbm>=4.0.0
- numba
- pysteps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the docstring at the top of the file to record that the stochastic noise CLI isn't supported by this environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Also added a couple full stops.

#
# This file is part of 'IMPROVER' and is released under the BSD 3-Clause license.
# See LICENSE in the root of the repository for full licensing details.
"""CLI to add stochastic noise to a dependence template cube using SSFT."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to spell out SSFT here, as it's not a sufficiently well known concept i.e. Short-Space Fourier Transform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

threshold_units: str = "mm/hr",
):
"""
Add stochastic noise to a dependence template cube object using SSFT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please spell out SSFT: Short-Space Fourier Transform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

- scipy=1.15
- sphinx<9.0.0 # https://github.com/tox-dev/sphinx-autodoc-typehints/issues/586
- pandas
- pysteps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed, as optional dependencies, like pysteps are intended to be included in this environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


num_workers = min(len(template.coord("realization").points), os.cpu_count() - 1)

results = compute(*delay_results, scheduler="threads", num_workers=num_workers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note, I don't think that we've used dask like this in the improver codebase before, although I think this usage makes sense. This might be something to discuss.

Comment on lines 220 to 223
# Set noise to zero where dependence template data is below threshold
# to prevent spurious wet pixels
dry_mask = template.data < self._convert_threshold_units(template)
noise.data[dry_mask] = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this is different to the intended implementation with the intended implementation deliberately making the stochastic noise negative, so that it only has an impact for grid points where there is no precipitation in the templates. It would be good to discuss this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the change suggested, though made the scaling optional via a scale_subthreshold_noise parameter.

Comment on lines 123 to 124
[[3.5848932, 4.995262], [2.2589254, 6.5118866]],
[[3.859587, 5.289296], [2.5182567, 6.830268]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a slight misunderstanding here, as the intended outputs here should just be the inputs (as none of the grid points in the input is 0 mm/hr).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the plugin and tests following our conversations on my misunderstanding.

assert result.data.dtype == np.float32

# Verify noise was added (output differs from input)
assert not np.allclose(result.data, simple_cube.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you should check what the numbers are here, rather than what they are not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted.


return output_cube

def process(self, dependence_template: Cube) -> Cube:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the acceptance tests have actually highlighted an issue where the units of the Cube being passed in, aren't being preserved, so although the metadata in the Cube says "m/s", the values are actually mm/hr. This should be corrected, so that the values put into the output cube are in the correct units.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now fixed. Thank you for pointing this out.

Copy link
Contributor Author

@maxwhitemet maxwhitemet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @gavinevans. I have now made the requested changes.

- scipy=1.15
- sphinx<9.0.0 # https://github.com/tox-dev/sphinx-autodoc-typehints/issues/586
- pandas
- pysteps
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

- geopandas
- lightgbm>=4.0.0
- numba
- pysteps
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

- geopandas
- lightgbm>=4.0.0
- numba
- pysteps
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Also added a couple full stops.

#
# This file is part of 'IMPROVER' and is released under the BSD 3-Clause license.
# See LICENSE in the root of the repository for full licensing details.
"""CLI to add stochastic noise to a dependence template cube using SSFT."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

threshold_units: str = "mm/hr",
):
"""
Add stochastic noise to a dependence template cube object using SSFT.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

delayed(self.do_fft)(realization_data.astype(np.float32))
)

num_workers = min(len(template.coord("realization").points), os.cpu_count() - 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


return output_cube

def process(self, dependence_template: Cube) -> Cube:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now fixed. Thank you for pointing this out.

Comment on lines 220 to 223
# Set noise to zero where dependence template data is below threshold
# to prevent spurious wet pixels
dry_mask = template.data < self._convert_threshold_units(template)
noise.data[dry_mask] = 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the change suggested, though made the scaling optional via a scale_subthreshold_noise parameter.

Comment on lines 123 to 124
[[3.5848932, 4.995262], [2.2589254, 6.5118866]],
[[3.859587, 5.289296], [2.5182567, 6.830268]],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the plugin and tests following our conversations on my misunderstanding.

assert result.data.dtype == np.float32

# Verify noise was added (output differs from input)
assert not np.allclose(result.data, simple_cube.data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants