Description
Is your feature request related to a problem? Please describe.
While the current source code typically applies standard z-score normalization to the data, a different normalization technique may be beneficial for non-Gaussian variables.
A typical example is precipitation, whose underlying PDF follows a highly right-skewed Gamma distribution. For precipitation data in ML, a log transformation followed by z-score normalization is preferred.
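To make the proposed transformation concrete, here is a minimal sketch of a log transformation followed by z-score normalization, together with its inverse. It is not taken from the code base: the function names are hypothetical, the data are synthetic Gamma samples, and `log1p` is an assumed choice so that zero precipitation stays finite.

```python
import numpy as np


def log_zscore_fit(x, eps=1e-6):
    """Return mean and std of the log1p-transformed data."""
    y = np.log1p(x)
    # Guard against a vanishing std for (nearly) constant fields.
    return y.mean(), max(y.std(), eps)


def log_zscore_apply(x, mean, std):
    """Log-transform first, then apply the z-score in log space."""
    return (np.log1p(x) - mean) / std


def log_zscore_invert(z, mean, std):
    """Undo the z-score, then undo the log transformation."""
    return np.expm1(z * std + mean)


# Synthetic, right-skewed stand-in for precipitation (assumption).
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=2.0, size=10_000)

mean, std = log_zscore_fit(precip)
z = log_zscore_apply(precip, mean, std)
restored = log_zscore_invert(z, mean, std)
```

The statistics are computed on the transformed values, so the normalized data have zero mean and unit variance in log space rather than in the raw units.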
Describe the solution you'd like
Enable support for different channel-specific normalizations. The default should still be the standard z-score normalization (see e.g. here), but it should be possible to control the normalization per channel when initialising the respective dataset.
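One possible shape for this, purely as an illustration: a per-channel mapping passed at initialisation, with z-score as the fallback. The class `ChannelNormalizer`, the `normalization` argument, the method names `"zscore"` and `"log_zscore"`, and the channel names `t2m` and `tp` are all hypothetical and do not reflect the existing dataset API.

```python
import numpy as np

# Hypothetical registry: method name -> (forward, inverse) pre-transform.
# The z-score itself is applied after the pre-transform for every method.
TRANSFORMS = {
    "zscore": (lambda x: x, lambda y: y),
    "log_zscore": (np.log1p, np.expm1),
}


class ChannelNormalizer:
    """Per-channel normalization; channels are on the last axis."""

    def __init__(self, channels, normalization=None, default="zscore"):
        normalization = normalization or {}
        self.channels = list(channels)
        # Channels not listed explicitly fall back to the default.
        self.methods = [normalization.get(c, default) for c in self.channels]
        for method in self.methods:
            if method not in TRANSFORMS:
                raise ValueError(f"unknown normalization: {method}")
        self.mean = None
        self.std = None

    def _transform(self, data, direction):
        out = np.empty_like(data, dtype=np.float64)
        for i, method in enumerate(self.methods):
            out[..., i] = TRANSFORMS[method][direction](data[..., i])
        return out

    def fit(self, data):
        t = self._transform(data, 0)
        axes = tuple(range(t.ndim - 1))  # reduce over all but channels
        self.mean = t.mean(axis=axes)
        self.std = np.maximum(t.std(axis=axes), 1e-6)
        return self

    def normalize(self, data):
        return (self._transform(data, 0) - self.mean) / self.std

    def denormalize(self, z):
        return self._transform(z * self.std + self.mean, 1)


# Synthetic two-channel example: Gaussian "t2m" and Gamma "tp".
rng = np.random.default_rng(0)
data = np.stack(
    [rng.normal(280.0, 10.0, 1000), rng.gamma(0.5, 2.0, 1000)], axis=-1
)
norm = ChannelNormalizer(["t2m", "tp"], normalization={"tp": "log_zscore"})
z = norm.fit(data).normalize(data)
```

Keeping the default in one place means existing configurations would behave as before, and only channels named in the mapping change.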
Describe alternatives you've considered
One could apply the log transformation in the dataset directly, but this would add overhead to the dataset preprocessing and contradict the intended philosophy that new datasets should be ingested in their raw data format.
Additional context
It is noted that some recent large-scale ML models (e.g. AIFS) do not perform a log-transformation on the precipitation data. However, previous experimentation with AtmoRep revealed that it can be beneficial in large-scale applications as well (cf. clessig/atmorep#85).
Organisation
JSC