Support different normalizations #108

@mlangguth89

Description

Is your feature request related to a problem? Please describe.

The current source code typically applies the standard z-score normalization to the data, but a different normalization technique may be beneficial for non-Gaussian variables.
A typical example is precipitation, whose distribution is highly right-skewed and is commonly modeled with a Gamma distribution. For precipitation data in ML, a log transformation followed by a z-score normalization is often preferred.
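
To illustrate the transformation described above, here is a minimal sketch in NumPy. It is not taken from the code base: the helper names `log_zscore` and `inverse_log_zscore` are hypothetical, the synthetic Gamma sample stands in for real precipitation data, and `log1p` (i.e. `log(1 + x)`) is an assumption made here so that dry values of exactly zero stay finite.

```python
import numpy as np


def log_zscore(x, eps=1e-8):
    """Hypothetical helper: log(1 + x) followed by a z-score.

    Returns the normalized data together with the mean and standard
    deviation computed in log space, which the inverse needs.
    """
    x_log = np.log1p(x)  # log1p keeps zero precipitation finite
    mean = x_log.mean()
    std = x_log.std()
    return (x_log - mean) / (std + eps), mean, std


def inverse_log_zscore(z, mean, std, eps=1e-8):
    """Undo log_zscore using the log-space statistics."""
    return np.expm1(z * (std + eps) + mean)


# Synthetic, right-skewed sample standing in for precipitation (assumption).
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=2.0, size=10_000)

z, mean, std = log_zscore(precip)
recovered = inverse_log_zscore(z, mean, std)
```

Note that the statistics are computed after the log transformation, so any precomputed mean and standard deviation for such a channel would need to be stored in log space as well.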

Describe the solution you'd like

Enable support for different channel-specific normalizations. The default should still be the standard z-score normalization (see e.g. here), but it should be possible to override it per channel when initialising the respective dataset.
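
One possible shape for such an interface is sketched below. Everything in it is an assumption for discussion, not the existing API: the class `ChannelNormalizer`, the `overrides` argument, the registry `NORMALIZERS`, and the channel names `t2m` and `tp` are all hypothetical. The idea is only that every channel falls back to z-score unless the dataset configuration names another method.

```python
import numpy as np


def zscore(x, mean, std):
    """Standard z-score normalization."""
    return (x - mean) / std


def log_zscore(x, mean, std):
    """log(1 + x) followed by a z-score; mean/std must be log-space stats."""
    return (np.log1p(x) - mean) / std


# Hypothetical registry mapping a config string to a normalization function.
NORMALIZERS = {"zscore": zscore, "log_zscore": log_zscore}


class ChannelNormalizer:
    """Hypothetical per-channel normalizer; z-score is the default."""

    def __init__(self, channels, stats, overrides=None):
        self.channels = list(channels)
        self.stats = stats  # channel name -> (mean, std)
        overrides = overrides or {}
        unknown = set(overrides) - set(self.channels)
        if unknown:
            raise ValueError(f"Overrides for unknown channels: {sorted(unknown)}")
        # Channels without an explicit override keep the z-score default.
        self.methods = {ch: overrides.get(ch, "zscore") for ch in self.channels}

    def __call__(self, data):
        # data has shape (..., n_channels), channels ordered as self.channels
        out = np.empty_like(data, dtype=np.float64)
        for i, ch in enumerate(self.channels):
            mean, std = self.stats[ch]
            out[..., i] = NORMALIZERS[self.methods[ch]](data[..., i], mean, std)
        return out


# Usage: only the precipitation-like channel deviates from the default.
norm = ChannelNormalizer(
    channels=["t2m", "tp"],
    stats={"t2m": (280.0, 10.0), "tp": (0.5, 0.7)},  # illustrative values
    overrides={"tp": "log_zscore"},
)
sample = np.array([[290.0, 0.0], [270.0, np.e - 1.0]])
normalized = norm(sample)
```

Keeping the method as a string per channel would let it live in the dataset configuration, so the raw data files stay untouched.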

Describe alternatives you've considered

One could apply the log transformation in the dataset directly, but this would add overhead to the dataset preprocessing and contradict the intended philosophy that new datasets should be ingested in their raw data format.

Additional context

Some recent large-scale ML models (e.g. AIFS) do not apply a log transformation to precipitation data. However, previous experimentation with AtmoRep revealed that it can be beneficial in large-scale applications as well (cf. clessig/atmorep#85).

Organisation

JSC

Metadata

    Labels

    model: Related to model training or definition (not generic infra)
