closure is a flexible, reproducible, and distributed-ready machine learning framework for training on field files produced by the ECsim/iPiC3D Particle-in-Cell codes, with a focus on fluid closure. It provides tools for data loading, preprocessing, model training, evaluation, and experiment management, and supports both single-node and distributed (multi-GPU/multi-node) workflows.
- Experiment Management: Organizes experiments in a clear directory structure. Each experiment (called a "run") is stored in its own subfolder, with all configs, logs, and results saved for reproducibility.
- Flexible Data Handling: Loads and preprocesses large scientific datasets (e.g., from simulations or experiments), supporting normalization, filtering, patch extraction, and custom feature/target selection.
- Model Training: Supports a variety of neural network architectures (MLP, FCNN, ResNet, etc.) and training protocols, including distributed training with PyTorch.
- Evaluation & Analysis: Provides utilities for loss/metric evaluation, plotting, and comparison across runs or experiments.
- Reproducibility: Every run is fully specified by a `config.json` file, making it easy to reproduce or compare results.
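Since the full configuration of a run lives in its `config.json`, inspecting or reproducing a past run starts from that file. A minimal sketch (the run path is illustrative):

```python
import json

# Illustrative run directory; every run stores its full configuration in config.json
run_dir = "work_dir/0"

with open(f"{run_dir}/config.json") as f:
    config = json.load(f)

# Inspect the settings that produced this run, e.g. the model configuration
print(config.get("model_kwargs", {}))
```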
- `closure/src/` — Core source code (trainers, datasets, models, utilities, etc.)
- `closure/examples/` — Example notebooks and scripts (see `tuto_train_haydn.ipynb`)
- `work_dir/` — User-specified experiment directory; each run is a subfolder (e.g., `work_dir/0`, `work_dir/nosubsample`, etc.)
Option 1: Install via conda:

```bash
conda env create -f gputorch.yml
```

Option 2: Install dependencies manually:

- Python 3.x
- PyTorch
- numpy, pandas, joblib, scipy, matplotlib
- Organize your simulation or experimental data as described in the tutorial notebook.
- Prepare CSV files listing your train/val/test samples.

See `examples/tuto_train_haydn.ipynb` for a full walkthrough.
```python
from closure.src.trainers import Trainer

dataset_kwargs = {...}    # dataset configuration; see the notebook for details
load_data_kwargs = {...}  # data-loading configuration
model_kwargs = {...}      # model architecture and hyperparameters

trainer = Trainer(
    work_dir="experiments/my_experiment",
    dataset_kwargs=dataset_kwargs,
    load_data_kwargs=load_data_kwargs,
    model_kwargs=model_kwargs,
    device="cuda"
)

trainer.fit()
```

A tutorial on the usage of the trainer can be found in `Tutorial_trainer.pdf` in the `examples` folder.
```bash
python -m closure.src.trainers --config work_dir=work_dir --config run=run_name --config model_kwargs.model_name=ResNet
```

- Use `--config key=value` to override config values (nested keys are supported with dot notation).
- The Trainer manages a parent directory (`work_dir`).
- Each run is a subfolder (e.g., `work_dir/0`, `work_dir/nosubsample`) containing:
  - `config.json` — full configuration for the run
  - `model.pth` — trained model weights
  - `loss_dict.pkl` — training/validation loss history
  - `run.log` — log file for the run
  - normalization files (`X.pkl`, `y.pkl`)
- To start a new run, change any config value (e.g., hyperparameters, model, batch size) and call `trainer.fit(config=new_config)`; a new subfolder is created (see the sketch below).
- To reload or compare runs, use the `load_run` method or the provided utilities.
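For instance, continuing from the Trainer snippet above, a second run in the same `work_dir` can be started by passing an updated configuration to `fit`. This sketch assumes a partial config is merged with the current one, as the dot-notation CLI overrides suggest; the run name and override values are illustrative:

```python
# Start a new run with a different model; a subfolder named after "run" is
# created inside work_dir, and previous runs are left untouched.
new_config = {
    "run": "resnet_baseline",                  # illustrative run name
    "model_kwargs": {"model_name": "ResNet"},  # override the architecture
}
trainer.fit(config=new_config)
```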
See `examples/tuto_train_haydn.ipynb` for a step-by-step tutorial, including:
- Setting up dataset and model configs
- Training a model
- Running multiple experiments in the same folder
- Evaluating and comparing results
- Reloading previous runs
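For comparing results by hand (the notebook and `src/utilities.py` provide dedicated helpers), one rough sketch is to load each run's saved loss history. This assumes `loss_dict.pkl` pickles a dictionary with a per-epoch validation-loss entry, which may be structured differently in your runs:

```python
import pickle
import matplotlib.pyplot as plt

# Illustrative run names inside the same work_dir
for run in ["0", "nosubsample"]:
    with open(f"work_dir/{run}/loss_dict.pkl", "rb") as f:
        loss_dict = pickle.load(f)
    # Assumes a key such as "val" holds the per-epoch validation loss
    plt.plot(loss_dict.get("val", []), label=f"run {run}")

plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.legend()
plt.show()
```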
- Configuration Consistency: When running multiple experiments in the same `work_dir`, make sure the dataset configuration (`dataset_kwargs`) is consistent across runs. If you change the dataset, use a new `work_dir` to avoid normalization mismatches.
- Run Naming: Use the `run` key in your config to create a new subfolder for each experiment. This keeps results organized and prevents accidental overwrites.
- Distributed Setup: The script automatically handles distributed training if environment variables (like `WORLD_SIZE`) are set. Make sure your cluster/job environment is configured correctly (a generic sketch follows this list).
- Logging: Each run has its own log file, making it easy to debug and track progress. Logs include information about the host, rank, and device.
- Reproducibility: All configs, normalization parameters, and model weights are saved for each run, ensuring you can always reproduce or analyze past experiments.
- Extensibility: The Trainer is designed to be flexible; custom models, datasets, and training protocols can be plugged in with minimal changes.
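As context for the distributed-setup note, this is a generic sketch of how `WORLD_SIZE`-based detection typically works with PyTorch DDP; it is illustrative only and not closure's actual initialization code:

```python
import os
import torch
import torch.distributed as dist

# Generic pattern: if a launcher (e.g. torchrun or a SLURM wrapper) has set
# WORLD_SIZE, initialize a process group; otherwise fall back to serial mode.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
if world_size > 1:
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
else:
    rank = 0
print(f"running as rank {rank} of {world_size}")
```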
- `src/trainers.py` — The `Trainer` class: manages configs, logging, datasets, models, and training.
- `src/datasets.py` — Data loading utilities: distributed/serial samplers, normalization, filtering, patch extraction.
- `src/models.py` — Model definitions: MLP, FCNN, ResNet, and the `PyNet` training wrapper.
- `src/utilities.py` — Utility functions: config handling, evaluation, plotting, and more.
- `src/read_pic.py` — Functions for reading iPiC3D/ECsim simulation data in HDF5 or pickle format.
closure provides a powerful and flexible system for loading, preprocessing, and sampling scientific datasets, designed for both single-node and distributed (multi-GPU) workflows.
- Distributed and Serial Sampling: Efficiently split and shuffle data across multiple GPUs or nodes using the `DistributedSampler` and `SubSampler` classes. This ensures each process gets a unique subset of the data, enabling scalable training.
- Channel-based Data Loading: The `ChannelDataLoader` extends PyTorch's `DataLoader` to allow selection of specific feature and target channels (e.g., physical quantities such as density or pressure), as well as patch-based cropping for image-like data.
- Flexible Subsampling and Shuffling: Control the fraction of data used for training or validation via the `subsample_rate` and `subsample_seed` parameters. This is useful for quick prototyping, for handling very large datasets, and for oversampling, which is necessary when using random patch extraction.
- Patch Extraction: For image or field data, random spatial patches can be extracted on the fly, which is useful for data selection or for training convolutional models.
- Feature/Target Normalization: The `DataFrameDataset` class supports normalization (mean/std) and pre-scaling (e.g., log transforms) of both features and targets. Normalization parameters are saved and reused for reproducibility.
- Filtering and Transformations: Apply custom filters (e.g., smoothing, denoising) to features or targets using scipy or custom functions. torchvision-style transforms are supported for data augmentation.
- CSV/Metadata Integration: Datasets are typically defined by CSV files listing sample filenames and metadata, making it easy to manage large and complex datasets.
```python
from closure.src.datasets import DataFrameDataset, ChannelDataLoader

# Build a dataset for one split (here: the training data)
dataset = DataFrameDataset(
    data_folder='/path/to/data',
    norm_folder='/path/to/norm',
    samples_file='/path/to/samples.csv',
    feature_dtype='float32',
    target_dtype='float32',
    scaler_features=None,
    scaler_targets=None,
    transform={'RandomCrop': {'size': (16, 16)}, 'apply': ['train']},
    datalabel='train'
)

# Wrap it in a ChannelDataLoader to select channels, subsample, and crop patches
loader = ChannelDataLoader(
    dataset,
    batch_size=32,
    feature_channel_names=['Bx', 'By', 'Bz'],
    target_channel_names=['Pxx_e', 'Pyy_e'],
    subsample_rate=0.5,
    subsample_seed=42,
    patch_dim=[16, 16],
)

for features, targets in loader:
    # Training loop here
    pass
```

- Prepare CSV files listing your train/val/test samples.
- Configure `dataset_kwargs` to specify data location, features/targets, normalization, and any filters or transforms.
- Create a `DataFrameDataset` for each split (train/val/test), as sketched below.
- Wrap each dataset with a `ChannelDataLoader` for batching, shuffling, and distributed support.
- Pass the loaders to the Trainer for model training and evaluation.
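A minimal sketch of the per-split step, assuming separate CSV files for training and validation and that the remaining `DataFrameDataset` arguments keep the values shown above or their defaults (paths and channel names are placeholders):

```python
# Build one dataset and loader per split; paths are placeholders.
splits = {
    "train": "/path/to/train_samples.csv",
    "val": "/path/to/val_samples.csv",
}

loaders = {}
for label, csv_file in splits.items():
    ds = DataFrameDataset(
        data_folder="/path/to/data",
        norm_folder="/path/to/norm",
        samples_file=csv_file,
        datalabel=label,
    )
    loaders[label] = ChannelDataLoader(
        ds,
        batch_size=32,
        feature_channel_names=["Bx", "By", "Bz"],
        target_channel_names=["Pxx_e", "Pyy_e"],
    )
```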
closure provides a modular and extensible system for defining, training, and evaluating neural network models, with a focus on scientific and physical data. The models are implemented using PyTorch and are designed to be easily configurable and compatible with distributed training.
- Unified Training Wrapper (`PyNet`): The `PyNet` class wraps any supported model architecture and manages the full training lifecycle, including optimizer and scheduler setup, distributed training (via PyTorch DDP), checkpointing, early stopping, and logging. It supports custom optimizers, learning rate schedulers, and metrics.
- Model Zoo: Several neural network architectures are provided out of the box:
  - MLP (Multi-Layer Perceptron): For tabular or flattened data, with configurable layer sizes, activations, and dropout.
  - FCNN (Fully Convolutional Neural Network): For image or field data, with configurable channels, kernels, activations, batch normalization, and dropout.
  - ResNet: Flexible residual network with user-defined skip connections, suitable for deep image-based models.
  - CNet: Example convolutional network for image data.
- Custom Initialization: Models support custom weight and bias initialization, as well as flexible activation and dropout configuration.
- Distributed Training: All models can be wrapped in PyTorch's DistributedDataParallel for multi-GPU or multi-node training.
- Checkpointing and Reloading: Models and training histories can be saved and reloaded from disk, supporting experiment reproducibility and resuming.
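As a rough illustration of the checkpointing point (the Trainer/`PyNet` code provides its own reload helpers), the artifacts saved in a run folder can also be inspected directly. This sketch assumes `model.pth` holds a state dict and `loss_dict.pkl` a pickled loss history:

```python
import pickle
import torch

# Paths follow the run-directory layout described earlier (illustrative run name)
run_dir = "work_dir/0"

state_dict = torch.load(f"{run_dir}/model.pth", map_location="cpu")
with open(f"{run_dir}/loss_dict.pkl", "rb") as f:
    loss_dict = pickle.load(f)

print(list(state_dict.keys())[:5], list(loss_dict.keys()))
```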
- Author: George Miloshevich
- License: MIT License
- Projects: STRIDE, HELIOSKILL
- `examples/tuto_train_haydn.ipynb` — Full tutorial notebook
- Source code docstrings for detailed API documentation
closure is designed for flexibility, reproducibility, and ease of use in scientific ML workflows. Contributions and feedback are welcome!