Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics

This repo provides pre-trained models, inference code, and visualization demos for LArTPC point cloud analysis. The training and evaluation code can be found in the pimm repository.

[Teaser figure]


Installation

This repo provides two installation modes: standalone mode and package mode.

  • The standalone mode is recommended for users who want quick inference and visualization. The whole environment, including CUDA and PyTorch, can be installed by running the following commands:

    # create and activate a conda environment named 'panda'
    # CUDA 12.4, PyTorch 2.5.0
    
    # run `unset CUDA_PATH` if you have installed cuda in your local environment
    conda env create -f environment.yml --verbose
    conda activate panda

    We install FlashAttention by default, but it is not required. If FlashAttention is not available in your local environment, see the Model section in Quick Start for a workaround.

  • The package mode is recommended for users who want to integrate the model into a separate codebase. We provide a setup.py file for installation. You can install the package by running the following commands:

    # ensure CUDA and PyTorch are already installed in your local environment
    
    # CUDA_VERSION: CUDA version of the local environment (e.g., 124); check by running 'nvcc --version'
    # TORCH_VERSION: torch version of the local environment (e.g., 2.5.0); check by running 'python -c "import torch; print(torch.__version__)"'
    pip install spconv-cu${CUDA_VERSION}
    pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH_VERSION}+cu${CUDA_VERSION}.html
    pip install git+https://github.com/Dao-AILab/flash-attention.git
    pip install huggingface_hub timm h5py
    
    # (optional, or directly copy the panda folder to your project)
    python setup.py install

    Additionally, the following packages are required for running our demo code (a quick environment check is sketched after this list):

    pip install plotly matplotlib jupyter
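
Whichever mode you choose, it is worth confirming that the core dependencies resolved correctly before moving on. Below is a minimal check, assuming the packages above installed cleanly; flash_attn is optional, as noted:

# a minimal environment check (sketch); assumes the installs above succeeded
import torch
import spconv.pytorch
import torch_scatter

# report the basic environment; CUDA must be available for the .cuda() calls below
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
try:
    import flash_attn  # optional; see the Model section in Quick Start if missing
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn not installed; load models with enable_flash=False")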

Dataset

We use the PILArNet-M dataset (~168 GB), which can be downloaded directly from HuggingFace:

import panda

# auto-download and create dataset
dataset = panda.PILArNetH5Dataset(split="all")

# or download manually first
data_root = panda.download_pilarnet(split="all")

See DATASET.md for full documentation on dataset structure, labels, and more advanced usage.
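
As a quick smoke test, the sketch below loads one event and prints whatever fields it carries. The field names and shapes are dataset-specific (see DATASET.md), so treat the printout as a description of your local copy rather than a fixed schema; the sketch also assumes the dataset supports the usual len() protocol:

# a minimal sketch: inspect one event from the validation split
import panda

dataset = panda.PILArNetH5Dataset(split="val")
print(f"{len(dataset)} events in the 'val' split")

point = dataset[0]
for key in point.keys():
    value = point[key]
    print(key, type(value).__name__, getattr(value, "shape", ""))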

Quick Start

  • Model. Load the pre-trained model with the following code:

    # load the pre-trained model from HuggingFace
    # supported models: "base", "particle", "interaction", "semantic"
    # ckpt is cached in ~/.cache/panda/ckpt, and the path can be customized by setting 'download_root'
    model = panda.load("base").cuda()
    
    # load the pre-trained model from local path
    # assume the ckpt file is stored in the 'ckpt' folder
    model = panda.load("ckpt/panda_base.pth").cuda()
    
    # the ckpt file stores the config and state_dict of the pretrained model

    If FlashAttention is not available, load the pre-trained model with the following code:

    custom_config = dict(enable_flash=False)
    model = panda.load("base", custom_config=custom_config).cuda()
  • Inference. Run inference with the following code (a multi-event variant is sketched after this list):

    import torch

    EVENT_IDX = 0
    dataset = panda.PILArNetH5Dataset(split="val", energy_threshold=0.13)
    point = dataset[EVENT_IDX]
    # move every tensor in the event dict to the GPU before the forward pass
    for key in point.keys():
        if isinstance(point[key], torch.Tensor):
            point[key] = point[key].cuda(non_blocking=True)
    point = model(point)

    Full example notebooks for accessing the dataset, image encoding, particle and interaction clustering, and semantic segmentation can be found in notebooks.
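
Since the per-event steps above are plain PyTorch, looping over several events is mechanical. A minimal sketch, reusing the model loaded in the Model step and wrapping the loop in torch.no_grad() for inference:

# a minimal sketch: run inference over the first few validation events
import torch

dataset = panda.PILArNetH5Dataset(split="val", energy_threshold=0.13)
model.eval()  # assumes 'model' was loaded as in the Model step above
with torch.no_grad():
    for event_idx in range(4):
        point = dataset[event_idx]
        for key in point.keys():
            if isinstance(point[key], torch.Tensor):
                point[key] = point[key].cuda(non_blocking=True)
        point = model(point)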

Citing Panda

If you find this work useful, please consider citing the following paper:

@misc{young2025pandaselfdistillationreusablesensorlevel,
      title={Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics}, 
      author={Samuel Young and Kazuhiro Terao},
      year={2025},
      eprint={2512.01324},
      archivePrefix={arXiv},
      primaryClass={hep-ex},
      url={https://arxiv.org/abs/2512.01324}, 
}

Acknowledgements

This repository is based on the Sonata paper's inference repository, which can be found at https://github.com/facebookresearch/sonata. Parts of this code taken from the original repository are licensed under the Apache 2.0 license.

This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics under Contract No. DE-AC02-76SF00515.

Contact

Any questions or suggestions? Want to collaborate? Feel free to raise an issue on GitHub or email Sam Young at youngsam@stanford.edu.
