This repository provides a minimal, clean, and extendable ML pipeline for 3D point cloud semantic segmentation using the PandaSet LiDAR dataset. It implements a complete workflow: dataset acquisition → preprocessing → model architecture → training stub.
The goal is to provide a working baseline that can later be upgraded to advanced 3D sparse convolution architectures (e.g., MinkowskiNet / SparseConv U-Net).
- Full dataset download via Python (Kaggle API)
- Custom PandaSet loader (point clouds + semantic labels)
- PointNet segmentation baseline implemented in PyTorch
- Minimal training loop (forward → loss → backward)
- Configurable number of sequences (use subsets for fast prototyping)
- Fully reproducible and dependency-light
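
The configurable sequence count can be sketched as a small helper that picks the first few sequence folders. This is only an illustration: the function name `list_sequences` and the assumed layout (`<root>/<seq_id>/lidar` and `<root>/<seq_id>/annotations/semseg`) are hypothetical and may not match what the loader in `pointnet_model.py` expects.

```python
from pathlib import Path


def list_sequences(root, max_sequences=None):
    """Return sorted sequence directories that have both LiDAR and semseg data.

    Assumed (hypothetical) layout:
        <root>/<seq_id>/lidar/
        <root>/<seq_id>/annotations/semseg/
    """
    root = Path(root)
    sequences = sorted(
        d for d in root.iterdir()
        if d.is_dir()
        and (d / "lidar").is_dir()
        and (d / "annotations" / "semseg").is_dir()
    )
    # Keep only the first max_sequences entries for fast prototyping.
    if max_sequences is not None:
        sequences = sequences[:max_sequences]
    return sequences
```

Calling `list_sequences("./pandaset", max_sequences=2)` would return at most two labeled sequence folders, which keeps a prototyping run short.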
pip install -r requirements.txt

Create a .env file in the project root:
KAGGLE_USERNAME=your_username
KAGGLE_KEY=your_key
Run:
python download_pandaset.py

This script:
- Authenticates using your .env
- Downloads the full PandaSet dataset (~33GB) from Kaggle
- Unzips everything under ./pandaset/
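
The three steps above could look roughly like the sketch below. It is not the contents of `download_pandaset.py`: the helper names `load_env_file` and `download_dataset` are made up for illustration, and the dataset slug is a placeholder you would need to fill in. The Kaggle client is imported lazily because it reads credentials from the environment when it is imported.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


def download_dataset(dataset_slug, target_dir="./pandaset"):
    """Download and unzip a Kaggle dataset into target_dir.

    dataset_slug is a placeholder of the form "<owner>/<dataset-name>".
    """
    # Imported here so KAGGLE_USERNAME / KAGGLE_KEY are already set.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset_slug, path=target_dir, unzip=True)


# Example usage (requires a valid .env and the kaggle package):
#   load_env_file()
#   download_dataset("<owner>/<dataset-name>")
```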
PandaSet is released under CC-BY-NC-SA, and Kaggle enforces attribution automatically via the LICENSE file.
The baseline model is a PointNet segmentation network implemented from scratch:
- Input: (N, 4) points, one row per point as (x, y, z, intensity)
- MLP + shared Conv1D layers
- Global feature aggregation
- Per-point classification head
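
As a rough illustration of the components listed above, a PointNet-style segmentation network can be sketched in PyTorch as follows. This is a simplified sketch, not the code in `pointnet_model.py`: the class name `PointNetSeg`, the layer widths, and the `num_classes` argument are assumptions, and the input/feature transform networks (T-Nets) from the original PointNet are omitted.

```python
import torch
import torch.nn as nn


class PointNetSeg(nn.Module):
    """Simplified PointNet-style segmentation network (no T-Nets)."""

    def __init__(self, num_classes, in_channels=4):
        super().__init__()
        # Shared per-point MLP, written as Conv1d layers with kernel size 1.
        self.local = nn.Sequential(
            nn.Conv1d(in_channels, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.deep = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Per-point head over concatenated local + global features.
        self.head = nn.Sequential(
            nn.Conv1d(64 + 1024, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_classes, 1),
        )

    def forward(self, points):
        # points: (B, N, C) -> (B, C, N), the layout Conv1d expects.
        x = points.transpose(1, 2)
        local = self.local(x)                                # (B, 64, N)
        deep = self.deep(local)                              # (B, 1024, N)
        # Global feature: max over all points, then broadcast to each point.
        global_feat = deep.max(dim=2, keepdim=True).values   # (B, 1024, 1)
        global_feat = global_feat.expand(-1, -1, x.shape[2]) # (B, 1024, N)
        fused = torch.cat([local, global_feat], dim=1)       # (B, 1088, N)
        return self.head(fused)                              # (B, num_classes, N)
```

The max pooling step is what makes the global feature independent of point order; concatenating it back onto each point's local feature gives the head both local and scene-level context.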
This provides a simple and reliable foundation before migrating to:
- SparseConv U-Net
- MinkowskiNet
- KPConv
- PointNet++
- etc.
python pointnet_model.py

This runs:
- Dataset loading
- Minimal data analysis
- One epoch of training (stub)
- Prints batch loss
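
A one-epoch loop of this kind (forward, loss, backward, print) might look like the sketch below. The names `train_one_epoch` and `TinyPerPointNet` are hypothetical, and the stand-in model is deliberately trivial so the snippet runs on its own; the actual stub in `pointnet_model.py` may be organized differently.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, batches, optimizer):
    """Run one pass over batches of (points, labels) and return batch losses.

    points: float tensor (B, N, C); labels: long tensor (B, N).
    The model is expected to return logits of shape (B, num_classes, N).
    """
    criterion = nn.CrossEntropyLoss()
    model.train()
    losses = []
    for points, labels in batches:
        optimizer.zero_grad()
        logits = model(points)            # forward
        loss = criterion(logits, labels)  # loss
        loss.backward()                   # backward
        optimizer.step()
        losses.append(loss.item())
        print(f"batch {len(losses)}: loss = {losses[-1]:.4f}")
    return losses


class TinyPerPointNet(nn.Module):
    """Stand-in model for illustration: a single shared per-point linear layer."""

    def __init__(self, in_channels=4, num_classes=5):
        super().__init__()
        self.net = nn.Conv1d(in_channels, num_classes, 1)

    def forward(self, points):
        return self.net(points.transpose(1, 2))
```

With random tensors as input, for example a few batches of shape `(2, 256, 4)` with integer labels of shape `(2, 256)`, this is enough to confirm that shapes, loss, and gradients line up before running on real data.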
The script is designed for rapid validation and code completeness rather than full training.
- PointNet on outdoor LiDAR: ~35–50% mIoU
- SparseConv U-Net (production target): ~60–75% mIoU
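
For reference, mIoU (mean intersection over union) averages the per-class ratio TP / (TP + FP + FN). The sketch below computes it from flat lists of predicted and true class indices; the function name `mean_iou` is illustrative, and skipping classes that appear in neither predictions nor targets is one common convention, not necessarily the one behind the figures above.

```python
def mean_iou(predictions, targets, num_classes):
    """Compute mean IoU over classes present in predictions or targets."""
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for pred, target in zip(predictions, targets):
        if pred == target:
            intersection[pred] += 1
            union[pred] += 1
        else:
            union[pred] += 1    # false positive for the predicted class
            union[target] += 1  # false negative for the true class
    # Skip classes that never occur so they do not distort the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with predictions `[0, 0, 1, 1]` and targets `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.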
PointNet is intentionally chosen for simplicity and compatibility with the assessment constraints.
point_cloud_segmentation/
│
├── download_pandaset.py   # Kaggle dataset downloader
├── pointnet_model.py      # Dataset loader + PointNet + training stub
├── MODEL_CHOICES.md       # Architecture justification & notes
├── requirements.txt       # Dependencies
└── .gitignore
PandaSet is provided under the CC-BY-NC-SA 4.0 license. Any commercial use must follow Scale.ai's licensing terms.
This repository contains no dataset files; users must download them via Kaggle using the provided script.