CataractSAM-2: Enhancing Transferability and Real-Time Ophthalmic Surgery Segmentation Through Automated Ground-Truth Generation
We introduce CataractSAM‑2, a domain-adapted extension of SAM‑2 optimized for high-precision segmentation in cataract and related ophthalmic surgeries. To preserve generalizable visual priors, we freeze the SAM‑2 image encoder and fine-tune only the prompt encoder and mask decoder on the Cataract‑1K dataset. To address the time-consuming nature of manual frame-by-frame annotation, we develop a human-in-the-loop interactive annotation framework built on the SAM2VideoPredictor, significantly accelerating ground-truth generation.
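In PyTorch terms, the freeze-and-fine-tune setup can be sketched roughly as follows. This is a minimal illustration, not the training code used for CataractSAM‑2: the config/checkpoint paths are placeholders, and the submodule attribute names (`image_encoder`, `sam_prompt_encoder`, `sam_mask_decoder`) are assumptions based on the upstream SAM‑2 code.

```python
import torch
from sam2.build_sam import build_sam2

# Placeholder config/checkpoint paths; substitute the ones you actually use.
model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt", device="cuda")

# Freeze everything, then re-enable gradients only for the prompt encoder
# and mask decoder so the image encoder keeps its generalizable priors.
for param in model.parameters():
    param.requires_grad = False
for module in (model.sam_prompt_encoder, model.sam_mask_decoder):
    for param in module.parameters():
        param.requires_grad = True

# Optimize only the trainable subset.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5, weight_decay=0.01)
```

Only the parameters left trainable here would then be updated on the Cataract‑1K training frames.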
- **CataractSAM‑2 Model**: A fine-tuned, domain-adapted variant of Meta's SAM‑2, trained specifically for ophthalmic surgery segmentation. It achieves 90–95% mean IoU and runs in real time at 15 FPS across surgical videos.
- **Interactive Ground-Truth Annotation Framework**: A lightweight, point-guided annotation system leveraging the SAM2VideoPredictor. Users provide sparse point-based prompts, and the model propagates accurate masks through the video, cutting annotation time by over 80%. A minimal sketch of this workflow follows the list.
- **Open-Source Toolkit**: This repo includes:
  - ✅ Pretrained weights (`.pth`)
  - ✅ Interactive inference widgets
  - ✅ Demo notebook
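To make the point-prompt-then-propagate workflow concrete, the underlying `SAM2VideoPredictor` can be driven directly along the lines below. This is a minimal sketch of the upstream SAM‑2 video API rather than the packaged widgets: the config/checkpoint paths, frame directory, and click coordinates are placeholders, and method names may differ across SAM‑2 versions.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder paths; point these at your config, checkpoint, and frame folder.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/Cataract-SAM2.pth")

with torch.inference_mode():
    state = predictor.init_state(video_path="data/frames")  # numbered JPEG frames

    # One positive click (label 1) on the target structure in frame 0, object id 1.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[400, 300]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompted mask through the remaining frames.
    masks_per_frame = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_per_frame[frame_idx] = {
            obj_id: (mask_logits[i] > 0.0).cpu().numpy()
            for i, obj_id in enumerate(obj_ids)
        }
```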
An additional repository, independent of Google Colab, is available: https://github.com/mohaEs/CataractSAM-2
Tutorial video: `Cataract-SAM2.Tutorial.Video.mp4`
We released our pretrained weights here.
This project ships Meta's original SAM-2 repository as a git submodule under `sam2/`. Installing it in editable mode enables the exact CLI exposed by the upstream code. The environment requires Python 3.10+ and the packages listed in `requirements.txt`.

The weight download script fetches the public SAM-2 checkpoint from the DhanvinG/Cataract-SAM2 repository on Hugging Face. The checkpoint is stored in `checkpoints/` and is needed before using the library. CataractSAM‑2 has been tested on Python 3.12, SAM‑2 v1.0, Jupyter Notebook 7.4.4, and CUDA 12.2.

Follow these steps to get started:
- **Clone the repository**

  ```bash
  git clone --recurse-submodules https://github.com/DhanvinG/Cataract-SAM2.git
  cd Cataract-SAM2
  git submodule update --init --recursive
  ```

- **Create & activate a new virtual environment**

  ```bash
  python -m venv venv
  # macOS/Linux
  source venv/bin/activate
  # Windows
  venv\Scripts\activate
  ```

- **Install SAM‑2 core in editable mode**

  ```bash
  pip install -e ./segment_anything_2
  ```

- **Install CataractSAM‑2 in editable mode**

  ```bash
  pip install -e .
  ```

- **Install Jupyter Notebook (for running the demo)**

  ```bash
  pip install notebook
  ```

- **Download pretrained weights**

  ```bash
  python examples/download_checkpoints.py
  ```
> **Warning**
> Restart your Python session or runtime to ensure imports work. This is required for Hydra and editable installs to be registered correctly.
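For reference, the checkpoint download step amounts to a Hugging Face Hub fetch along the lines below. This is a sketch rather than the script itself; the artifact filename is an assumption, so defer to `examples/download_checkpoints.py` for the authoritative logic.

```python
from huggingface_hub import hf_hub_download

# The filename below is assumed; check the Hugging Face repo (or the
# download script) for the actual checkpoint name.
path = hf_hub_download(
    repo_id="DhanvinG/Cataract-SAM2",
    filename="Cataract-SAM2.pth",
    local_dir="checkpoints",
)
print(f"Checkpoint saved to {path}")
```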
Place your video frames as numbered JPEG files under the data directory
(e.g. data/frames/000.jpg, 001.jpg, …). Then build the predictor directly:
```python
from sam2.build_sam import build_sam2_video_predictor

# model_cfg is the path to the SAM-2 model config (YAML) that matches the checkpoint.
pred = build_sam2_video_predictor(model_cfg, "checkpoints/Cataract-SAM2.pth", device="cuda")

# setup and Object come with the CataractSAM-2 widget interface
# (see the demo notebook for the corresponding imports).
setup(pred, "data")

Object(0, 1)  # start annotating object 1 on frame 0
```

Click positive/negative points to guide the model's segmentation.
You can visualize intermediate masks by pressing the VISUALIZE button in the notebook UI.
```python
from cataractsam2.ui_widget import Visualize

Visualize()
```

When satisfied with a single frame, propagate your objects through the sequence:
```python
from cataractsam2.ui_widget import Propagate

Propagate(10)  # e.g. show every 10th frame for a quick check
```

Finally, export masks for all frames and objects:
```python
from cataractsam2 import Masks

Masks("./masks")  # one PNG per frame/object
```

Repository layout:

- `cataractsam2/` – library code wrapping SAM-2 and the widget interface.
- `examples/download_checkpoints.py` – helper script to obtain SAM-2 weights from Hugging Face.
- `data/` – place your frame sequences here (example frames included).
- `notebooks/` – an end-to-end demo notebook for using CataractSAM-2 on video frames.
CataractSAM-2 builds upon Meta's Segment Anything Model 2. The code is
licensed under the Apache License 2.0; see the LICENSE file for details.