
CataractSAM-2: Enhancing Transferability and Real-Time Ophthalmic Surgery Segmentation Through Automated Ground-Truth Generation


We introduce CataractSAM‑2, a domain-adapted extension of SAM‑2 optimized for high-precision segmentation in cataract and related ophthalmic surgeries. To preserve generalizable visual priors, we freeze the SAM‑2 image encoder and fine-tune only the prompt encoder and mask decoder on the Cataract‑1K dataset. To address the time-consuming nature of manual frame-by-frame annotation, we develop a human-in-the-loop interactive annotation framework built on the SAM2VideoPredictor, significantly accelerating ground-truth generation.
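A minimal sketch of this selective fine-tuning setup (assuming the upstream sam2 package and the SAM-2 v1.0 attribute names sam_prompt_encoder and sam_mask_decoder; the config and base-checkpoint names below are placeholders, and the actual training code may differ):

import torch
from sam2.build_sam import build_sam2

# Placeholder base config/checkpoint; substitute the SAM-2 variant you start from.
model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
model.train()  # build_sam2 returns the model in eval mode by default

# Freeze everything, then re-enable only the prompt encoder and mask decoder.
for p in model.parameters():
    p.requires_grad = False
for module in (model.sam_prompt_encoder, model.sam_mask_decoder):
    for p in module.parameters():
        p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)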


Overview

  • CataractSAM‑2 Model
    A fine-tuned, domain-adapted variant of Meta’s SAM‑2, trained specifically for ophthalmic surgery segmentation. It achieves 90–95% mean IoU and runs in real time at 15 FPS across surgical videos.

  • Interactive Ground-Truth Annotation Framework
    A lightweight, point-guided annotation system leveraging the SAM2VideoPredictor. Users provide sparse point-based prompts, and the model propagates accurate masks through the video, cutting annotation time by over 80% (see the sketch after this list).

  • Open-Source Toolkit
    This repo includes:

    • ✅ Pretrained weights (.pth)
    • ✅ Interactive inference widgets
    • ✅ Demo notebook
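As a sketch of the propagation mechanism behind the annotation framework, using the public SAM2VideoPredictor API from SAM-2 v1.0 (the interactive widgets in this repo wrap these calls, and the exact integration may differ):

import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/Cataract-SAM2.pth")
state = predictor.init_state(video_path="data/frames")  # directory of numbered JPEGs

# One positive click (label 1) on frame 0 for object id 1.
predictor.add_new_points(
    inference_state=state,
    frame_idx=0,
    obj_id=1,
    points=np.array([[300, 250]], dtype=np.float32),  # (x, y) in pixels
    labels=np.array([1], dtype=np.int32),
)

# Propagate the prompted mask through the remaining frames.
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    masks = (mask_logits > 0.0).cpu().numpy()  # boolean mask per object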

Backup Repository:

An additional repository, independent of Google Colab, is available: https://github.com/mohaEs/CataractSAM-2

Tutorial

Cataract-SAM2.Tutorial.Video.mp4

Load from 🤗 Hugging Face

We release our pretrained weights on Hugging Face under DhanvinG/Cataract-SAM2; they can be fetched with the download script described below.

Installation

This project ships Meta's original SAM-2 repository as a git submodule under sam2/; installing it in editable mode exposes the exact CLI provided by the upstream code. The environment requires Python 3.10+ and the packages listed in requirements.txt. The weight download script fetches the public checkpoint from the DhanvinG/Cataract-SAM2 repository on Hugging Face, stores it in checkpoints/, and must be run before using the library. CataractSAM‑2 has been tested with Python 3.12, SAM‑2 v1.0, Jupyter Notebook 7.4.4, and CUDA 12.2. Follow these steps to get started:

  1. Clone the repository
    git clone --recurse-submodules https://github.com/DhanvinG/Cataract-SAM2.git
    cd Cataract-SAM2
    git submodule update --init --recursive
    
  2. Create & activate a new virtual environment
    python -m venv venv
    # macOS/Linux
    source venv/bin/activate
    # Windows
    venv\Scripts\activate
    
  3. Install SAM‑2 core in editable mode
    pip install -e ./segment_anything_2

  4. Install CataractSAM‑2 in editable mode
    pip install -e .

  5. Install Jupyter Notebook (for running the demo)
    pip install notebook

  6. Download pretrained weights
    python examples/download_checkpoints.py
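For reference, a download step like this can be implemented with huggingface_hub; this is a hedged sketch, and the actual examples/download_checkpoints.py may differ in filenames and options:

from huggingface_hub import hf_hub_download

# repo_id from this README; the checkpoint filename is assumed from the quick start.
path = hf_hub_download(
    repo_id="DhanvinG/Cataract-SAM2",
    filename="Cataract-SAM2.pth",
    local_dir="checkpoints",
)
print(f"Checkpoint saved to {path}")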
    

Warning

Restart your Python session or runtime to ensure imports work. This is required for Hydra and editable installs to be registered correctly.


Quick start

Place your video frames as numbered JPEG files under the data directory (e.g. data/frames/000.jpg, 001.jpg, …).
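If you are starting from a raw surgical video, the frames can be extracted with OpenCV; this is a minimal sketch assuming the naming scheme above (the input filename is a placeholder):

import cv2
import os

os.makedirs("data/frames", exist_ok=True)
cap = cv2.VideoCapture("surgery.mp4")  # placeholder input video
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"data/frames/{idx:03d}.jpg", frame)  # 000.jpg, 001.jpg, ...
    idx += 1
cap.release()

With the frames in place, build the predictor directly: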

from sam2.build_sam import build_sam2_video_predictor
from cataractsam2 import setup, Object  # import path assumed; see the demo notebook

model_cfg = "sam2_hiera_l.yaml"  # SAM-2 v1.0 config matching the checkpoint
pred = build_sam2_video_predictor(model_cfg, "checkpoints/Cataract-SAM2.pth", device="cuda")
setup(pred, "data")
Object(0, 1)  # start annotating object 1 on frame 0

Click positive and negative points to guide the model's segmentation.

You can visualize intermediate masks by pressing the VISUALIZE button in the notebook UI.

from cataractsam2.ui_widget import Visualize
Visualize()

When satisfied with a single frame, propagate your objects through the sequence:

from cataractsam2.ui_widget import Propagate
Propagate(10)  # e.g. show every 10th frame for a quick check

Finally export masks for all frames and objects:

from cataractsam2 import Masks
Masks("./masks")  # one PNG per frame/object

Project structure

  • cataractsam2/ – library code wrapping SAM-2 and the widget interface.
  • examples/download_checkpoints.py – helper script to obtain SAM-2 weights from Hugging Face.
  • data/ – place your frame sequences here (example frames included).
  • notebooks/ – an end-to-end demo notebook for using CataractSAM-2 on video frames.

CataractSAM-2 builds upon Meta's Segment Anything Model 2. The code is licensed under the Apache License 2.0; see the LICENSE file for details.
