CataractSAM-2: Enhancing Transferability and Real-Time Ophthalmic Surgery Segmentation Through Automated Ground-Truth Generation
We introduce CataractSAM‑2, a domain-adapted extension of SAM‑2 optimized for high-precision segmentation in cataract and related ophthalmic surgeries. To preserve generalizable visual priors, we freeze the SAM‑2 image encoder and fine-tune only the prompt encoder and mask decoder on the Cataract‑1K dataset. To address the time-consuming nature of manual frame-by-frame annotation, we develop a human-in-the-loop interactive annotation framework built on the SAM2VideoPredictor, significantly accelerating ground-truth generation.
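In PyTorch terms, the freeze-and-fine-tune setup can be sketched roughly as follows. This is a minimal illustration, not the training code used for CataractSAM‑2: the config/checkpoint paths are placeholders, and the submodule attribute names (`image_encoder`, `sam_prompt_encoder`, `sam_mask_decoder`) are assumptions based on the upstream SAM‑2 code.

```python
import torch
from sam2.build_sam import build_sam2

# Placeholder config/checkpoint paths; substitute the ones you actually use.
model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt", device="cuda")

# Freeze everything, then re-enable gradients only for the prompt encoder
# and mask decoder so the image encoder keeps its generalizable priors.
for param in model.parameters():
    param.requires_grad = False
for module in (model.sam_prompt_encoder, model.sam_mask_decoder):
    for param in module.parameters():
        param.requires_grad = True

# Optimize only the trainable subset.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5, weight_decay=0.01)
```

Only the parameters left trainable here would then be updated on the Cataract‑1K training frames.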
- **CataractSAM‑2 Model**: A fine-tuned, domain-adapted variant of Meta's SAM‑2, trained specifically for ophthalmic surgery segmentation. It achieves 90–95% mean IoU and runs in real time at 15 FPS across surgical videos.
- **Interactive Ground-Truth Annotation Framework**: A lightweight, point-guided annotation system leveraging the SAM2VideoPredictor. Users provide sparse point-based prompts, and the model propagates accurate masks through the video, cutting annotation time by over 80%. A minimal sketch of this workflow follows the list.
- **Open-Source Toolkit**: This repo includes:
  - ✅ Pretrained weights (`.pth`)
  - ✅ Interactive inference widgets
  - ✅ Demo notebook
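To make the point-prompt-then-propagate workflow concrete, the underlying `SAM2VideoPredictor` can be driven directly along the lines below. This is a minimal sketch of the upstream SAM‑2 video API rather than the packaged widgets: the config/checkpoint paths, frame directory, and click coordinates are placeholders, and method names may differ across SAM‑2 versions.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder paths; point these at your config, checkpoint, and frame folder.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/Cataract-SAM2.pth")

with torch.inference_mode():
    state = predictor.init_state(video_path="data/frames")  # numbered JPEG frames

    # One positive click (label 1) on the target structure in frame 0, object id 1.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[400, 300]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompted mask through the remaining frames.
    masks_per_frame = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_per_frame[frame_idx] = {
            obj_id: (mask_logits[i] > 0.0).cpu().numpy()
            for i, obj_id in enumerate(obj_ids)
        }
```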
An additional repository, independent of Google Colab, is available: https://github.com/mohaEs/CataractSAM-2
Tutorial video: `Cataract-SAM2.Tutorial.Video.mp4`
We released our pretrained weights here.
This project ships Meta's original SAM-2 repository as a git submodule under `sam2/`. Installing it in editable mode enables the exact CLI exposed by the upstream code. The environment requires Python 3.10+ and the packages listed in `requirements.txt`.

The weight download script fetches the public SAM-2 checkpoint from the DhanvinG/Cataract-SAM2 repository on Hugging Face. The checkpoint is stored in `checkpoints/` and is needed before using the library. CataractSAM‑2 has been tested on Python 3.12, SAM‑2 v1.0, Jupyter Notebook 7.4.4, and CUDA 12.2.

Follow these steps to get started:
- **Clone the repository**

  ```bash
  git clone --recurse-submodules https://github.com/DhanvinG/Cataract-SAM2.git
  cd Cataract-SAM2
  git submodule update --init --recursive
  ```

- **Create & activate a new virtual environment**

  ```bash
  python -m venv venv
  # macOS/Linux
  source venv/bin/activate
  # Windows
  venv\Scripts\activate
  ```

- **Install SAM‑2 core in editable mode**

  ```bash
  pip install -e ./segment_anything_2
  ```

- **Install CataractSAM‑2 in editable mode**

  ```bash
  pip install -e .
  ```

- **Install Jupyter Notebook (for running the demo)**

  ```bash
  pip install notebook
  ```

- **Download pretrained weights**

  ```bash
  python examples/download_checkpoints.py
  ```
> **Warning**
> Restart your Python session or runtime to ensure imports work. This is required for Hydra and editable installs to be registered correctly.
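For reference, the checkpoint download step amounts to a Hugging Face Hub fetch along the lines below. This is a sketch rather than the script itself; the artifact filename is an assumption, so defer to `examples/download_checkpoints.py` for the authoritative logic.

```python
from huggingface_hub import hf_hub_download

# The filename below is assumed; check the Hugging Face repo (or the
# download script) for the actual checkpoint name.
path = hf_hub_download(
    repo_id="DhanvinG/Cataract-SAM2",
    filename="Cataract-SAM2.pth",
    local_dir="checkpoints",
)
print(f"Checkpoint saved to {path}")
```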
Place your video frames as numbered JPEG files under the data directory
(e.g. data/frames/000.jpg, 001.jpg, …). Then build the predictor directly:
```python
from sam2.build_sam import build_sam2_video_predictor

# model_cfg is the path to the SAM-2 model config (YAML) that matches the checkpoint.
pred = build_sam2_video_predictor(model_cfg, "checkpoints/Cataract-SAM2.pth", device="cuda")

# setup and Object come with the CataractSAM-2 widget interface
# (see the demo notebook for the corresponding imports).
setup(pred, "data")

Object(0, 1)  # start annotating object 1 on frame 0
```

Click positive/negative points to guide the model's segmentation.
You can visualize intermediate masks by pressing the VISUALIZE button in the notebook UI.
```python
from cataractsam2.ui_widget import Visualize

Visualize()
```

When satisfied with a single frame, propagate your objects through the sequence:
```python
from cataractsam2.ui_widget import Propagate

Propagate(10)  # e.g. show every 10th frame for a quick check
```

Finally, export masks for all frames and objects:
```python
from cataractsam2 import Masks

Masks("./masks")  # one PNG per frame/object
```

Repository layout:

- `cataractsam2/` – library code wrapping SAM-2 and the widget interface.
- `examples/download_checkpoints.py` – helper script to obtain SAM-2 weights from Hugging Face.
- `data/` – place your frame sequences here (example frames included).
- `notebooks/` – an end-to-end demo notebook for using CataractSAM-2 on video frames.
CataractSAM-2 builds upon Meta's Segment Anything Model 2. The code is
licensed under the Apache License 2.0; see the LICENSE file for details.