Datasets, code, and pretrained weights for “Masked Image Modeling for Generalizable Organelle Segmentation in Volume EM” (under review)
OrgMIM is a masked image modeling framework for organelle-specific representation learning from volumetric EM data. It replaces random masking with complementary strategies based on structural priors and reconstruction feedback. We further introduce IsoOrg-1K, an organelle-centric 3D EM dataset with 928 volumes (>120B voxels) for large-scale pretraining.
- 1. Pretraining Database: IsoOrg-1K
- 2. Downstream Segmentation Datasets
- 3. Environments
- 4. Organelle-specific Pretraining via OrgMIM
- 5. Downstream Finetuning
- 6. Visualization
- 7. Released Weights
- 8. Acknowledgements
We introduce IsoOrg-1K, a diverse organelle-specific dataset collected from OpenOrganelle. Detailed information is shown below. The full dataset (and the metadata) can be accessed here, and the precomputed membrane maps are available here.
Meanwhile, we are actively curating and integrating organelle datasets, and will continue to update this repository to support larger-scale pretraining in the future.

We conduct extensive experiments on six representative datasets with varying voxel resolutions and biological contexts. The processed and partitioned data can be downloaded from here.
The complete Conda environment has been packaged for direct use; you can download it here and unzip it.
The formalized description can be seen in 'preparation/MAM_details.png'.
First, install the Segment Anything package:
```bash
pip install git+https://github.com/facebookresearch/segment-anything.git
```
Then, load the SAM model and weights in Python:
```python
from segment_anything import sam_model_registry, SamPredictor

# Available model types: "vit_h", "vit_l", "vit_b"
model_type = "vit_h"

# Download the checkpoint from the official GitHub:
# https://github.com/facebookresearch/segment-anything#model-checkpoints
checkpoint_path = "sam_vit_h_4b8939.pth"

sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
```
In addition to SAM, models from the DINO family can also provide relevant priors. However, according to our qualitative experiments (see Figures/pca.png), their performance on EM data is not yet on par with that of SAM.
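As a reference for that comparison, a minimal sketch of extracting DINOv2 patch features and projecting them with PCA is shown below; the `torch.hub` entry point, the placeholder input, and the PCA projection are our assumptions and are not part of this repository.

```python
# Illustrative only (an assumption, not part of this repository): extract DINOv2
# patch features from an EM slice and project them with PCA, as in Figures/pca.png.
import numpy as np
import torch
from sklearn.decomposition import PCA

dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
dinov2.eval()

# Placeholder input: an EM slice replicated to 3 channels, scaled to [0, 1],
# with H and W divisible by the ViT patch size (14).
img_rgb = np.random.rand(1, 3, 518, 518).astype(np.float32)
x = torch.from_numpy(img_rgb)

with torch.no_grad():
    tokens = dinov2.forward_features(x)["x_norm_patchtokens"]  # (1, N_patches, C)

# PCA-project the patch tokens to 3 components and reshape them into a coarse feature map
pca = PCA(n_components=3).fit_transform(tokens.squeeze(0).numpy())
pca_map = pca.reshape(518 // 14, 518 // 14, 3)
```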
The MAM is then computed from the SAM features as follows:

```python
import numpy as np
import tifffile

# Helper functions defined in preparation/mam_utils.py
from preparation.mam_utils import embeddings_to_affinities, nearest_neighbor_resize

# Load a single-channel TIFF volume (placeholder path), pick one slice, and convert it to 3-channel RGB
tiff = tifffile.imread('/path/to/volume.tif')
img = tiff[i, :, :]                          # i: slice index
img_rgb = np.stack([img] * 3, axis=0)        # Shape: (3, H, W)
image = np.transpose(img_rgb, (1, 2, 0))     # Shape: (H, W, 3)

# Initialize the SAM predictor (using `sam` loaded above) and extract features
predictor = SamPredictor(sam)
predictor.set_image(image)
embedding = predictor.features
embedding = embedding.detach().cpu().numpy().squeeze()  # Shape: (C, H, W)

# Compute pixel affinities from the embeddings
affs = embeddings_to_affinities(embedding, delta_v=0.5, delta_d=1.5)
mam = np.minimum(affs[0], affs[1])  # Element-wise min of the first two affinity channels
mam = mam[1:, 1:]

# Resize to the desired shape
mam_resized = nearest_neighbor_resize(mam, (512, 512))

# Convert to uint8 for saving or visualization
mam_uint8 = np.uint8(255 * mam_resized)
```

| Function / Class | Defined In | Description |
|---|---|---|
| `embeddings_to_affinities` | `preparation/mam_utils.py` | Converts pixel embeddings into affinity maps |
| `nearest_neighbor_resize` | `preparation/mam_utils.py` | Resizes 2D arrays using nearest-neighbor interpolation |
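Putting the steps above together, a minimal sketch for precomputing the MAM of an entire volume might look as follows; the file paths, the slice loop, and the use of `tifffile` for I/O are assumptions rather than a fixed interface of this repository.

```python
# Minimal sketch (assumed I/O via tifffile; paths are placeholders): compute the
# MAM slice by slice for one volume and save it as a uint8 TIFF stack.
import numpy as np
import tifffile
from segment_anything import SamPredictor
from preparation.mam_utils import embeddings_to_affinities, nearest_neighbor_resize

def compute_mam_volume(sam, tiff_path, out_path, out_size=(512, 512)):
    volume = tifffile.imread(tiff_path)  # (D, H, W), assumed uint8, single channel
    predictor = SamPredictor(sam)
    mam_slices = []
    for i in range(volume.shape[0]):
        image = np.stack([volume[i]] * 3, axis=-1)  # (H, W, 3) RGB input for SAM
        predictor.set_image(image)
        emb = predictor.features.detach().cpu().numpy().squeeze()
        affs = embeddings_to_affinities(emb, delta_v=0.5, delta_d=1.5)
        mam = np.minimum(affs[0], affs[1])[1:, 1:]
        mam_slices.append(np.uint8(255 * nearest_neighbor_resize(mam, out_size)))
    tifffile.imwrite(out_path, np.stack(mam_slices, axis=0))

# Example usage (placeholder paths):
# compute_mam_volume(sam, '/path/to/volume.tif', '/path/to/mam.tif')
```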
After downloading and preparing the pretraining dataset, OrgMIM pretraining can be launched using the following command:
```bash
python scripts/pretrain.py -c orgmim
```
All major experimental settings are specified in a unified configuration file (`scripts/config/orgmim.yaml`), including:
- Backbone architecture: ViT or CNN
- Model scale: small / base / large
- Training hyperparameters: masking ratio, etc.
Processed downstream datasets are available here. Notably, the input data are normalized by dividing pixel intensities by 255.0. The current implementation supports automatic downloading and loading of pretrained OrgMIM weights with different backbone architectures and model scales through a unified configuration file.
```bash
python scripts/finetune.py -c orgmim
```
We note that this repository does not provide task-specific training pipelines; it focuses on releasing pretrained weights together with example code for network initialization.
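For reference, a minimal sketch of inspecting a released checkpoint before mapping its encoder weights onto a downstream backbone is shown below; the checkpoint path and the `state_dict` key layout are assumptions, and the finetuning script above handles this automatically through its configuration file.

```python
# Minimal sketch (path and key layout are assumptions): inspect a released
# OrgMIM checkpoint before initializing a downstream encoder from it.
import torch

ckpt = torch.load("orgmim_mae_b_learner.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # handle plain or Lightning-style checkpoints

# List a few parameter names and shapes to see how they map onto your encoder
for name in list(state_dict)[:10]:
    print(name, tuple(state_dict[name].shape))

# Typical usage: filter the encoder weights and load them non-strictly, e.g.
# your_encoder.load_state_dict(filtered_state_dict, strict=False)
```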
```python
import tifffile
import torch

# reconstruct_and_visualize is defined in legacy/orgmim_mae/visualize.py
from legacy.orgmim_mae.visualize import reconstruct_and_visualize

ckpt_path_list = ['/***/***/orgmim_mae_b_learner.ckpt']
img_path = '/opt/data/.../input/image.tif'
att_path = '/opt/data/.../input/mam.tif'
save_dir = '/opt/data/.../output'
name_list = ['dual']

# The raw EM image and its precomputed MAM are loaded from the paths above
# (loading via tifffile is an assumption); `learner` is the OrgMIM MAE learner,
# whose construction follows the pretraining setup and is omitted here.
img = tifffile.imread(img_path)
mam = tifffile.imread(att_path)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

reconstruct_and_visualize(
    learner=learner,
    ckpt_paths=ckpt_path_list,
    img=img,
    att=mam,
    device=device,
    save_dir=save_dir,
    name_list=name_list,
    mask_ratio=0.75,
    step=200000,
    total_step=400000,
    patch_size=16,
    image_size=128,
    alpha_t=1,
)
```

| Function / Class | Defined In | Description |
|---|---|---|
| `reconstruct_and_visualize` | `legacy/orgmim_mae/visualize.py` | Loads pretrained weights and reconstructs the masked input |
| Methods | Models | Download |
|---|---|---|
| MAE-based OrgMIM (Base) | orgmim_mae_b_learner.ckpt | Hugging Face |
| Spark-based OrgMIM (Base) | orgmim_spark_b_learner.ckpt | Hugging Face |
| MAE-based OrgMIM (Large) | orgmim_mae_l_learner.ckpt | Hugging Face |
| Spark-based OrgMIM (Large) | orgmim_spark_l_learner.ckpt | Hugging Face |
| MAE-based OrgMIM (Small) | orgmim_mae_s_learner.ckpt | Hugging Face |
| Spark-based OrgMIM (Small) | orgmim_spark_s_learner.ckpt | Hugging Face |
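If you prefer to fetch the checkpoints programmatically, a hedged sketch using `huggingface_hub` is given below; the repository ID is a placeholder, and the authoritative links are those in the table above.

```python
# Minimal sketch: download a released checkpoint with huggingface_hub.
# The repo_id is a placeholder; refer to the Hugging Face links in the table above.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<orgmim-weights>",        # placeholder repository ID
    filename="orgmim_mae_b_learner.ckpt",    # any checkpoint listed in the table
)
print(ckpt_path)
```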
We sincerely thank all contributors and the providers of the open-source datasets that supported this project, including OpenOrganelle.
If you have any questions or suggestions, feel free to contact us via email or by opening an issue.