
Multimodal SAM-Adapter for Semantic Segmentation

arXiv


🚨 This repository contains download links to the datasets, code snippets, and trained deep models of our work "Multimodal SAM-Adapter for Semantic Segmentation",

by Iacopo Curti*, Pierluigi Zama Ramirez*, Alioscia Petrelli*, and Luigi Di Stefano*. * Equal Contribution

University of Bologna

Semantic segmentation, an essential task for applications such as autonomous driving, medical imaging, and robotics, has advanced enormously with deep learning but still struggles in challenging conditions such as poor lighting and adverse weather. Multimodal approaches incorporating auxiliary sensor data (e.g., LiDAR, infrared) have emerged to address these limitations. This work introduces MM SAM-adapter, a novel framework that leverages the Segment Anything Model (SAM) for multimodal semantic segmentation. Our approach consists of an adapter network that injects multimodal fused features into SAM's rich RGB features. This strategy allows the model to prioritize RGB information while selectively incorporating auxiliary data when beneficial. We conduct extensive experiments on multimodal benchmarks such as DeLiVER, FMB, and MUSES, where MM SAM-adapter achieves state-of-the-art performance. Additionally, we manually divide DeLiVER and FMB into RGB-easy and RGB-hard splits to better evaluate the impact of synergistically employing auxiliary modalities. Our results demonstrate that MM SAM-adapter outperforms competitors in both easy and challenging scenarios.

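The core design can be pictured with a minimal PyTorch sketch (illustrative only, not the repository code; module names, dimensions, and the gating scheme are assumptions):

```python
import torch
import torch.nn as nn


class ToyMMAdapterBlock(nn.Module):
    """Illustrative sketch: inject fused auxiliary features into frozen RGB features.

    Fused auxiliary features (e.g., from LiDAR, depth, event, or thermal data)
    are projected and added as a residual to the SAM RGB features, so RGB stays
    the primary signal and the auxiliary modality contributes only a correction.
    """

    def __init__(self, rgb_dim: int, aux_dim: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(aux_dim, rgb_dim),
            nn.GELU(),
            nn.Linear(rgb_dim, rgb_dim),
        )
        # Gate initialized at zero: training starts from pure RGB behavior.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, rgb_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat: (B, N, rgb_dim) tokens from the frozen SAM encoder
        # aux_feat: (B, N, aux_dim) fused auxiliary-modality tokens
        return rgb_feat + self.gate * self.fuse(aux_feat)


# Toy usage: inject 256-dim auxiliary tokens into 1024-dim SAM tokens.
block = ToyMMAdapterBlock(rgb_dim=1024, aux_dim=256)
out = block(torch.randn(2, 196, 1024), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 196, 1024])
```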

🗄️ Dataset

In our experiments, we employed three datasets featuring different modalities: DeLiVER, MUSES, and FMB. These datasets can be downloaded from their corresponding repositories; see the datasets folder for further details.

📥 Pretrained Models

To download the SAM backbone used in the paper, run the script:

python segmentation/tools/SAM_checkpoint_convert.py

To use these weights, please follow these steps:

  1. Create a folder named pretrained in the segmentation directory.
  2. Copy the downloaded weights into the pretrained folder.
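After copying the weights, a quick load check can catch conversion issues early (a sketch; the checkpoint filename below is an assumption, use whatever file the conversion script produced):

```python
import torch

# Hypothetical filename: replace with the file written by SAM_checkpoint_convert.py.
ckpt_path = "segmentation/pretrained/sam_converted.pth"

state = torch.load(ckpt_path, map_location="cpu")
# Converted checkpoints often nest the weights under a "state_dict" key.
state_dict = state.get("state_dict", state) if isinstance(state, dict) else state
print(f"Loaded {len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
```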

📝 Code

🛠️ Setup Instructions

Install MMSegmentation v0.20.2 following the instructions:

cd segmentation
# recommended environment: torch1.9 + cuda11.1
conda create -n MMSAMAD python==3.8
conda activate MMSAMAD
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install numpy==1.24.3
pip install yapf==0.40.1
pip install mmsegmentation==0.20.2
pip install tqdm
pip install "opencv-python<4.9.0"
pip install setuptools==59.5.0
pip install tensorboard
pip install scipy
pip install einops==0.8.0
cd ops
bash make.sh # compile deformable attention
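
Once the environment is set up, a short sanity check can confirm the pinned versions and CUDA availability (a sketch; expected values follow the pip commands above):

```python
import torch
import mmcv
import mmseg

# Expected roughly: torch 1.9.0+cu111, mmcv-full 1.4.2, mmsegmentation 0.20.2
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__, "| mmseg:", mmseg.__version__)
```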

You should also create a folder called data within the segmentation folder and create a symbolic link to each dataset inside it:

cd segmentation
mkdir -p data
ln -s /path/to/DELIVER data/
ln -s /path/to/muses data/
ln -s /path/to/FMB data/
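To verify the layout, a small check like this can help (a sketch, run from inside the segmentation folder; the directory names are assumptions matching the commands above):

```python
from pathlib import Path

data_root = Path("data")
# Symlink names assumed from the ln -s commands above.
for name in ("DELIVER", "muses", "FMB"):
    link = data_root / name
    print(name, "->", link.resolve() if link.exists() else "MISSING")
```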

🚀 Train our network

To train MM SAM-Adapter + Segformer head on DeLiVER on a single node with 2 GPUs, run:

cd segmentation
bash dist_train.sh configs/DELIVER/Segformer_MMSAM_adapter_large_DELIVER_1024x1024_ss_RGBLIDAR.py 2

🚀 Evaluation of our network

To evaluate MM SAM-Adapter + Segformer head on the DeLiVER test set on a single node with 1 GPU, run:

cd segmentation
bash dist_test.sh configs/DELIVER/Segformer_MMSAM_adapter_large_DELIVER_1024x1024_ss_RGBLIDAR.py /path/to/checkpoint_file 1 --eval mIoU --show-dir visualization_directory --resize-dim 1024 1024 #resize-dim is (800,600) in case of FMB

To evaluate MM SAM-Adapter + Segformer head on the DeLiVER easy/hard splits on a single node with 1 GPU, run the command below. For this evaluation, you need to change the test dict within the config file to specify the desired dataset type, e.g. DELIVER_easy (a config sketch follows the command):

cd segmentation
bash dist_test.sh configs/DELIVER/Segformer_MMSAM_adapter_large_DELIVER_1024x1024_ss_RGBLIDAR_easy.py /path/to/checkpoint_file 1 --eval mIoU --show-dir visualization_directory --resize-dim 1024 1024 #resize-dim is (800,600) in case of FMB
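For reference, switching splits amounts to changing the dataset type in the config's test dict, roughly as below (an MMSegmentation 0.x-style sketch; field values are illustrative and not copied from the repository's configs):

```python
# Illustrative config fragment; only the dataset type differs between splits.
data = dict(
    test=dict(
        type='DELIVER_easy',        # e.g. 'DELIVER_easy' or the corresponding hard split
        data_root='data/DELIVER',   # assumed path; keep the value from the base config
        # keep the remaining pipeline/annotation settings from the base config
    )
)
```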

🚀 Inference on the MUSES test set

cd segmentation
bash dist_infer.sh configs/MUSES/Segformer_MMSAM_adapter_large_MUSES_1024x1024_ss_RGBLIDAR.py /path/to/checkpoint_file 1 --show-dir inference_directory --resize-dim 1080 1920 #resize-dim is (800,600) in case of FMB

💾 Checkpoints

Download the pretrained checkpoints for MM SAM-Adapter from the links below and place them in a dedicated folder (e.g., work_dirs).

DELIVER

| Method | Backbone | Crop Size | Input Modalities | mIoU | Download |
|--------|----------|-----------|------------------|------|----------|
| MM SAM-Adapter | ConvNext-S | 1024 | RGB + LiDAR | 57.14 | ckpt |
| MM SAM-Adapter | ConvNext-S | 1024 | RGB + Depth | 57.35 | ckpt |
| MM SAM-Adapter | ConvNext-S | 1024 | RGB + Event | 55.70 | ckpt |

FMB

| Method | Backbone | Crop Size | Input Modalities | mIoU | Download |
|--------|----------|-----------|------------------|------|----------|
| MM SAM-Adapter | ConvNext-S | 800 | RGB + Thermal | 66.10 | ckpt |

MUSES

| Method | Backbone | Crop Size | Input Modalities | mIoU | Download |
|--------|----------|-----------|------------------|------|----------|
| MM SAM-Adapter | ConvNext-S | 1024 | RGB + LiDAR | 81.07 | ckpt |
| MM SAM-Adapter | ConvNext-S | 1024 | RGB + Event | 79.92 | ckpt |

📊 Quantitative Results

In this section, we report the main quantitative results from our paper. The following two tables show DeLiVER test set results in the RGB-Depth (RGB-D), RGB-LiDAR (RGB-L), and RGB-Event (RGB-E) setups (right) and FMB test set results in the RGB-Thermal (RGB-T) setup across different scenarios (left).

[Results tables: DeLiVER (RGB-D, RGB-L, RGB-E) and FMB (RGB-T)]

The following table shows MUSES test results in the RGB-LiDAR (RGB-L) and RGB-Event (RGB-E) setups for different weather conditions.

[Results table: MUSES (RGB-L, RGB-E) across weather conditions]

🎨 Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.

[Qualitative segmentation examples]

✉️ Contacts

For questions, please send an email to iacopo.curti2@unibo.it or pierluigi.zama@unibo.it.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

🙏 Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which has been instrumental in our experiments:

  • DeLiVER
  • ViT-Adapter
  • MUSES

We also deeply appreciate the authors of the competing research papers for their helpful responses and for providing model weights, which greatly aided accurate comparisons.
