ckorgial/CBAM-ResNet
Attention-Based Source Device Identification Using Audio Content from Videos and Grad-CAM Explanations

This repository implements source device identification (SDI) using log-Mel spectrograms of audio tracks extracted from videos. It leverages a CBAM (Convolutional Block Attention Module) ResNet architecture and supports training on the VISION dataset, with transfer learning and evaluation on the FloreView and POLIPHONE datasets. It also includes Grad-CAM visualizations to interpret model decisions.


Project Structure

  • ablation_study/ - Ablation study on the VISION dataset (channel, spatial, and combined modules)
  • audio_extraction/ - Extract audio from videos using FFmpeg
  • dataset/ - Dataset download scripts (VISION, FloreView)
  • dataset_preperation/ - Create disjoint train, validation, and test splits
  • spectrogram_generation/ - Convert audio to log-Mel spectrograms
  • spectrogram_segmentation/ - Create log-Mel spectrogram segments
  • training_vision/ - Train CBAM-ResNet models on the VISION dataset (standard and merged)
  • transfer_learning/ - Transfer learning of VISION models to the FloreView and POLIPHONE datasets

Requirements

Create a requirements.txt with:

tensorflow
librosa
scikit-learn
matplotlib
seaborn
numpy
opencv-python

Also ensure FFmpeg is installed on your system for audio extraction.

python -m venv cbam-resnet && source cbam-resnet/bin/activate && pip install -r requirements.txt

Pipeline Instructions

1. Download VISION and FloreView (NOTE: POLIPHONE must be downloaded manually)

You can download the VISION and FloreView datasets using the script:

python dataset/download_datasets.py

2. Extract Audio from Videos

python audio_extraction/video2audio.py
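Under the hood this step shells out to FFmpeg. As a rough sketch of what an extraction script like video2audio.py does (the exact flags and sample rate here are assumptions, not the repository's actual settings):

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video_path, wav_path, sample_rate=22050):
    """Assemble an FFmpeg call that drops the video stream and writes
    mono 16-bit PCM audio at the given (assumed) sample rate."""
    return [
        "ffmpeg", "-y",           # overwrite output if it already exists
        "-i", video_path,         # input video
        "-vn",                    # no video stream in the output
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM
        "-ar", str(sample_rate),  # resample rate (an assumption here)
        "-ac", "1",               # downmix to mono
        wav_path,
    ]

def extract_audio(video_path, wav_path):
    """Run the FFmpeg extraction, creating the output directory if needed."""
    Path(wav_path).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
```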

3. Generate log-Mel Spectrograms

# For VISION dataset
python spectrogram_generation/VISION_mel.py

# For FloreView dataset
python spectrogram_generation/Floreview_Flat_mel.py

# For POLIPHONE dataset
python spectrogram_generation/POLIPHONE_mel.py

4. Create log-Mel Spectrogram Segments

# For VISION dataset
python spectrogram_segmentation/create_spectrogram_segments_vision.py

# For FloreView dataset
python spectrogram_segmentation/create_spectrogram_segments_floreview.py

# For POLIPHONE dataset
python spectrogram_segmentation/create_spectrogram_segments_poliphone.py
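Segmentation amounts to slicing each spectrogram along the time axis into fixed-width patches. A sketch of that operation (the patch width of 128 frames is an assumption):

```python
import numpy as np

def segment_spectrogram(spec, patch_width=128):
    """Slice a (n_mels, n_frames) spectrogram into non-overlapping
    fixed-width patches, discarding any trailing remainder."""
    n_patches = spec.shape[1] // patch_width
    return [spec[:, i * patch_width:(i + 1) * patch_width]
            for i in range(n_patches)]
```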

5. Create DISJOINT Train, Validation, and Test Splits (.npy files)

# VISION (Standard)
python dataset_preperation/create_spectrogram_dataset.py

# VISION (Merged)
python dataset_preperation/create_spectrogram_dataset_merged.py

# FloreView (Standard)
python dataset_preperation/create_floreview_spectrogram_dataset.py

# POLIPHONE (Standard)
python dataset_preperation/create_poliphone_spectrogram_dataset.py
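Disjoint here means that segments cut from the same recording never end up in different splits. A sketch of that grouping logic (the split ratios and the (video_id, payload) bookkeeping are assumptions for illustration):

```python
import random

def disjoint_splits(segments, ratios=(0.7, 0.15, 0.15), seed=42):
    """segments: iterable of (video_id, payload) pairs. Shuffles *videos*,
    then assigns every segment of a given video to a single split, so the
    train/validation/test sets are disjoint at the recording level."""
    videos = sorted({vid for vid, _ in segments})
    rng = random.Random(seed)
    rng.shuffle(videos)
    n_train = int(ratios[0] * len(videos))
    n_val = int(ratios[1] * len(videos))
    train_v = set(videos[:n_train])
    val_v = set(videos[n_train:n_train + n_val])
    train = [s for s in segments if s[0] in train_v]
    val = [s for s in segments if s[0] in val_v]
    test = [s for s in segments if s[0] not in train_v and s[0] not in val_v]
    return train, val, test
```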

6. Train and Test the CBAM-ResNet Model on VISION

# Standard CBAM-ResNet
python training_vision/train_test_model_cbam.py

# Merged CBAM-ResNet
python training_vision/train_test_model_merged_cbam.py
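For orientation, the CBAM idea itself: channel attention reweights feature maps, spatial attention reweights locations, applied in sequence. A minimal Keras sketch of the two modules (reduction ratio and kernel size follow the CBAM paper's defaults; this is not the repository's exact implementation):

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, ratio=8):
    """Squeeze spatial dims with avg and max pooling, pass both through a
    shared MLP, and gate the channels with a sigmoid."""
    ch = x.shape[-1]
    shared = tf.keras.Sequential([
        layers.Dense(ch // ratio, activation="relu"),
        layers.Dense(ch),
    ])
    avg = shared(layers.GlobalAveragePooling2D()(x))
    mx = shared(layers.GlobalMaxPooling2D()(x))
    scale = tf.sigmoid(avg + mx)[:, None, None, :]
    return x * scale

def spatial_attention(x, kernel_size=7):
    """Pool across channels (mean and max), convolve the 2-channel map,
    and gate the spatial locations with a sigmoid."""
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    mask = layers.Conv2D(1, kernel_size, padding="same",
                         activation="sigmoid")(tf.concat([avg, mx], axis=-1))
    return x * mask

def cbam_block(x, ratio=8):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, ratio))
```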

7. Ablation Study

# Ablation Study for VISION on Validation Set
python ablation_study/abl_cbam_resnet_patch.py

8. Transfer Learning on FloreView and POLIPHONE

# Transfer learning on FloreView
python transfer_learning/transfer_cbam_floreview.py

# Transfer learning on POLIPHONE
python transfer_learning/transfer_cbam_poliphone.py
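The usual transfer recipe here is to freeze the VISION-trained backbone and retrain a new softmax head sized for the target dataset's devices. A hedged sketch of that pattern (the helper name and layer indexing are assumptions, not the repository's code):

```python
import tensorflow as tf

def adapt_for_transfer(base_model, n_new_classes, freeze_backbone=True):
    """Replace the final softmax with one sized for the target dataset,
    optionally freezing everything except the new head."""
    if freeze_backbone:
        for layer in base_model.layers[:-1]:
            layer.trainable = False
    features = base_model.layers[-2].output   # penultimate feature layer
    out = tf.keras.layers.Dense(n_new_classes, activation="softmax",
                                name="new_head")(features)
    return tf.keras.Model(base_model.input, out)
```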

Outputs

Each training script saves:

  • Best model weights (best_model.h5)
  • Confusion matrices (as PNG and CSV)
  • Accuracy trend plots
  • Class-wise performance evaluation (AUC, Classification metrics)
  • Grad-CAM Heatmaps for randomly selected log-Mel spectrogram patches for every device and scenario
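The Grad-CAM heatmaps follow the standard recipe: gradients of the class score with respect to the last convolutional feature map, global-average-pooled into per-channel weights. A generic sketch (the layer-name argument is whatever your model's last conv layer is called; this is not the repository's exact code):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a heatmap in [0, 1] over the conv feature grid."""
    conv_layer = model.get_layer(last_conv_layer_name)
    grad_model = tf.keras.Model(model.input,
                                [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))           # GAP over H, W
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)[0]
    cam = tf.nn.relu(cam)                                  # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```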


Author

Christos Korgialas (ckorgial@csd.auth.gr)


License

MIT License
