# Attention-Based Source Device Identification Using Audio Content from Videos and Grad-CAM Explanations
This repository implements source device identification using log-mel spectrograms of audio tracks extracted from videos. It leverages a CBAM (Convolutional Block Attention Module) architecture and supports training on the VISION dataset, with transfer learning and evaluation on the FloreView and POLIPHONE datasets. Additionally, it includes Grad-CAM visualizations to interpret model decisions.
| Folder | Description |
|---|---|
| `ablation_study/` | Ablation study on the VISION dataset (channel, spatial, and combined modules) |
| `audio_extraction/` | Extract audio from videos using FFmpeg |
| `dataset/` | VISION dataset download |
| `dataset_preperation/` | Create disjoint train, validation, and test splits |
| `spectrogram_generation/` | Convert audio to log-mel spectrograms |
| `spectrogram_segmentation/` | Create log-mel spectrogram segments |
| `training_vision/` | Train CBAM-ResNet models on the VISION dataset (standard & merged) |
| `transfer_learning/` | Transfer learning of VISION models on the FloreView and POLIPHONE datasets |
Create a `requirements.txt` with:

```text
tensorflow
librosa
scikit-learn
matplotlib
seaborn
numpy
opencv-python
```

Also ensure FFmpeg is installed on your system for audio extraction.
Set up a virtual environment and install the dependencies:

```bash
python -m venv cbam-resnet && source cbam-resnet/bin/activate && pip install -r requirements.txt
```

You can download the VISION and FloreView datasets using the script:
```bash
python dataset/download_datasets.py
```

Extract the audio from the videos with FFmpeg:

```bash
python audio_extraction/video2audio.py
```

Convert the extracted audio to log-mel spectrograms:

```bash
# For VISION dataset
python spectrogram_generation/VISION_mel.py
# For FloreView dataset
python spectrogram_generation/Floreview_Flat_mel.py
# For POLIPHONE dataset
python spectrogram_generation/POLIPHONE_mel.py
```
Create log-mel spectrogram segments:

```bash
# For VISION dataset
python spectrogram_segmentation/create_spectrogram_segments_vision.py
# For FloreView dataset
python spectrogram_segmentation/create_spectrogram_segments_floreview.py
# For POLIPHONE dataset
python spectrogram_segmentation/create_spectrogram_segments_poliphone.py
```
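Segmentation slices each spectrogram into fixed-width windows along the time axis. A minimal NumPy sketch (the 128-frame patch width and non-overlapping hop are assumptions for illustration, not the scripts' actual values):

```python
import numpy as np

def segment_spectrogram(spec, patch_frames=128, hop=128):
    """Split a (n_mels, n_frames) spectrogram into fixed-width time segments.

    Trailing frames that do not fill a whole patch are discarded.
    """
    segments = []
    for start in range(0, spec.shape[1] - patch_frames + 1, hop):
        segments.append(spec[:, start:start + patch_frames])
    if not segments:
        return np.empty((0, spec.shape[0], patch_frames))
    return np.stack(segments)  # shape: (n_segments, n_mels, patch_frames)
```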
Create the disjoint train, validation, and test splits:

```bash
# VISION (Standard)
python dataset_preperation/create_spectrogram_dataset.py
# VISION (Merged)
python dataset_preperation/create_spectrogram_dataset_merged.py
# FloreView (Standard)
python dataset_preperation/create_floreview_spectrogram_dataset.py
# POLIPHONE (Standard)
python dataset_preperation/create_poliphone_spectrogram_dataset.py
```
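A key property of these splits is that they are disjoint: all segments cut from the same recording must land in the same split, otherwise near-duplicate patches leak between train and test. One way to enforce this with scikit-learn (the `disjoint_split` helper and grouping-by-recording scheme are illustrative, not necessarily what the scripts do):

```python
from sklearn.model_selection import GroupShuffleSplit

def disjoint_split(paths, groups, test_size=0.2, seed=42):
    """Split segment paths so all segments of one recording stay together.

    `groups` holds the source-recording id of each segment path.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(paths, groups=groups))
    return [paths[i] for i in train_idx], [paths[i] for i in test_idx]
```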
Train the CBAM-ResNet models on the VISION dataset:

```bash
# Standard CBAM-ResNet
python training_vision/train_test_model_cbam.py
# Merged CBAM-ResNet
python training_vision/train_test_model_merged_cbam.py
```
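For reference, a CBAM block applies channel attention followed by spatial attention to a feature map. A minimal Keras sketch (the reduction ratio of 8 and 7×7 kernel are the commonly used defaults from the CBAM paper, not necessarily this repository's configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8, kernel_size=7):
    """Convolutional Block Attention Module: channel then spatial attention."""
    channels = x.shape[-1]

    # Channel attention: a shared MLP scores global avg- and max-pooled features
    dense1 = layers.Dense(channels // reduction, activation="relu")
    dense2 = layers.Dense(channels)
    avg = dense2(dense1(layers.GlobalAveragePooling2D()(x)))
    mx = dense2(dense1(layers.GlobalMaxPooling2D()(x)))
    ca = tf.sigmoid(avg + mx)[:, None, None, :]  # (B, 1, 1, C)
    x = x * ca

    # Spatial attention: a conv scores channel-wise mean and max maps
    avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)
    max_map = tf.reduce_max(x, axis=-1, keepdims=True)
    sa = layers.Conv2D(1, kernel_size, padding="same", activation="sigmoid")(
        tf.concat([avg_map, max_map], axis=-1)
    )
    return x * sa
```

In a CBAM-ResNet, such a block is typically inserted after each residual block's output, leaving the tensor shape unchanged.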
Run the ablation study for VISION on the validation set:

```bash
python ablation_study/abl_cbam_resnet_patch.py
```
Run transfer learning of the VISION models:

```bash
# Transfer learning on FloreView
python transfer_learning/transfer_cbam_floreview.py
# Transfer learning on POLIPHONE
python transfer_learning/transfer_cbam_poliphone.py
```
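Transfer learning here typically means loading a VISION-trained checkpoint, freezing the convolutional backbone, and retraining a fresh classification head for the target dataset's devices. A hedged sketch (`build_transfer_model` and the layer layout are illustrative; the actual scripts may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(base_model_path, num_new_classes):
    """Load a pretrained model, freeze its backbone, attach a new head.

    Assumes the checkpoint's last layer is the original classifier and the
    second-to-last layer produces the features to reuse.
    """
    base = tf.keras.models.load_model(base_model_path)
    # Keep everything up to (but excluding) the original classification layer
    feature_extractor = models.Model(base.input, base.layers[-2].output)
    feature_extractor.trainable = False  # freeze the pretrained backbone

    out = layers.Dense(num_new_classes, activation="softmax")(
        feature_extractor.output
    )
    model = models.Model(feature_extractor.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Optionally, the top few backbone layers can be unfrozen afterwards for fine-tuning at a low learning rate.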
Each training script saves:

- Best model weights (`best_model.h5`)
- Confusion matrices (as PNG and CSV)
- Accuracy trend plots
- Class-wise performance evaluation (AUC, classification metrics)
- Grad-CAM heatmaps for randomly selected log-mel spectrogram patches for every device and scenario
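The Grad-CAM heatmaps highlight which time-frequency regions of a patch drove the model's prediction. A minimal sketch using `tf.GradientTape` (the `grad_cam` helper and its layer-name argument are illustrative, not the repository's API):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Compute a Grad-CAM heatmap for one input over a chosen conv layer."""
    grad_model = tf.keras.models.Model(
        model.input, [model.get_layer(conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    # Channel weights: global average of the gradients over space
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)                    # keep positive influence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # normalise to [0, 1]
    return cam.numpy()
```

The resulting map is usually resized to the input patch size and overlaid on the spectrogram as a heatmap.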
Contact: Christos Korgialas (ckorgial@csd.auth.gr)
MIT License