# Attention-Based Source Device Identification Using Audio Content from Videos and Grad-CAM Explanations
This repository implements source device identification using log-mel spectrograms of audio tracks extracted from videos. It leverages a CBAM (Convolutional Block Attention Module) architecture and supports training on the VISION dataset, with transfer learning and evaluation on the FloreView and POLIPHONE datasets. Additionally, it includes Grad-CAM visualizations to interpret model decisions.
| Folder | Description |
|---|---|
| `ablation_study/` | Ablation study on the VISION dataset (channel, spatial, and combined modules) |
| `audio_extraction/` | Extract audio from videos using FFmpeg |
| `dataset/` | VISION dataset download |
| `dataset_preperation/` | Create disjoint train, validation, and test splits |
| `spectrogram_generation/` | Convert audio to log-mel spectrograms |
| `spectrogram_segmentation/` | Create log-mel spectrogram segments |
| `training_vision/` | Train CBAM-ResNet models on the VISION dataset (standard & merged) |
| `transfer_learning/` | Transfer learning of VISION models on the FloreView and POLIPHONE datasets |
Create a `requirements.txt` with:

```text
tensorflow
librosa
scikit-learn
matplotlib
seaborn
numpy
opencv-python
```

Also ensure FFmpeg is installed on your system for audio extraction.
Set up a virtual environment and install the dependencies:

```bash
python -m venv cbam-resnet && source cbam-resnet/bin/activate && pip install -r requirements.txt
```

You can download the VISION and FloreView datasets using the script:
```bash
python dataset/download_datasets.py
```

Extract the audio from the videos with FFmpeg:

```bash
python audio_extraction/video2audio.py
```

Convert the extracted audio to log-mel spectrograms:

```bash
# For VISION dataset
python spectrogram_generation/VISION_mel.py
# For FloreView dataset
python spectrogram_generation/Floreview_Flat_mel.py
# For POLIPHONE dataset
python spectrogram_generation/POLIPHONE_mel.py
```
Create log-mel spectrogram segments:

```bash
# For VISION dataset
python spectrogram_segmentation/create_spectrogram_segments_vision.py
# For FloreView dataset
python spectrogram_segmentation/create_spectrogram_segments_floreview.py
# For POLIPHONE dataset
python spectrogram_segmentation/create_spectrogram_segments_poliphone.py
```
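Segmentation slices each spectrogram into fixed-width windows along the time axis. A minimal NumPy sketch (the 128-frame patch width and non-overlapping hop are assumptions for illustration, not the scripts' actual values):

```python
import numpy as np

def segment_spectrogram(spec, patch_frames=128, hop=128):
    """Split a (n_mels, n_frames) spectrogram into fixed-width time segments.

    Trailing frames that do not fill a whole patch are discarded.
    """
    segments = []
    for start in range(0, spec.shape[1] - patch_frames + 1, hop):
        segments.append(spec[:, start:start + patch_frames])
    if not segments:
        return np.empty((0, spec.shape[0], patch_frames))
    return np.stack(segments)  # shape: (n_segments, n_mels, patch_frames)
```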
Create the disjoint train, validation, and test splits:

```bash
# VISION (Standard)
python dataset_preperation/create_spectrogram_dataset.py
# VISION (Merged)
python dataset_preperation/create_spectrogram_dataset_merged.py
# FloreView (Standard)
python dataset_preperation/create_floreview_spectrogram_dataset.py
# POLIPHONE (Standard)
python dataset_preperation/create_poliphone_spectrogram_dataset.py
```
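A key property of these splits is that they are disjoint: all segments cut from the same recording must land in the same split, otherwise near-duplicate patches leak between train and test. One way to enforce this with scikit-learn (the `disjoint_split` helper and grouping-by-recording scheme are illustrative, not necessarily what the scripts do):

```python
from sklearn.model_selection import GroupShuffleSplit

def disjoint_split(paths, groups, test_size=0.2, seed=42):
    """Split segment paths so all segments of one recording stay together.

    `groups` holds the source-recording id of each segment path.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(paths, groups=groups))
    return [paths[i] for i in train_idx], [paths[i] for i in test_idx]
```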
Train the CBAM-ResNet models on the VISION dataset:

```bash
# Standard CBAM-ResNet
python training_vision/train_test_model_cbam.py
# Merged CBAM-ResNet
python training_vision/train_test_model_merged_cbam.py
```
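For reference, a CBAM block applies channel attention followed by spatial attention to a feature map. A minimal Keras sketch (the reduction ratio of 8 and 7×7 kernel are the commonly used defaults from the CBAM paper, not necessarily this repository's configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8, kernel_size=7):
    """Convolutional Block Attention Module: channel then spatial attention."""
    channels = x.shape[-1]

    # Channel attention: a shared MLP scores global avg- and max-pooled features
    dense1 = layers.Dense(channels // reduction, activation="relu")
    dense2 = layers.Dense(channels)
    avg = dense2(dense1(layers.GlobalAveragePooling2D()(x)))
    mx = dense2(dense1(layers.GlobalMaxPooling2D()(x)))
    ca = tf.sigmoid(avg + mx)[:, None, None, :]  # (B, 1, 1, C)
    x = x * ca

    # Spatial attention: a conv scores channel-wise mean and max maps
    avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)
    max_map = tf.reduce_max(x, axis=-1, keepdims=True)
    sa = layers.Conv2D(1, kernel_size, padding="same", activation="sigmoid")(
        tf.concat([avg_map, max_map], axis=-1)
    )
    return x * sa
```

In a CBAM-ResNet, such a block is typically inserted after each residual block's output, leaving the tensor shape unchanged.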
Run the ablation study for VISION on the validation set:

```bash
python ablation_study/abl_cbam_resnet_patch.py
```
Run transfer learning of the VISION models:

```bash
# Transfer learning on FloreView
python transfer_learning/transfer_cbam_floreview.py
# Transfer learning on POLIPHONE
python transfer_learning/transfer_cbam_poliphone.py
```
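Transfer learning here typically means loading a VISION-trained checkpoint, freezing the convolutional backbone, and retraining a fresh classification head for the target dataset's devices. A hedged sketch (`build_transfer_model` and the layer layout are illustrative; the actual scripts may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(base_model_path, num_new_classes):
    """Load a pretrained model, freeze its backbone, attach a new head.

    Assumes the checkpoint's last layer is the original classifier and the
    second-to-last layer produces the features to reuse.
    """
    base = tf.keras.models.load_model(base_model_path)
    # Keep everything up to (but excluding) the original classification layer
    feature_extractor = models.Model(base.input, base.layers[-2].output)
    feature_extractor.trainable = False  # freeze the pretrained backbone

    out = layers.Dense(num_new_classes, activation="softmax")(
        feature_extractor.output
    )
    model = models.Model(feature_extractor.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Optionally, the top few backbone layers can be unfrozen afterwards for fine-tuning at a low learning rate.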
Each training script saves:

- Best model weights (`best_model.h5`)
- Confusion matrices (as PNG and CSV)
- Accuracy trend plots
- Class-wise performance evaluation (AUC, classification metrics)
- Grad-CAM heatmaps for randomly selected log-mel spectrogram patches for every device and scenario
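The Grad-CAM heatmaps highlight which time-frequency regions of a patch drove the model's prediction. A minimal sketch using `tf.GradientTape` (the `grad_cam` helper and its layer-name argument are illustrative, not the repository's API):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Compute a Grad-CAM heatmap for one input over a chosen conv layer."""
    grad_model = tf.keras.models.Model(
        model.input, [model.get_layer(conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    # Channel weights: global average of the gradients over space
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)                    # keep positive influence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # normalise to [0, 1]
    return cam.numpy()
```

The resulting map is usually resized to the input patch size and overlaid on the spectrogram as a heatmap.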
Contact: Christos Korgialas (ckorgial@csd.auth.gr)
MIT License