This is a project for the "Multimodal information processing and analysis" course of the MSc Data Science program run jointly by the National Centre for Scientific Research "Demokritos" and the University of the Peloponnese. The goal is to classify a target lecture video into 3 categories (boring, neutral, interesting) based on viewer stimulation. A collection of manually annotated videos is used as the training and evaluation dataset. The algorithm used for training is the SVM from the scikit-learn library. Audio features are extracted with the pyAudioAnalysis library, and video features with VGG16 from the Keras library.
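As a rough illustration of the pipeline, the sketch below trains a scikit-learn SVM on combined audio/video feature vectors. The random features stand in for the real pyAudioAnalysis and VGG16 outputs; all names and dimensions here are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in features: in the real pipeline these would come from
# pyAudioAnalysis (audio) and VGG16 (video); here they are random.
rng = np.random.RandomState(0)
audio_feats = rng.rand(12, 34)    # 12 segments, 34 audio features (illustrative)
video_feats = rng.rand(12, 512)   # 12 segments, 512 video features (illustrative)

# One combined feature vector per video segment.
X = np.hstack([audio_feats, video_feats])
y = rng.choice(["boring", "neutral", "interesting"], size=12)

clf = SVC(kernel="rbf")           # SVM classifier, as in the project
clf.fit(X, y)
pred = clf.predict(X[:1])         # predict the class of one segment
```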
- Python 3.7.6
- pip 20.0.2
- Supported video format: .mp4
- Supported audio format: .wav
- Clone the source:
git clone https://github.com/cjd1884/pyLectureMultiModalAnalysis.git
- Install dependencies:
pip install -r ./requirements.txt

The main file is lecture_classifier.py. It can be run in 3 different modes:
This is the primary evaluation mode, used to assess the SVM algorithm's performance on the training data. Accuracy is calculated using leave-one-video-out cross-validation (leaving out a different video is equivalent to leaving out a different speaker, since we assume one video per speaker).
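Leave-one-video-out cross-validation can be sketched with scikit-learn's LeaveOneGroupOut, treating the video id as the group. The features and labels below are made-up placeholders, not data from the project.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(8, 4)                 # 8 segments, 4 features each (placeholder values)
y = [0, 1, 0, 1, 1, 2, 1, 0]       # class per segment (0=boring, 1=interesting, 2=neutral)
groups = [0, 0, 0, 0, 1, 1, 1, 1]  # video id per segment: one speaker per video

# Each fold holds out every segment of one video (i.e. one speaker).
scores = cross_val_score(SVC(), X, y, groups=groups, cv=LeaveOneGroupOut())
```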
Video files to be used for training need to be placed in the data/source/ folder, in .mp4 format. An annotation file named index.csv should be provided under the data/ folder. The index file should have the following format:
FILE;SEG;CLASS_1
video_0;part_0;boring
video_0;part_1;interesting
video_0;part_2;boring
video_0;part_3;interesting
video_1;part_0;interesting
video_1;part_1;neutral
video_1;part_2;interesting
video_1;part_3;boring
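The annotation file above is semicolon-separated, so it can be read with Python's csv module. This is just a parsing sketch, not code from the project:

```python
import csv
import io

# A fragment of index.csv, as shown above.
sample = """FILE;SEG;CLASS_1
video_0;part_0;boring
video_0;part_1;interesting
"""

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
# Each row maps column names to values, e.g.
# {"FILE": "video_0", "SEG": "part_0", "CLASS_1": "boring"}
labels = [row["CLASS_1"] for row in rows]
```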
python lecture_classifier.py -a eval_train

In this mode, the SVM model is trained on the entire training dataset and then saved to disk.
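Persisting a trained scikit-learn SVM to disk is typically done with pickle (or joblib). The sketch below uses placeholder data, and the file name is an assumption, not necessarily the one the script uses:

```python
import pickle
import numpy as np
from sklearn.svm import SVC

# Train a toy SVM on placeholder data and persist it to disk.
rng = np.random.RandomState(1)
X, y = rng.rand(6, 3), [0, 1, 2, 0, 1, 2]
model = SVC().fit(X, y)

with open("svm_model.pkl", "wb") as f:   # hypothetical file name
    pickle.dump(model, f)
```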
python lecture_classifier.py -a train

The trained model (loaded from disk) is used to classify the target video. The target video to be annotated by the algorithm should be placed under the data/target/ folder.
python lecture_classifier.py -a eval_target
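Classifying a target video then amounts to loading the serialized model and predicting on the target's feature vectors. Again, the feature dimensions below are illustrative assumptions, and a toy model is trained inline so the loading step is self-contained:

```python
import pickle
import numpy as np
from sklearn.svm import SVC

# Train and serialize a toy model (stands in for the one saved by train mode).
rng = np.random.RandomState(2)
model = SVC().fit(rng.rand(6, 4), ["boring", "neutral", "interesting"] * 2)
blob = pickle.dumps(model)

# Load the model and classify the target video's segments.
loaded = pickle.loads(blob)
target_feats = rng.rand(3, 4)    # features of 3 target segments (placeholder)
pred = loaded.predict(target_feats)
```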