This repository contains the code associated with the following publication:
A vector quantized masked autoencoder for audiovisual emotion recognition
Samir Sadok, Simon Leglaive, Renaud Séguier (2025)
Computer Vision and Image Understanding, 104362.
If you use this code for your research, please cite the above paper.
Useful links:
- PyPI: (soon)
- Install the package locally (for use on your system):
  - In the VQ-MAE-speech directory: `pip install -e .`
- Virtual environment:
  - `conda create -n vq_mae_av python=3.8`
  - `conda activate vq_mae_av`
  - In the VQ-MAE-speech directory: `pip install -r requirements.txt`
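Putting these steps together, a typical local setup could look like the sketch below (assuming the commands are run from the VQ-MAE-speech directory of a fresh clone):

```bash
# Create and activate a dedicated environment (Python 3.8)
conda create -n vq_mae_av python=3.8
conda activate vq_mae_av

# From the VQ-MAE-speech directory: install the dependencies and the package
pip install -r requirements.txt
pip install -e .
```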
- To do (each step is detailed in the sections below):
  - Training VQ-VAE-Audio
  - Training VQ-VAE-Visual
  - Training VQ-MAE-AV
  - Fine-tuning and classification for emotion recognition
## Training VQ-VAE-Audio
See the code `train_speech_vqvae.py`.
- You can download our pre-trained audio VQ-VAE at the following link (released soon).
## Training VQ-VAE-Visual
See the code `train_visual_vqvae.py`.
- You can download our pre-trained visual VQ-VAE at the following link (released soon).
## Training VQ-MAE-AV
See the code `train_vq_mae_av.py`.
- Pretrained models (released soon):

| Model | Encoder depth |
|---|---|
| VQ-MAE-AV | 6 - 12 - 16 - 20 |
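To give a rough picture of the full pre-training pipeline, the three stages above could be launched in sequence as sketched below. This assumes each script can be run directly with Python and reads its settings from the repository's config files with no required command-line arguments, which may not match the actual interfaces:

```bash
# 1. Train the audio VQ-VAE (or use the pre-trained one once released)
python train_speech_vqvae.py

# 2. Train the visual VQ-VAE (or use the pre-trained one once released)
python train_visual_vqvae.py

# 3. Pre-train VQ-MAE-AV on the discrete tokens produced by the two VQ-VAEs
python train_vq_mae_av.py
```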
## Fine-tuning and classification for emotion recognition
- Speaker-independent protocol (cross-validation): follow the file `classification_speaker_independent.py`.
- Speaker-dependent protocol (80%/20% split): follow the file `classification_speaker_dependent.py`.
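Assuming these two scripts are launched the same way as the training scripts (directly with Python, with their configuration taken from the config files; not verified here), the two evaluation protocols would be run as:

```bash
# Speaker-independent protocol: cross-validation across speakers
python classification_speaker_independent.py

# Speaker-dependent protocol: 80%/20% train/test split
python classification_speaker_dependent.py
```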
## User Interface for VQ-MAE-AV
- (1) Click the "Config file" button to open a dialog box for selecting the model parameters, then load the model by clicking the "VQ-MAE-AV" button.
- (2) Use the "Load data" button to open a dialog box for loading a video from your computer. The input display (3) opens automatically.
- (7) Click the "Run" button: display (4) (showing the masked data) and display (5) (showing the reconstruction produced by our model) will open.
- (6) Use the slider to adjust the masking percentage.
- (8) Click the "Save" button to save all the data (input, masked data, and reconstruction) to the folder of your choice.

See the code `test_interface.py`.
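Assuming `test_interface.py` starts the GUI directly and takes no required arguments, the interface can be launched with:

```bash
# Launch the VQ-MAE-AV demo interface
python test_interface.py
```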
## License
GNU Affero General Public License (version 3), see LICENSE.txt.