This repository contains the code associated with the following publication:
A vector quantized masked autoencoder for audiovisual emotion recognition
Samir Sadok, Simon Leglaive, Renaud Séguier (2025)
Computer Vision and Image Understanding, 104362.
If you use this code for your research, please cite the above paper.
Useful links:
- PyPI: (soon)
- Install the package locally (for use on your system):
  - In the VQ-MAE-speech directory: `pip install -e .`
- Virtual environment:
  - `conda create -n vq_mae_av python=3.8`
  - `conda activate vq_mae_av`
  - In the VQ-MAE-speech directory: `pip install -r requirements.txt`
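Putting these steps together, a typical local setup could look like the sketch below (assuming the commands are run from the VQ-MAE-speech directory of a fresh clone):

```bash
# Create and activate a dedicated environment (Python 3.8)
conda create -n vq_mae_av python=3.8
conda activate vq_mae_av

# From the VQ-MAE-speech directory: install the dependencies and the package
pip install -r requirements.txt
pip install -e .
```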
- To do (each step is detailed in the sections below):
  - Training VQ-VAE-Audio
  - Training VQ-VAE-Visual
  - Training VQ-MAE-AV
  - Fine-tuning and classification for emotion recognition
## Training VQ-VAE-Audio
See the code `train_speech_vqvae.py`.
- You can download our pre-trained audio VQ-VAE at the following link (released soon).
## Training VQ-VAE-Visual
See the code `train_visual_vqvae.py`.
- You can download our pre-trained visual VQ-VAE at the following link (released soon).
## Training VQ-MAE-AV
See the code `train_vq_mae_av.py`.
- Pretrained models (released soon):

| Model | Encoder depth |
|---|---|
| VQ-MAE-AV | 6 - 12 - 16 - 20 |
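To give a rough picture of the full pre-training pipeline, the three stages above could be launched in sequence as sketched below. This assumes each script can be run directly with Python and reads its settings from the repository's config files with no required command-line arguments, which may not match the actual interfaces:

```bash
# 1. Train the audio VQ-VAE (or use the pre-trained one once released)
python train_speech_vqvae.py

# 2. Train the visual VQ-VAE (or use the pre-trained one once released)
python train_visual_vqvae.py

# 3. Pre-train VQ-MAE-AV on the discrete tokens produced by the two VQ-VAEs
python train_vq_mae_av.py
```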
## Fine-tuning and classification for emotion recognition
- Speaker-independent protocol (cross-validation): follow the file `classification_speaker_independent.py`.
- Speaker-dependent protocol (80%/20% split): follow the file `classification_speaker_dependent.py`.
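Assuming these two scripts are launched the same way as the training scripts (directly with Python, with their configuration taken from the config files; not verified here), the two evaluation protocols would be run as:

```bash
# Speaker-independent protocol: cross-validation across speakers
python classification_speaker_independent.py

# Speaker-dependent protocol: 80%/20% train/test split
python classification_speaker_dependent.py
```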
## User Interface for VQ-MAE-AV
- (1) Click the "Config file" button to open a dialog box for selecting the model parameters, then load the model by clicking the "VQ-MAE-AV" button.
- (2) Use the "Load data" button to open a dialog box for loading a video from your computer. The input display (3) opens automatically.
- (7) Click the "Run" button: display (4) (showing the masked data) and display (5) (showing the reconstruction produced by our model) will open.
- (6) Use the slider to adjust the masking percentage.
- (8) Click the "Save" button to save all the data (input, masked data, and reconstruction) to the folder of your choice.

See the code `test_interface.py`.
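Assuming `test_interface.py` starts the GUI directly and takes no required arguments, the interface can be launched with:

```bash
# Launch the VQ-MAE-AV demo interface
python test_interface.py
```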
## License
GNU Affero General Public License (version 3), see LICENSE.txt.