A vector quantized masked autoencoder for audiovisual speech emotion recognition


This repository contains the code associated with the following publication:

A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok, Simon Leglaive, Renaud Séguier (2025)
Computer Vision and Image Understanding, 104362.

If you use this code for your research, please cite the above paper.


Setup

  • PyPI: (coming soon)
  • Install the package locally (for use on your system):
    • In the VQ-MAE-speech directory: pip install -e .
  • Virtual environment:
    • conda create -n vq_mae_av python=3.8
    • conda activate vq_mae_av
    • In the VQ-MAE-speech directory: pip install -r requirements.txt

Usage

  • To do:
    • Training VQ-VAE-Audio
    • Training VQ-VAE-Visual
    • Training VQ-MAE-AV
    • Fine-tuning and classification for emotion recognition

1) Training VQ-VAE-Audio with unsupervised learning

(Figure: VQ-VAE-Audio architecture)

See the code train_speech_vqvae.py

  • You can download our pre-trained audio VQ-VAE at the following link (to be released soon).
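
For orientation, the sketch below shows the generic vector-quantization step at the core of a VQ-VAE: nearest-codebook lookup with a straight-through gradient estimator (van den Oord et al., 2017). It is a minimal illustration, not the repository's implementation; all sizes and names are placeholders.

```python
# Generic VQ-VAE quantization sketch (not the repository's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-codebook-entry quantization with a straight-through estimator."""

    def __init__(self, num_codes: int = 512, code_dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # weight of the commitment loss

    def forward(self, z):
        # z: (batch, time, code_dim) continuous encoder output.
        # Euclidean distance from every encoder vector to every codebook entry.
        codes = self.codebook.weight  # (num_codes, code_dim)
        d = torch.cdist(z, codes.unsqueeze(0).expand(z.size(0), -1, -1))
        indices = d.argmin(dim=-1)          # (batch, time) discrete token indices
        z_q = self.codebook(indices)        # quantized vectors
        # Codebook loss pulls codes toward the encoder outputs; the commitment
        # loss keeps the encoder close to its assigned codes.
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through: gradients flow through z_q as if it were z.
        z_q = z + (z_q - z).detach()
        return z_q, indices, vq_loss
```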

2) Training VQ-VAE-Visual with unsupervised learning

(Figure: VQ-VAE-Visual architecture)

See the code train_visual_vqvae.py

  • You can download our pre-trained visual VQ-VAE at the following link (to be released soon).
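
Once both VQ-VAEs are trained, the masked autoencoder stage operates on their discrete token indices rather than on raw signals. Below is a sketch of that tokenization step, assuming the `VectorQuantizer` above and hypothetical pre-trained encoder modules (none of these names come from the repository):

```python
import torch

@torch.no_grad()
def tokenize(encoder, quantizer, x):
    """Map a raw input batch to a sequence of discrete codebook indices."""
    z = encoder(x)                 # (batch, time, code_dim) continuous features
    _, indices, _ = quantizer(z)   # (batch, time) integer token indices
    return indices

# Hypothetical usage, one token stream per modality:
# audio_tokens  = tokenize(audio_encoder,  audio_quantizer,  speech_batch)
# visual_tokens = tokenize(visual_encoder, visual_quantizer, frame_batch)
```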

3) Training VQ-MAE-AV with self-supervised learning

(Figure: VQ-MAE-AV architecture)

See the code train_vq_mae_av.py

  • Pre-trained models (to be released soon):

| Model     | Encoder depth    |
|-----------|------------------|
| VQ-MAE-AV | 6 / 12 / 16 / 20 |
4) Fine-tuning and classification for the emotion recognition task
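
No training script for this stage is listed yet. As a rough sketch of what this step usually involves, the snippet below attaches a classification head to a pre-trained encoder and pools its features over time; every module name, signature, and dimension here is hypothetical, not the repository's API.

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Hypothetical fine-tuning head: pooled encoder features -> emotion logits."""

    def __init__(self, encoder: nn.Module, feat_dim: int = 512, num_emotions: int = 8):
        super().__init__()
        self.encoder = encoder              # pre-trained VQ-MAE-AV encoder (assumed)
        self.head = nn.Linear(feat_dim, num_emotions)

    def forward(self, audio_tokens, visual_tokens):
        feats = self.encoder(audio_tokens, visual_tokens)  # (batch, seq, feat_dim)
        return self.head(feats.mean(dim=1))                # mean-pool over time
```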

5) Graphical interface for VQ-MAE-AV

(Figure: screenshot of the graphical interface)

User interface for VQ-MAE-AV (the numbers refer to the regions of the screenshot above):

  • (1) Click the "Config file" button to open a dialog box for selecting the model parameters, then load the model by clicking the "VQ-MAE-AV" button.
  • (2) Use the "Load data" button to open a dialog box and select a video from your computer. The input display (3) opens automatically.
  • (7) Click the "Run" button; display (4) (the masked data) and display (5) (the reconstruction produced by our model) will open.
  • (6) Use the slider to adjust the masking percentage.
  • (8) Click the "Save" button to save all the data (input, masked data, and reconstruction) to the folder of your choice.

See the code test_interface.py


License
GNU Affero General Public License (version 3), see LICENSE.txt.
