PLAYING FOR YOU: TEXT PROMPT-GUIDED JOINT AUDIO-VISUAL GENERATION FOR NARRATING FACES USING MULTI-ENTANGLED LATENT SPACE
The checkpoints for our model, which can be loaded with `torch.load()` in `train.py`, are available at the following Google Drive link:
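A minimal sketch of the usual `torch.load()` / `load_state_dict()` pattern mentioned above. `TinyModel` and `checkpoint.pt` are placeholders: substitute the files downloaded from the Google Drive link and the model classes defined in `audio_model.py` / `video_model.py`.

```python
import torch
import torch.nn as nn

# Placeholder model; the real definitions live in audio_model.py / video_model.py.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

# Save a dummy checkpoint as a state_dict (the common PyTorch convention) ...
torch.save(TinyModel().state_dict(), "checkpoint.pt")

# ... then restore it the way train.py would: torch.load + load_state_dict.
model = TinyModel()
state = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()
print(sorted(state.keys()))
```

If the downloaded checkpoint is instead a dict wrapping the weights (e.g. under a `"state_dict"` key), inspect `state.keys()` first and load the inner dict.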
| Source Image | Audio Profile | Prompt Text | Generated Output | Description |
|---|---|---|---|---|
| *(image)* | Audio | IN THE DISTRICT COURT LITIGATION ULTIMATELY THERE WERE A NUMBER OF UNANSWERED QUESTIONS AS YOU KNOW A NUMBER OF GAPS THAT WE BELIEVE COULD BE FILLED BY THE GRAND JURY MATERIAL | *(video)* | Sample generation of a man |
| *(image)* | Audio | OR PRODUCERS OR PROCESSORS OR ACCOUNTANTS OR AGRONOMISTS THE LIGHT BULB GOES ON I'VE SEEN IT OVER AND OVER AND PEOPLE WILL SAY WELL WE'RE DOING THIS | *(video)* | Sample generation of an older man |
| *(image)* | Audio | GRANTED THAT I'VE BEEN AROUND FOR A WHILE AND DECIDED THEY'LL GET ME IN NOVEMBER SO TO SPEAK BUT I DON'T THINK WE CAN REALLY GO BACK AND RELITIGATE THAT ASPECT OF IT | *(video)* | Sample generation of another older man |
| *(image)* | Audio | EVERYBODY THIS IS SENATOR MARSHA BLACKBURN FROM THE STATE OF TENNESSEE AND I'M JUST SO EXCITED TO BE A PART OF THIS CELEBRATION | *(video)* | Sample generation of a woman |
| *(image)* | Audio | EVERYBODY THIS IS SENATOR MARSHA BLACKBURN FROM THE STATE OF TENNESSEE AND I'M JUST SO EXCITED TO BE A PART OF THIS CELEBRATION | *(video)* | Generation of the same woman from a degraded audio input, produced by reducing the bitrate, downsampling the audio, and adding distortion |
| *(image)* | Audio | SOMETIMES, THE BEST THING YOU CAN DO IS TO LET GO | *(video)* | Random child image from the internet, driven by an adult female voice profile |
| *(image)* | Audio | BE THE CHANGE YOU WANT TO SEE IN THE WORLD, BROTHERS AND SISTERS. | *(video)* | Stable Diffusion-generated image of a child, driven by an adult male voice |
| *(image)* | Audio | I INVISCATE THE PAPER WITH GLUE TO CREATE MY ART PROJECT. | *(video)* | Voice profile of a child with the face of an adult man |
| *(image)* | Audio | PEOPLE DO WHAT THEY HATE FOR MONEY, AND USE THE MONEY TO DO WHAT THEY LOVE. | *(video)* | Generation of an Indian man |
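One row above uses a deliberately degraded audio input (reduced bitrate, downsampling, added distortion). A minimal sketch of such a degradation pipeline, assuming a mono waveform in `[-1, 1]`; the function name and parameters are ours, not the repository's:

```python
import numpy as np

def degrade_audio(wave: np.ndarray, sr: int, factor: int = 4,
                  bits: int = 8, clip_gain: float = 4.0):
    """Sketch of the three degradations: downsample, re-quantize, distort."""
    # Naive downsampling: keep every `factor`-th sample (no anti-aliasing).
    down = wave[::factor]
    # Re-quantize to `bits` bits to mimic a reduced bitrate.
    levels = 2 ** bits
    quant = np.round((down * 0.5 + 0.5) * (levels - 1)) / (levels - 1) * 2 - 1
    # Overdrive and hard-clip to add distortion.
    distorted = np.clip(quant * clip_gain, -1.0, 1.0)
    return distorted, sr // factor

# Example on a one-second 440 Hz sine at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy, new_sr = degrade_audio(clean, sr)
print(new_sr, noisy.shape)  # → 4000 (4000,)
```

A production version would use a proper resampler (e.g. with anti-aliasing filtering) rather than sample dropping; this sketch only illustrates the kind of corruption applied.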
- The model checkpoints can be accessed via the Google Drive link above.
- Data samples can be found at the following link: Data
- Install the required packages into your environment with `pip install -r requirements.txt`
- We will soon be publishing our model on Hugging Face 🤗
- `preprocess.py` processes the multimodal inputs required for our model.
- `helper.py` contains our transformer architectures and other helper functions.
- `get_videos.py` makes the train-test split, clips the videos to the desired length, and saves the outputs to the desired folder.
- `audio_model.py` and `video_model.py` contain the definitions of our models, which are called in `train.py`.
- The `assets` folder has some example outputs, which can be viewed in the `README.md` file of this repository.
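The train-test split performed by `get_videos.py` can be sketched as follows; the directory layout, file extension, split ratio, and function name are assumptions for illustration, not the repository's actual defaults:

```python
import random
import tempfile
from pathlib import Path

def split_videos(video_dir: str, train_frac: float = 0.8, seed: int = 0):
    """Shuffle the video files deterministically and split into train/test lists."""
    files = sorted(Path(video_dir).glob("*.mp4"))
    rng = random.Random(seed)  # fixed seed -> reproducible split
    rng.shuffle(files)
    cut = int(len(files) * train_frac)
    return files[:cut], files[cut:]

# Demo with dummy files in a temporary directory.
tmp = tempfile.mkdtemp()
for i in range(10):
    (Path(tmp) / f"clip{i}.mp4").touch()
train, test = split_videos(tmp)
print(len(train), len(test))  # → 8 2
```

The real script additionally clips each video to the desired length before saving; that step depends on the video tooling used and is omitted here.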
