aashutoshav/NFY

PLAYING FOR YOU: TEXT PROMPT-GUIDED JOINT AUDIO-VISUAL GENERATION FOR NARRATING FACES USING MULTI-ENTANGLED LATENT SPACE

Goal of the Model compared to SoTA

Goal of the Model

The checkpoints for our model are available at the Google Drive link below. They can be loaded with the torch.load() call in train.py:

Checkpoints
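As a minimal sketch of how a downloaded checkpoint might be loaded, the snippet below saves and reloads a stand-in checkpoint with torch.load(). The file name, the `model_state` and `epoch` keys, and the `nn.Linear` stand-in model are assumptions for illustration only; the actual layout of the released checkpoints is defined by train.py.

```python
import os
import tempfile

import torch
from torch import nn


def load_checkpoint(path, device="cpu"):
    """Load a checkpoint file onto `device` with torch.load()."""
    # map_location lets a checkpoint saved on a GPU be opened on a CPU-only machine.
    return torch.load(path, map_location=device)


# Stand-in model and checkpoint layout (hypothetical, not the repository's format).
model = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "example_checkpoint.pt")
torch.save({"model_state": model.state_dict(), "epoch": 3}, ckpt_path)

# Reload the file and restore the weights into a fresh model instance.
checkpoint = load_checkpoint(ckpt_path)
restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint["model_state"])
```

Passing `device="cuda"` instead would place the tensors on the GPU, provided one is available.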

Example Generations

| Source Image | Audio Profile | Prompt Text | Generated Output | Description |
|---|---|---|---|---|
| Image | Audio | IN THE DISTRICT COURT LITIGATION ULTIMATELY THERE WERE A NUMBER OF UNANSWERED QUESTIONS AS YOU KNOW A NUMBER OF GAPS THAT WE BELIEVE COULD BE FILLED BY THE GRAND JURY MATERIAL | Video | Sample Generation of a Man |
| Image | Audio | OR PRODUCERS OR PROCESSORS OR ACCOUNTANTS OR AGRONOMISTS THE LIGHT BULB GOES ON I'VE SEEN IT OVER AND OVER AND PEOPLE WILL SAY WELL WE'RE DOING THIS | Video | Sample Generation of an older man |
| Image | Audio | GRANTED THAT I'VE BEEN AROUND FOR A WHILE AND DECIDED THEY'LL GET ME IN NOVEMBER SO TO SPEAK BUT I DON'T THINK WE CAN REALLY GO BACK AND RELITIGATE THAT ASPECT OF IT | Video | Sample Generation of another old man |
| Image | Audio | EVERYBODY THIS IS SENATOR MARSHA BLACKBURN FROM THE STATE OF TENNESSE AND I'M JUST SO EXCITED TO BE A PART OF THIS CELEBRATION | Video | Sample Generation of a woman |
| Image | Audio | EVERYBODY THIS IS SENATOR MARSHA BLACKBURN FROM THE STATE OF TENNESSE AND I'M JUST SO EXCITED TO BE A PART OF THIS CELEBRATION | Video | Generation of the same woman but with a degraded audio input, generated using a reduced bitrate, downsampling the audio, and adding distortion. |
| Image | Audio | SOMETIMES, THE BEST THING YOU CAN DO IS TO LET GO | Video | Random child image from the internet, with an adult female voice profile as input. |
| Image | Audio | BE THE CHANGE YOU WANT TO SEE IN THE WORLD, BROTHERS AND SISTERS. | Video | A Stable-Diffusion Generated Image of a child from the internet, with an adult male voice as input. |
| Image | Audio | I INVISCATE THE PAPER WITH GLUE TO CREATE MY ART PROJECT. | Video | Voice profile of a child, with the face of an adult man. |
| Image | Audio | PEOPLE DO WHAT THEY HATE FOR MONEY, AND USE THE MONEY TO DO WHAT THEY LOVE. | Video | Generation of an Indian Man. |
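
One of the examples above uses a degraded audio input produced by reducing the bitrate, downsampling, and adding distortion. The exact pipeline is not specified here, so the following is only a rough sketch of those three steps on raw samples, using naive decimation, coarse quantization as a stand-in for bitrate reduction, and hard clipping as the distortion. All function names and parameter values are hypothetical.

```python
import math


def downsample(samples, factor):
    """Keep every `factor`-th sample (naive decimation, no anti-alias filter)."""
    return samples[::factor]


def quantize(samples, bits):
    """Round samples in [-1, 1] onto 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    return [round((s + 1) / 2 * steps) / steps * 2 - 1 for s in samples]


def hard_clip(samples, gain, limit=1.0):
    """Amplify by `gain`, then clamp to [-limit, limit] to add distortion."""
    return [max(-limit, min(limit, s * gain)) for s in samples]


def degrade(samples, factor=2, bits=4, gain=3.0):
    """Apply downsampling, quantization, and clipping in sequence."""
    return hard_clip(quantize(downsample(samples, factor), bits), gain)


# One second of a 440 Hz tone at 16 kHz as a stand-in for a voice recording.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]
degraded = degrade(clean)
```

A real pipeline would more likely rely on an audio library with proper resampling filters and a lossy codec; the sketch only makes the three named operations concrete.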

News !!

  • The model checkpoints can be accessed via the Google Drive link above.
  • Data samples can be found at the following link: Data
  • Install the required packages into your environment:
  pip install -r requirements.txt
  • We will soon be publishing our model on Hugging Face 🤗

Files:

  • preprocess.py processes the multimodal inputs required by our model
  • helper.py contains our transformer architectures and other helper functions
  • get_videos.py makes the train-test split, clips each video to the desired length, and saves the outputs to the desired folder
  • audio_model.py and video_model.py define our models, which are called in train.py
  • The assets folder contains some example outputs, which can be viewed in the ReadME.md file of this repository
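
To illustrate the kind of work get_videos.py is described as doing, here is a minimal sketch of a reproducible train-test split and of computing fixed-length clip boundaries. It is not the repository's code: the function names, the 80/20 split, the file names, and the choice to drop a trailing partial clip are all assumptions.

```python
import random


def train_test_split(paths, test_fraction=0.2, seed=0):
    """Shuffle `paths` reproducibly and split them into train and test lists."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def clip_windows(duration_s, clip_len_s):
    """Return (start, end) times of consecutive full-length clips.

    Any remainder shorter than `clip_len_s` at the end is dropped.
    """
    n_clips = int(duration_s // clip_len_s)
    return [(i * clip_len_s, (i + 1) * clip_len_s) for i in range(n_clips)]


# Hypothetical file names standing in for the dataset's videos.
videos = [f"video_{i:03d}.mp4" for i in range(10)]
train, test = train_test_split(videos, test_fraction=0.2, seed=42)

# A 12.5-second video cut into 5-second clips yields two full clips.
windows = clip_windows(duration_s=12.5, clip_len_s=5.0)
```

Fixing the seed keeps the split stable across runs, so the same videos stay in the test set each time.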
