Official implementation of the WACV 2026 paper:
TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model
Project page: https://dfki-av.github.io/TalkingPose/
Diffusion models have recently advanced the realism and generalizability of character-driven animation, enabling high-quality motion synthesis from a single RGB image and driving poses. However, generating temporally coherent long-form content remains challenging: many existing methods are trained on short clips due to computational and memory constraints, limiting their ability to maintain consistency over extended sequences.
We propose TalkingPose, a diffusion-based framework designed for long-form, temporally consistent upper-body human animation. TalkingPose uses driving frames to capture expressive facial and hand motion and transfers them to a target identity through a Stable Diffusion backbone. To improve temporal consistency without additional training stages or computational overhead, we introduce a feedback-guided mechanism built upon image-based diffusion models. This design enables generation with unbounded duration. In addition, we introduce a large-scale dataset to support benchmarking for upper-body human animation.
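As a purely illustrative sketch (not the actual TalkingPose architecture), the feedback idea can be pictured as a chunked rollout in which a few frames from the previously generated chunk are fed back as conditioning for the next one; the function `generate_chunk`, the chunk size, and the overlap below are assumptions made only for this example.

```python
# Illustrative feedback-guided rollout over an image diffusion backbone.
# NOT the TalkingPose implementation: generate_chunk is a stand-in for a
# diffusion model conditioned on the reference identity, driving poses,
# and feedback frames from the previous chunk.
import torch

def generate_chunk(reference, poses, feedback):
    # Placeholder backbone: returns one RGB frame per driving pose.
    return torch.rand(len(poses), 3, 512, 512)

def rollout(reference, driving_poses, chunk_size=16, overlap=4):
    frames, feedback = [], None
    for start in range(0, len(driving_poses) - overlap, chunk_size - overlap):
        poses = driving_poses[start:start + chunk_size]
        chunk = generate_chunk(reference, poses, feedback)
        # The last `overlap` frames become the feedback for the next chunk,
        # anchoring each new chunk to what was already generated.
        feedback = chunk[-overlap:]
        frames.append(chunk if start == 0 else chunk[overlap:])
    return torch.cat(frames)

video = rollout(torch.rand(3, 512, 512), [None] * 100)
print(video.shape)  # torch.Size([100, 3, 512, 512])
```

Because each chunk depends only on the reference image, its driving poses, and the feedback frames, such a rollout is not tied to a fixed clip length, which is the property the paper targets.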
- Python >= 3.10
- CUDA 11.7
git clone https://github.com/dfki-av/TalkingPose.git
cd TalkingPose
python -m venv tk_pose_venv
source tk_pose_venv/bin/activate
pip install --index-url https://download.pytorch.org/whl/cu117 \
torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117
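An optional quick check that the pinned CUDA 11.7 build of PyTorch is installed and can see a GPU:

```python
# Verify the PyTorch build and GPU visibility before continuing.
import torch

print(torch.__version__)          # expected: 2.0.1+cu117
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # should print True on a working setup
```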
pip install -r requirements.txt
python tools/download_weights.py
To extract DWPose keypoints from your training videos, run:
python tools/extract_dwpose_from_vid.py \
--video_root /path/to/mp4_videos \
--save_dir /path/to/save_dwpose
Then generate the dataset metadata:
python tools/extract_meta_info.py \
--video_root /path/to/videos \
--dwpose_root /path/to/dwpose_output \
--dataset_name <your_dataset_name> \
--out_dir /path/to/output_meta_json
After pose extraction and metadata generation, update the training configuration to specify (a path sanity-check sketch follows the list):
- metadata JSON paths
- checkpoint paths
- output directory
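Before launching training, an optional sanity check along the following lines can catch broken paths in the edited configuration. It is only a sketch: it assumes PyYAML is available and uses a simple heuristic for what counts as a path, without assuming any particular key names in the config schema.

```python
# Walk the training config and flag string values that look like paths
# but do not exist on disk. Purely a convenience check.
import os
import yaml  # PyYAML, assumed to be installed with the requirements

def check_paths(node, prefix=""):
    if isinstance(node, dict):
        for key, value in node.items():
            check_paths(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            check_paths(value, f"{prefix}{i}.")
    # Heuristic: treat strings with a '/' or a known extension as paths.
    elif isinstance(node, str) and ("/" in node or node.endswith((".json", ".pth", ".ckpt"))):
        status = "ok" if os.path.exists(node) else "MISSING"
        print(f"{status:7s} {prefix.rstrip('.')} -> {node}")

with open("configs/train/training.yaml") as f:
    check_paths(yaml.safe_load(f))
```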
Then run:
python train.py --config configs/train/training.yaml
For self-identity animation, specify the checkpoint path and the video and pose directories in the configuration file.
Note: Video folders and their corresponding pose folders must share the same directory names.
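A quick way to verify that layout before running inference (the two roots below are placeholders and should be replaced with the paths from your configuration):

```python
# Check that every video folder has a pose folder with the same name.
from pathlib import Path

video_root = Path("/path/to/videos")        # placeholder
pose_root = Path("/path/to/dwpose_output")  # placeholder

video_dirs = {p.name for p in video_root.iterdir() if p.is_dir()}
pose_dirs = {p.name for p in pose_root.iterdir() if p.is_dir()}

print("video folders without poses:", sorted(video_dirs - pose_dirs))
print("pose folders without videos:", sorted(pose_dirs - video_dirs))
```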
Then run:
python -m scripts.pose2vid --config configs/prompts/self_identity.yaml
The dataset/ directory contains video_ids.csv, which lists the YouTube video IDs included in the TalkingPose dataset.
To download the videos, please use yt-dlp:
https://github.com/yt-dlp/yt-dlp
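A minimal download sketch that drives the yt-dlp CLI from Python, assuming dataset/video_ids.csv contains one YouTube video ID per row (adjust the CSV handling and yt-dlp options to your needs):

```python
# Download the TalkingPose source videos listed in dataset/video_ids.csv.
import csv
import subprocess

with open("dataset/video_ids.csv", newline="") as f:
    video_ids = [row[0] for row in csv.reader(f) if row]

for vid in video_ids:
    subprocess.run(
        ["yt-dlp", "-o", "videos/%(id)s.%(ext)s",
         f"https://www.youtube.com/watch?v={vid}"],
        check=False,  # continue even if an individual video is unavailable
    )
```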
To evaluate generated videos using the average temporal jittering error:
python tools/tje_error.py \
--real_dir /path/to/real_videos \
--gen_dir /path/to/generated_videos \
--delta 2 \
--out
This repository builds mainly upon and is inspired by the following works:
- Moore-AnimateAnyone: https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master
- DWPose: https://github.com/IDEA-Research/DWPose
This work has been partially supported by the EU projects CORTEX2 (GA No. 101070192) and LUMINOUS (GA No. 101135724).
If you find this work useful, please cite:
@article{javanmardi2025talkingpose,
title={TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model},
author={Javanmardi, Alireza and Jaiswal, Pragati and Habtegebrial, Tewodros Amberbir and Millerdurai, Christen and Wang, Shaoxiang and Pagani, Alain and Stricker, Didier},
journal={arXiv preprint arXiv:2512.00909},
year={2025}
}
Release checklist:
- Inference code
- Pretrained models
- Training code
- Training data
- Annotations (will be released soon)
