Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models
The project also contains an implementation of the paper Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models, which allows the generation of realistic 3D human motions for action classes that were never seen during the training phase.
Details about the implementation and the obtained results can be found in the docs folder.
In this paper, we address the challenge of generating realistic 3D human motions for action classes that were never seen during the training phase. Our approach involves decomposing complex actions into simpler movements, specifically those observed during training, by leveraging the knowledge of human motion contained in GPT models. These simpler movements are then combined into a single, realistic animation using the properties of diffusion models. Our claim is that this decomposition and subsequent recombination of simple movements can synthesize an animation that accurately represents the complex input action. This method operates during the inference phase and can be integrated with any pre-trained diffusion model, enabling the synthesis of motion classes not present in the training data. We evaluate our method by dividing two benchmark human motion datasets into basic and complex actions, and then compare its performance against the state-of-the-art.
- Create a Conda virtual environment:

conda create -n MCD python=3.10.9
conda activate MCD

- Install PyTorch following the official instructions:

conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

- Clone this repository and install its requirements:

git clone https://github.com/divanoLetto/MCD
conda install pytorch-lightning
conda install conda-forge::hydra-core
conda install conda-forge::colorlog
conda install conda-forge::orjson
conda install conda-forge::einops
pip install smplx
pip install trimesh
conda install conda-forge::opencv
conda install conda-forge::moviepy
pip install git+https://github.com/openai/CLIP.git
conda install matplotlib
pip install pyrender
conda install -c conda-forge transformers
conda install conda-forge::openai
conda install conda-forge::python-dotenv
Please follow the instructions below to obtain the deps folder with SMPL+H downloaded, and place deps in the MCD directory.
DistilBERT download instructions:
mkdir deps
cd deps/
git lfs install
git clone https://huggingface.co/distilbert-base-uncased
cd ..

SMPL body model instructions:
This is only needed if you want to generate 3D human meshes like in the teaser. In this case, you also need a subset of the AMASS dataset (see the instructions below).
Go to the MANO website, register and go to the Download tab.
- Click on "Models & Code" to download mano_v1_2.zip and place it in the folder deps/smplh/.
- Click on "Extended SMPL+H model" to download smplh.tar.xz and place it in the folder deps/smplh/.
The next step is to extract the archives, merge the hands from mano_v1_2 into the Extended SMPL+H models, and remove any chumpy dependency.
All of this can be done with the following commands. (I forked both scripts from the SMPLX repo, updated them to Python 3, merged them, and made them compatible with .npz files.)
pip install scipy chumpy
bash prepare/smplh.sh

This will create SMPLH_FEMALE.npz, SMPLH_MALE.npz and SMPLH_NEUTRAL.npz inside the deps/smplh folder.
Depending on your numpy and chumpy versions, the error ImportError: cannot import name 'bool' from 'numpy' may occur. In this case, try commenting out the line that throws the exception: from numpy import bool, int, float, complex, object, unicode, str, nan, inf.
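The exception traceback points at the file to edit; if you need to locate the chumpy installation directory yourself, the following standard pip command (not a project script) prints it:

pip show chumpy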
The motions all come from the AMASS dataset. Please download all "SMPL-H G" motions from the AMASS website and place them in the folder datasets/motions/AMASS.
It should look like this:
datasets/motions/
└── AMASS
├── ACCAD
├── BioMotionLab_NTroje
├── BMLhandball
├── BMLmovi
├── CMU
├── DanceDB
├── DFaust_67
├── EKUT
├── Eyes_Japan_Dataset
├── HumanEva
├── KIT
├── MPI_HDM05
├── MPI_Limits
├── MPI_mosh
├── SFU
├── SSM_synced
├── TCD_handMocap
├── TotalCapture
└── Transitions_mocap

Each file contains a "poses" field with 156 (52x3) parameters (1x3 for global orientation, 21x3 for the whole body, 15x3 for the left hand and 15x3 for the right hand).
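To sanity-check the download, you can inspect any sequence with numpy (the path below is just an example file from the CMU subset):

```python
import numpy as np

data = np.load("datasets/motions/AMASS/CMU/01/01_01_poses.npz")  # example file; any sequence works
print(data["poses"].shape)             # (n_frames, 156) axis-angle parameters
print(float(data["mocap_framerate"]))  # original capture frame rate
```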
Then, launch these commands:
python prepare/amasstools/fix_fps.py
python prepare/amasstools/smpl_mirroring.py
python prepare/amasstools/extract_joints.py
python prepare/amasstools/get_smplrifke.py

More information on these commands:
The script will interpolate the SMPL pose parameters and translation to obtain a constant FPS (=20.0). It will also remove the hand pose parameters, as they are not captured for most AMASS sequences. The SMPL pose parameters now have 66 (22x3) parameters (1x3 for global orientation and 21x3 for full body). It will create and save all the files in the folder datasets/motions/AMASS_20.0_fps_nh.
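As a rough illustration of what fix_fps.py does (a minimal sketch, assuming the standard AMASS .npz fields "poses", "trans" and "mocap_framerate"; the actual script may resample differently):

```python
import numpy as np

def resample_to_fps(amass_npz_path, target_fps=20.0):
    """Sketch: linearly resample an AMASS sequence to a constant frame rate."""
    data = np.load(amass_npz_path)
    poses, trans = data["poses"], data["trans"]   # (T, 156), (T, 3)
    src_fps = float(data["mocap_framerate"])

    src_t = np.arange(len(poses)) / src_fps
    new_t = np.arange(0.0, src_t[-1], 1.0 / target_fps)

    # Interpolate each parameter independently (linear interpolation of
    # axis-angle values is only approximate, but fine for small steps).
    new_poses = np.stack([np.interp(new_t, src_t, p) for p in poses.T], axis=1)
    new_trans = np.stack([np.interp(new_t, src_t, t) for t in trans.T], axis=1)

    # Drop the hand parameters: keep global orientation + 21 body joints (66 values).
    return new_poses[:, :66], new_trans
```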
This command will mirror SMPL pose parameters and translations, to enable data augmentation with SMPL (as done by the authors of HumanML3D with joint positions).
The mirrored motions will be saved in datasets/motions/AMASS_20.0_fps_nh/M and will have a structure similar to that of the enclosing folder.
The script extracts the joint positions from the SMPL pose parameters with the SMPL layer (24x3=72 parameters). It will save the joints in .npy format in this folder: datasets/motions/AMASS_20.0_fps_nh_smpljoints_neutral_nobetas.
This command will use the joints + SMPL pose parameters (in 6D format) to create a unified representation (205 features). Please see prepare/amasstools/smplrifke_feats.py for more details.
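For reference, the "6D format" is the common continuous 6D rotation representation (the first two columns of each rotation matrix). A minimal sketch of this conversion, assuming scipy (the actual 205-feature layout is defined in prepare/amasstools/smplrifke_feats.py):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def axis_angle_to_6d(poses):
    """Sketch: convert (T, J*3) axis-angle SMPL parameters to (T, J*6) rotation-6D."""
    n_frames, dim = poses.shape
    n_joints = dim // 3
    mats = Rotation.from_rotvec(poses.reshape(-1, 3)).as_matrix()  # (T*J, 3, 3)
    rot6d = mats[:, :, :2]                                         # keep the first two columns
    return rot6d.reshape(n_frames, n_joints * 6)
```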
The dataset folder should look like this:
datasets/motions
├── AMASS
├── AMASS_20.0_fps_nh
├── AMASS_20.0_fps_nh_smpljoints_neutral_nobetas
└── AMASS_20.0_fps_nh_smplrifke
Next, run the following command to pre-compute the CLIP embeddings (ViT-B/32):
python -m prepare.embeddings
The folder should look like this:
datasets/annotations/humanml3d
├── annotations.json
├── splits
│ ├── all.txt
│ ├── test_tiny.txt
│ ├── test.txt
│ ├── train_tiny.txt
│ ├── train.txt
│ ├── val_tiny.txt
│ └── val.txt
└── text_embeddings
└── ViT-B
├── 32_index.json
├── 32.npy
└── 32_slice.npy
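Under the hood, prepare.embeddings computes CLIP text features; obtaining ViT-B/32 embeddings with the openai/CLIP package looks roughly like this (a sketch; the index and slice files above are a project-specific storage format):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

texts = ["a person throws a ball", "a person walks in a circle"]  # example annotations
with torch.no_grad():
    tokens = clip.tokenize(texts).to(device)
    embeddings = model.encode_text(tokens)  # (2, 512) text features
print(embeddings.shape)
```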
Click on this link and download the pretrained models from your web browser, then extract the archive into the pretrained_models/ directory.
Start the training process with:
python train.py trainer.max_epochs=10000 +split='complex/train' run_dir='outputs/mdm-smpl_splitcomplex_humanml3d/logs/checkpoints/'
It will save the outputs to the folder specified by the argument run_dir. The argument trainer.max_epochs sets the maximum number of training epochs.
The default dataset is HumanML3D; it can be changed to KitML with the argument dataset=kitml. To resume a training run, use the arguments resume_dir and ckpt, where the former specifies the base directory where the model config is saved and the latter specifies the path of the .ckpt file.
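For example, to resume from a previous run (illustrative paths; adjust them to your own output directories):

python train.py resume_dir='outputs/mdm-smpl_splitcomplex_humanml3d' ckpt='outputs/mdm-smpl_splitcomplex_humanml3d/logs/checkpoints/last.ckpt'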
There are two available splits: the first, complex, divides the dataset into simple actions for training and complex actions for inference, while the second, multiannotations, is the split commonly used in the literature.
To generate a single motion either from a single text (Decomposition + MCD) or from a pre-saved file of decomposed submotions (MCD only), run the following:
python generate.py submotions='inputs/basketball.txt' diffusion.weight=7 diffusion.mcd=False animations=True ckpt='logs/checkpoints/last.ckpt' run_dir='pretrained_models/mdm-smpl_clip_smplrifke_humanml3d'
- submotions: the input file with the decomposed subactions to merge for the generation.
- diffusion.weight: the composition weight of the subactions.
- diffusion.mcd: if True, the MCD method is used to compose the submovements; otherwise STMC is applied.
- animations: if True, the SMPL animations are rendered and saved.
First, we need to decompose the test set into submotions. You can skip this step if you use the pre-generated submotions already included.
Otherwise, to use an OpenAI model for this task, insert a valid OPENAI_API_KEY in the .env file (see the example after the argument list below) and run the following:
python decompose/decompose_text2submotions.py dataset="humanml3d" compose="MCD" only_not_generated=True
- only_not_generated: decompose only the ids that are not already saved in the output directory.
- dataset: pick between the "humanml3d" and "kitml" datasets.
- compose: choose between the MCD and STMC submotion parsers.
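The .env file is a plain key=value file read by python-dotenv, for example (placeholder value):

OPENAI_API_KEY=your_api_key_here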
To generate the test set from the saved decomposed submotions for evaluation purposes, run:
python generate_testset.py input_type='submotions' +only_not_generated=False +submotions_dir='datasets/annotations/humanml3d/splits/complex/submotions' ckpt='pretrained_models/mdm-smpl_splitcomplex_humanml3d/logs/checkpoints/last.ckpt' run_dir='pretrained_models/mdm-smpl_splitcomplex_humanml3d' +out_dir='pretrained_models/mdm-smpl_splitcomplex_humanml3d/generations_submotions' +split='complex/test' diffusion.mcd=True diffusion.weight=7 animations=False
- only_not_generated: generate only the ids that are not already saved in the output directory.
- diffusion.weight: the composition weight of the subactions.
- diffusion.mcd: if True, the MCD method is used to compose the submovements; otherwise STMC is applied.
- +out_dir: the path where the output .npy files are saved.
- animations: if True, the SMPL animations are rendered and saved.
- submotions_dir: the directory where the decomposed submotions are saved.
To evaluate the models, first generate the test set with the desired methods, then run:
python evaluate.py
It is possible to specify which experiments to compare by adding or removing elements from the input_types variable. The default experiment compares ground truth, text-conditioned generation, MCD generation, and STMC generation on the HumanML3D dataset.
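As an illustration only (the element names below are placeholders; check evaluate.py for the exact values it expects), the selection amounts to editing a Python list:

```python
# In evaluate.py: pick which sets of generations to compare (placeholder names).
input_types = [
    "gt",    # ground-truth test motions
    "text",  # plain text-conditioned generations
    "mcd",   # generations composed with MCD
    "stmc",  # generations composed with STMC
]
```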
Please consider citing our work if you find it useful:
@misc{mandelli2024generationcomplex3dhuman,
title={Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models},
author={Lorenzo Mandelli and Stefano Berretti},
year={2024},
eprint={2409.11920},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.11920},
}
If you found this project interesting, please consider checking out the work by Petrovich et al. in their repository STMC. Much of the foundation for this project was inspired by their excellent work!
