End-to-end pipeline: Video (or Prompt) → Human Pose Extraction → Robot Motion Conversion
Demo videos: `fullvideo.mp4` (full pipeline demo) | `backflip.mp4` (Backflip) | `new_jeans.mp4` (Dance Motion)
[Prompt] → Veo → [Video] → PromptHMR → [SMPL-X] → GMR → [Robot Motion]
```
video2robot/
├── video2robot/ # Main package
│ ├── config.py # Configuration management
│ ├── pipeline.py # (Optional) Python API pipeline
│ ├── cli.py # Console entrypoint for installation
│ ├── video/ # Video generation/processing
│ │ └── veo_client.py # Google Veo API
│ ├── pose/ # Pose extraction (PromptHMR wrapper)
│ │ └── extractor.py
│ └── robot/ # Robot conversion (GMR wrapper)
│ └── retargeter.py
│
├── scripts/ # CLI scripts
│ ├── run_pipeline.py # Full pipeline
│ ├── generate_video.py # Veo video generation
│ ├── extract_pose.py # Pose extraction
│ ├── convert_to_robot.py # Robot conversion
│ └── visualize.py # Result visualization
│
├── configs/ # Configuration files
├── data/ # Data (gitignored)
│
└── third_party/ # External dependencies (submodules)
├── PromptHMR/ # Pose extraction model
└── GMR/ # Motion retargeting
```
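Since `pipeline.py` also exposes an optional Python API, the same flow can be driven from code. A minimal sketch, where the class and method names (`PipelineConfig`, `Pipeline`, `run`) are assumptions rather than the package's verified interface:

```python
# Sketch of the optional Python API; PipelineConfig, Pipeline, and run()
# are ASSUMED names, not the package's verified interface.
from video2robot.config import PipelineConfig
from video2robot.pipeline import Pipeline

config = PipelineConfig(robot_type="unitree_g1", provider="veo")
pipeline = Pipeline(config)

# Prompt → video → SMPL-X → robot motion, mirroring scripts/run_pipeline.py
result = pipeline.run(action="Action sequence: The subject walks forward with four steps.")
print(result)  # e.g., a path such as data/video_001/robot_motion.pkl
```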
This project requires two conda environments: `gmr` and `phmr`.
```bash
# Clone repo (with submodules)
git clone --recursive https://github.com/AIM-Intelligence/video2robot.git
cd video2robot

# Or initialize submodules after cloning
git submodule update --init --recursive
```

Set up the `gmr` environment (motion retargeting):

```bash
conda create -n gmr python=3.10 -y
conda activate gmr
pip install -e .
```

For details, see the GMR README.

Set up the `phmr` environment (pose extraction):
For Blackwell GPU (sm_120) users:
```bash
conda create -n phmr python=3.11 -y
conda activate phmr
cd third_party/PromptHMR
bash scripts/install_blackwell.sh
```

For other GPUs (Ampere, Hopper, etc.):

```bash
conda create -n phmr python=3.10 -y
conda activate phmr
cd third_party/PromptHMR
pip install -e .
```

For details, see the PromptHMR README.
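Whichever path you used, it is worth confirming that the PyTorch build inside `phmr` actually sees your GPU; a mismatched wheel is the usual failure mode on sm_120. A quick check using standard PyTorch calls:

```bash
conda activate phmr
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_capability())"
# A Blackwell card should report a device capability of (12, 0)
```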
Note: Scripts automatically switch to the appropriate conda environment (`gmr` or `phmr`) as needed. Just ensure both environments are installed; there is no need to activate them manually.
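For reference, `conda run` is one standard way to execute a command inside a named environment without activating it in the current shell; a minimal sketch of the idea (the scripts' actual switching mechanism may differ):

```bash
# Each stage runs in its own environment, no manual activation needed.
conda run -n phmr python scripts/extract_pose.py --project data/video_001
conda run -n gmr python scripts/convert_to_robot.py --project data/video_001
```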
```bash
# Full pipeline (action → robot motion) - BASE_PROMPT auto-applied
python scripts/run_pipeline.py --action "Action sequence:
The subject walks forward with four steps."

# Use Sora
python scripts/run_pipeline.py --action "..." --provider sora

# Start from an existing video (video.mp4 → robot motion)
python scripts/run_pipeline.py --video /path/to/video.mp4

# Resume from an existing project
python scripts/run_pipeline.py --project data/video_001

# Run individual steps
python scripts/generate_video.py --action "Action sequence: The subject walks forward."
python scripts/extract_pose.py --project data/video_001
python scripts/convert_to_robot.py --project data/video_001

# Visualization (auto env switching)
python scripts/visualize.py --project data/video_001
python scripts/visualize.py --project data/video_001 --pose
python scripts/visualize.py --project data/video_001 --robot
```

```bash
# Run the web server (from the video2robot root)
uvicorn web.app:app --host 0.0.0.0 --port 8000

# Access in browser
# http://localhost:8000
```

Features:
- Automatic pipeline: prompt input → video generation → pose extraction → robot conversion
- Video upload support
- Veo/Sora model selection
- 3D visualization (viser)
- Video-3D synchronized playback
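Once uvicorn is running, a quick way to confirm the server is reachable before opening the browser:

```bash
# Expect an HTTP 200 response from the web UI
curl -I http://localhost:8000
```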
```bash
# Create .env file
cp .env.example .env

# Set API key
echo "GOOGLE_API_KEY=your-api-key" >> .env
```
Supported robots:

| Robot | ID | DOF |
|---|---|---|
| Unitree G1 | `unitree_g1` | 29 |
| Unitree H1 | `unitree_h1` | 19 |
| Booster T1 | `booster_t1` | 23 |
See the GMR README for the full list.
```python
# robot_motion.pkl
{
    "fps": 30.0,
    "robot_type": "unitree_g1",
    "num_frames": 240,
    "root_pos": np.ndarray,  # (N, 3)
    "root_rot": np.ndarray,  # (N, 4) quaternion, xyzw
    "dof_pos": np.ndarray,   # (N, DOF)
}
```
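A short sketch for loading and sanity-checking this file with plain `pickle` (the path below is an example):

```python
# Sketch: load and inspect a robot_motion.pkl produced by the pipeline.
import pickle

with open("data/video_001/robot_motion.pkl", "rb") as f:
    motion = pickle.load(f)

print(motion["robot_type"], motion["fps"], motion["num_frames"])
print(motion["root_pos"].shape)  # (N, 3) root position per frame
print(motion["root_rot"].shape)  # (N, 4) root orientation, xyzw quaternion
print(motion["dof_pos"].shape)   # (N, DOF) joint positions per frame
assert motion["root_pos"].shape[0] == motion["num_frames"]
```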
Veo 3.1 adds two generation options:

- `lastFrame` (Start/End Frame Interpolation) - Veo 3.1 only
  - Start image + end image → generates a video that smoothly connects the two
  - Useful for "Pose A → Pose B" robot motion videos
- `referenceImages` (Reference Images) - Veo 3.1 only
  - Up to 3 reference images to maintain character/style
  - Generate videos of a specific character performing actions
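For orientation, a sketch of how `lastFrame` maps onto the `google-genai` SDK; the model ID and the `last_frame` config field are assumptions here, so check the current Veo docs and `veo_client.py` for the exact call:

```python
# Sketch of Veo first/last-frame interpolation via the google-genai SDK.
# The model ID and the last_frame field are ASSUMPTIONS, not verified here.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="The subject transitions smoothly from pose A to pose B.",
    image=types.Image.from_file(location="pose_a.png"),  # start frame
    config=types.GenerateVideosConfig(
        last_frame=types.Image.from_file(location="pose_b.png"),  # end frame
    ),
)
# Generation is async: poll client.operations.get(operation) until done.
```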
This project builds upon the following excellent open source projects: PromptHMR and GMR.

This project depends on third-party libraries with their own licenses (PromptHMR and GMR). Please review both licenses before use.
The core video2robot code is MIT-licensed, but using this repository end-to-end (including PromptHMR) inherits PromptHMR's Non-Commercial Scientific Research Only restriction. Commercial use requires obtaining appropriate permission from the PromptHMR authors.