
video2robot

End-to-end pipeline: Video (or Prompt) → Human Pose Extraction → Robot Motion Conversion

Demo

  • fullvideo.mp4: demo video
  • backflip.mp4: backflip
  • new_jeans.mp4: dance motion

Pipeline

[Prompt] → Veo → [Video] → PromptHMR → [SMPL-X] → GMR → [Robot Motion]
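The same flow can be driven from Python through the package modules. The sketch below is hypothetical: the class and method names are inferred from the project layout (video/veo_client.py, pose/extractor.py, robot/retargeter.py), not taken from the actual API.

# Hypothetical Python API sketch; actual names in video2robot may differ.
from video2robot.video.veo_client import VeoClient    # [Prompt] -> [Video]
from video2robot.pose.extractor import PoseExtractor  # [Video] -> [SMPL-X]
from video2robot.robot.retargeter import Retargeter   # [SMPL-X] -> [Robot Motion]

video_path = VeoClient().generate("The subject walks forward with four steps.")
smplx_path = PoseExtractor().extract(video_path)                   # PromptHMR
motion_path = Retargeter(robot="unitree_g1").retarget(smplx_path)  # GMR
print(motion_path)  # e.g. data/video_001/robot_motion.pkl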

Project Structure

video2robot/
├── video2robot/            # Main package
│   ├── config.py           # Configuration management
│   ├── pipeline.py         # (Optional) Python API pipeline
│   ├── cli.py              # Console entrypoint for installation
│   ├── video/              # Video generation/processing
│   │   └── veo_client.py   # Google Veo API
│   ├── pose/               # Pose extraction (PromptHMR wrapper)
│   │   └── extractor.py
│   └── robot/              # Robot conversion (GMR wrapper)
│       └── retargeter.py
│
├── scripts/                # CLI scripts
│   ├── run_pipeline.py     # Full pipeline
│   ├── generate_video.py   # Veo video generation
│   ├── extract_pose.py     # Pose extraction
│   ├── convert_to_robot.py # Robot conversion
│   └── visualize.py        # Result visualization
│
├── configs/                # Configuration files
├── data/                   # Data (gitignored)
│
└── third_party/            # External dependencies (submodules)
    ├── PromptHMR/          # Pose extraction model
    └── GMR/                # Motion retargeting

Installation

This project requires two conda environments: gmr and phmr.

# Clone repo (with submodules)
git clone --recursive https://github.com/AIM-Intelligence/video2robot.git
cd video2robot

# Or initialize submodules after cloning
git submodule update --init --recursive

1. GMR Environment (Robot Retargeting)

conda create -n gmr python=3.10 -y
conda activate gmr
pip install -e .

For details, see GMR README.

2. PromptHMR Environment (Pose Extraction)

For Blackwell GPU (sm_120) users:

conda create -n phmr python=3.11 -y
conda activate phmr
cd third_party/PromptHMR
bash scripts/install_blackwell.sh

For other GPUs (Ampere, Hopper, etc.):

conda create -n phmr python=3.10 -y
conda activate phmr
cd third_party/PromptHMR
pip install -e .

For details, see PromptHMR README.

Usage

Note: The scripts automatically switch to the appropriate conda environment (gmr or phmr) as needed. Just make sure both environments are installed; there is no need to activate them manually. (A sketch of how this switching can work follows the commands below.)

# Full pipeline (action → robot motion); BASE_PROMPT is applied automatically
python scripts/run_pipeline.py --action "Action sequence:
The subject walks forward with four steps."

# Use Sora
python scripts/run_pipeline.py --action "..." --provider sora

# Start from existing video (video.mp4 → robot motion)
python scripts/run_pipeline.py --video /path/to/video.mp4

# Resume from existing project
python scripts/run_pipeline.py --project data/video_001

# Run individual steps
python scripts/generate_video.py --action "Action sequence: The subject walks forward."
python scripts/extract_pose.py --project data/video_001
python scripts/convert_to_robot.py --project data/video_001

# Visualization (auto env switching)
python scripts/visualize.py --project data/video_001
python scripts/visualize.py --project data/video_001 --pose
python scripts/visualize.py --project data/video_001 --robot
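Under the hood, this kind of switching is typically done with conda run, which executes a command inside a named environment without activating it in the shell. A minimal illustration of the idea (the helper below is a sketch, not the project's actual implementation):

# Sketch of per-step environment switching via `conda run`; illustrative only.
import subprocess

def run_in_env(env: str, script: str, *args: str) -> None:
    # `conda run -n <env>` runs the command inside that conda environment.
    subprocess.run(["conda", "run", "-n", env, "python", script, *args], check=True)

run_in_env("phmr", "scripts/extract_pose.py", "--project", "data/video_001")
run_in_env("gmr", "scripts/convert_to_robot.py", "--project", "data/video_001")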

Web UI

# Run server (from video2robot root)
uvicorn web.app:app --host 0.0.0.0 --port 8000

# Access in browser
# http://localhost:8000

Features:

  • Automatic pipeline: prompt input → video generation → pose extraction → robot conversion
  • Video upload support
  • Veo/Sora model selection
  • 3D visualization (viser)
  • Synchronized video and 3D playback
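The web.app:app target passed to uvicorn means web/app.py must expose an ASGI application object named app. A minimal FastAPI skeleton of that shape, shown only to illustrate the layout (the real app also wires in the pipeline and viser visualization):

# Minimal FastAPI skeleton matching the `web.app:app` import path; illustrative only.
from pathlib import Path
from fastapi import FastAPI, UploadFile

app = FastAPI(title="video2robot")

@app.post("/upload")
async def upload_video(file: UploadFile) -> dict:
    # Save the uploaded video under data/; the real app would then run the pipeline.
    dest = Path("data") / (file.filename or "upload.mp4")
    dest.parent.mkdir(exist_ok=True)
    dest.write_bytes(await file.read())
    return {"saved": str(dest)}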

Environment Setup

# Create .env file
cp .env.example .env

# Set API key
echo "GOOGLE_API_KEY=your-api-key" >> .env

Supported Robots

Robot        ID           DOF
Unitree G1   unitree_g1   29
Unitree H1   unitree_h1   19
Booster T1   booster_t1   23

See the GMR README for the full list of supported robots.

Output Format

# robot_motion.pkl
{
    "fps": 30.0,
    "robot_type": "unitree_g1",
    "num_frames": 240,
    "root_pos": np.ndarray,    # (N, 3)
    "root_rot": np.ndarray,    # (N, 4) quaternion xyzw
    "dof_pos": np.ndarray,     # (N, DOF)
}
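robot_motion.pkl is a standard Python pickle, so it can be inspected directly (the path below is an example):

# Load and inspect a retargeted motion file.
import pickle

with open("data/video_001/robot_motion.pkl", "rb") as f:
    motion = pickle.load(f)

print(motion["robot_type"], motion["fps"], motion["num_frames"])
print(motion["root_pos"].shape)  # (N, 3) root position per frame
print(motion["root_rot"].shape)  # (N, 4) root orientation, xyzw quaternion
print(motion["dof_pos"].shape)   # (N, DOF) joint positions per frame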

TODO

  • lastFrame (start/end frame interpolation), Veo 3.1 only

    • Start image + end image → generate a video that smoothly connects the two
    • Useful for "Pose A → Pose B" robot motion videos
  • referenceImages (reference images), Veo 3.1 only

    • Up to 3 reference images to keep the character/style consistent
    • Generate videos of a specific character performing the actions

Acknowledgements

This project builds upon the following excellent open source projects:

  • PromptHMR: 3D human mesh recovery from video
  • GMR: General Motion Retargeting framework

License

This project depends on third-party libraries with their own licenses:

  • GMR: MIT License
  • PromptHMR: Non-Commercial Scientific Research Use Only

Please review both licenses before use.

The core video2robot code is MIT-licensed, but using this repository end-to-end (including PromptHMR) inherits PromptHMR's Non-Commercial Scientific Research Use Only restriction. Commercial use requires permission from the PromptHMR authors.
