
Apple ML-SHARP Rerun

Convert 2D videos and photos into interactive 3D scenes using ML-SHARP and Rerun. Explore your videos in 3D space with depth maps, navigation tools, and creative effects.

Demo Video

Demo Video Preview

Click here to download the full demo video | View thumbnail


🎯 For Non-Technical Users - Quick Start

New to coding? No problem! Follow these steps to turn your videos and photos into 3D scenes.

What You Need:

  1. A computer (Mac, Windows, or Linux)
  2. Python installed (download from python.org)
  3. Your video file (MP4, MOV, AVI) or photo (JPG, PNG)

Step-by-Step Guide:

Convert a Video to 3D (Recommended)

  1. Open Terminal (Mac/Linux) or Command Prompt (Windows)

  2. Navigate to this folder (where you downloaded this project)

    cd /path/to/Apple-ml-sharp-rerun
  3. Install required software (run these commands):

    pip install rerun-sdk numpy pillow opencv-python scipy torch tqdm
    pip install sharp
  4. Convert your video to 3D:

    python scripts/converters/video_to_3d_high_quality.py your_video.mp4 mps

    Replace your_video.mp4 with your actual video filename

  5. Wait for processing - This takes a few minutes depending on video length. The ML-SHARP model (~2.5GB) downloads automatically on first use.

  6. View your 3D scene:

    python scripts/visualizers/video_complete_viewer.py -i output_your_video/gaussians/

View an Existing 3D File (.ply)

If you already have a .ply file:

  1. Open Terminal/Command Prompt
  2. Navigate to this folder
  3. View the 3D file:
    python scripts/visualizers/visualize_with_rerun.py -i path/to/your/file.ply

Controls in the 3D Viewer:

  • Left Click + Drag: Rotate the view
  • Right Click + Drag: Pan/move the view
  • Scroll Wheel: Zoom in/out
  • Double Click: Reset view

Tips:

  • Start with short videos (10-30 seconds) for faster processing
  • Make sure your video has good lighting and clear objects
  • The first run downloads the ML-SHARP model (~2.5GB)

Need Help?

Check the Troubleshooting section below.


🚀 Setup Guide

Prerequisites

  1. Python 3.8 or higher - Check with: python --version or python3 --version
  2. pip (Python package manager) - Usually comes with Python
  3. Git (optional, for cloning the repository)

Installation Steps

Step 1: Install Python Dependencies

pip install -r requirements.txt

Or install individually:

pip install rerun-sdk numpy pillow opencv-python scipy torch tqdm

Step 2: Install ML-SHARP

ML-SHARP (Sparse Hierarchical Attention-based Radiance Prediction) is Apple's model for converting 2D images/videos into 3D Gaussian Splatting scenes. Official repository: apple/ml-sharp

Option A: Install via pip (Recommended)
pip install sharp

The ML-SHARP model weights (~2.5GB) will be downloaded automatically when you run the video conversion script. The model is hosted at:

  • Model URL: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
Option B: Install from Source (GitHub)

If you want to install from the official GitHub repository:

  1. Clone the ML-SHARP repository:

    git clone https://github.com/apple/ml-sharp.git
    cd ml-sharp
  2. Create a conda environment (recommended by ML-SHARP):

    conda create -n sharp python=3.13
    conda activate sharp
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install the package:

    pip install -e .

Note: The ML-SHARP Python package (sharp) provides:

  • sharp.models - Model definitions and predictor creation (PredictorParams, create_predictor)
  • sharp.utils.gaussians - Gaussian Splatting utilities (load_ply, save_ply, unproject_gaussians)
  • sharp.utils.io - Image I/O utilities
  • sharp.utils.color_space - Color space conversions
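
As a rough orientation, importing those modules looks like the sketch below. The module paths come from the list above, but the two call lines are assumptions - check the official apple/ml-sharp repository for the exact signatures before relying on them.

# Hedged sketch: module paths are from the list above; the calls are assumptions.
from sharp.models import PredictorParams, create_predictor
from sharp.utils.gaussians import load_ply, save_ply

# Assumed usage - verify against apple/ml-sharp before use:
gaussians = load_ply("scene.ply")        # assumption: takes a file path
save_ply(gaussians, "scene_copy.ply")    # assumption: (data, output path)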

ML-SHARP CLI: You can also use the official ML-SHARP CLI:

# Convert images to 3D Gaussian Splats
sharp predict -i /path/to/input/images -o /path/to/output/gaussians

# Test installation
sharp --help

Step 3: Verify Installation

Test that everything is installed correctly:

python -c "import rerun; import numpy; import torch; import sharp; print('โœ“ All dependencies installed!')"

You should see: โœ“ All dependencies installed!

Step 4: Test with Sample Data

If you have sample .ply files in output_test/, try visualizing one:

python scripts/visualizers/visualize_with_rerun.py -i output_test/IMG_4707.ply --size 2.0

System Requirements

  • Operating System: macOS, Linux, or Windows
  • RAM: 8GB minimum, 16GB recommended
  • GPU: Optional but recommended for faster processing
    • CUDA (NVIDIA GPUs on Windows/Linux)
    • MPS (Apple Silicon Macs - M1, M2, M3, etc.)
  • CPU-only: works, but noticeably slower
  • Disk Space: At least 5GB free space for models and outputs

Device Selection

When converting videos, the script automatically detects the device:

  • CUDA: NVIDIA GPUs (fastest)
  • MPS: Apple Silicon GPUs (fast)
  • CPU: Fallback (slower)

You can also specify manually:

python scripts/converters/video_to_3d_high_quality.py video.mp4 cuda  # For NVIDIA GPU
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps   # For Apple Silicon
python scripts/converters/video_to_3d_high_quality.py video.mp4 cpu   # For CPU only
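
For reference, the auto-detection described above is the standard PyTorch pattern sketched below. It mirrors that behaviour, not the converter script's actual code.

import torch

def pick_device(requested=None):
    """Return 'cuda', 'mps', or 'cpu', preferring the fastest available backend."""
    if requested:                              # honour an explicit choice, e.g. "mps"
        return requested
    if torch.cuda.is_available():              # NVIDIA GPU
        return "cuda"
    if torch.backends.mps.is_available():      # Apple Silicon GPU
        return "mps"
    return "cpu"                               # fallback

print(pick_device())  # e.g. "mps" on an M-series Mac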

📖 Usage Examples

1. Convert 2D Video to 3D

High Quality (Recommended):

python scripts/converters/video_to_3d_high_quality.py your_video.mp4 mps

This creates an output_your_video/ directory with:

  • frames/ - Extracted video frames (PNG)
  • gaussians/ - 3D Gaussian Splat files (PLY) - one per frame

Process Every Nth Frame (Faster):

# Process every 2nd frame (2x faster)
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps 2

# Process every 5th frame (5x faster)
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps 5
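
Conceptually, the frame-stride option works like the OpenCV sketch below. This is an illustration of the idea, not the converter's actual code; extract_frames, output_dir, and step are hypothetical names.

import os
import cv2

def extract_frames(video_path, output_dir, step=2):
    """Save every `step`-th frame of a video as a PNG (illustrative sketch)."""
    os.makedirs(output_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:                     # end of video
            break
        if index % step == 0:          # keep every Nth frame
            cv2.imwrite(os.path.join(output_dir, f"frame_{saved:06d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

print(extract_frames("your_video.mp4", "frames", step=2), "frames written")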

Standard Quality:

python scripts/converters/video_to_3d.py your_video.mp4

Quick Preview:

python scripts/converters/video_to_3d_simple.py your_video.mp4

2. View 3D Video/Scene

Single PLY File:

python scripts/visualizers/visualize_with_rerun.py -i output_test/IMG_4707.ply --size 2.0

Complete 3D Video Viewer: Shows original video, depth maps, 3D point cloud, and navigation data side by side:

python scripts/visualizers/video_complete_viewer.py \
    -i output_your_video/gaussians/ \
    --max-frames 30 \
    --size 2.0

Options:

  • -i, --input: Directory containing PLY files (gaussians folder)
  • --max-frames: Maximum frames to process (default: all)
  • --skip: Process every Nth frame (default: 1)
  • --resolution: Occupancy grid resolution in meters (default: 0.5)
  • --obstacle-height: Obstacle height threshold in meters (default: 0.5)
  • --size: Point size multiplier (default: 1.0)

Video Navigation Analysis:

python scripts/visualizers/video_navigation.py \
    -i output_your_video/gaussians/ \
    --max-frames 30
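
All of these viewers are built on the Rerun SDK. If you want to see the core logging idea in isolation, here is a minimal standalone sketch that logs a random point cloud; it is not tied to this project's loaders, and the entity path and point size are arbitrary.

import numpy as np
import rerun as rr

rr.init("minimal_points_demo", spawn=True)   # opens a Rerun viewer window

# Random points stand in for a real Gaussian Splat / point cloud.
positions = np.random.uniform(-1.0, 1.0, size=(1000, 3)).astype(np.float32)
colors = np.random.randint(0, 255, size=(1000, 3), dtype=np.uint8)

# Log once under an entity path; rotation, pan, and zoom are handled by the viewer.
rr.log("world/points", rr.Points3D(positions, colors=colors, radii=0.01))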

3. Build Navigation Map

Extract navigation data from 3D scenes:

python scripts/navigation/build_navigation_map.py \
    -i output_test/IMG_4707.ply \
    --resolution 0.5

With Path Planning:

python scripts/navigation/build_navigation_map.py \
    -i output_test/IMG_4707.ply \
    --resolution 0.5 \
    --plan-path \
    --start 0 0 \
    --goal 50 50 \
    -o navigation_map.json
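
Under the hood, an occupancy grid can be thought of as 2D binning of the point cloud with a height threshold. The numpy sketch below illustrates that idea only; it is not the project's implementation, the 0.5 m values simply mirror the defaults shown above, and it assumes z is the height axis.

import numpy as np

def occupancy_grid(points, resolution=0.5, obstacle_height=0.5):
    """Illustrative sketch: mark grid cells that contain points above a height threshold."""
    ground_z = np.percentile(points[:, 2], 5)                 # crude ground estimate
    obstacles = points[points[:, 2] > ground_z + obstacle_height]
    if len(obstacles) == 0:
        return np.zeros((1, 1), dtype=bool)
    mins = obstacles[:, :2].min(axis=0)
    cells = np.floor((obstacles[:, :2] - mins) / resolution).astype(int)
    grid = np.zeros(cells.max(axis=0) + 1, dtype=bool)
    grid[cells[:, 0], cells[:, 1]] = True                     # occupied cells
    return grid

grid = occupancy_grid(np.random.rand(5000, 3) * [10.0, 10.0, 2.0])
print(grid.shape, int(grid.sum()), "occupied cells")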

4. Apply Effects

Depth-based Fog Effect:

python scripts/creative/apply_depth_effects.py -i scene.ply --effect fog

Create Camera Path (Orbit):

python scripts/creative/create_camera_path.py -i scene.ply --path orbit
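
The orbit path is essentially a circle of camera positions around the scene centre. A minimal sketch of that idea (not the script's actual implementation; names are illustrative):

import numpy as np

def orbit_positions(center, radius=3.0, height=1.0, n_frames=120):
    """Illustrative sketch: camera positions on a circle around `center`, each looking inward."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    x = center[0] + radius * np.cos(angles)
    y = center[1] + radius * np.sin(angles)
    z = np.full(n_frames, center[2] + height)
    return np.stack([x, y, z], axis=1)   # (n_frames, 3)

print(orbit_positions(np.zeros(3)).shape)  # (120, 3)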

📁 Project Structure

Apple-ml-sharp-rerun/
├── scripts/
│   ├── converters/            # 2D video to 3D conversion
│   │   ├── video_to_3d_high_quality.py  ⭐ Recommended
│   │   ├── video_to_3d.py
│   │   └── video_to_3d_simple.py
│   ├── visualizers/           # 3D visualization viewers
│   │   ├── video_complete_viewer.py     ⭐ Complete viewer with dual windows
│   │   ├── video_navigation.py
│   │   └── visualize_with_rerun.py
│   ├── navigation/            # Navigation & SLAM tools
│   │   ├── build_navigation_map.py
│   │   ├── extract_slam_data.py
│   │   └── demo_navigation.py
│   └── creative/              # Creative effects
│       ├── apply_depth_effects.py
│       ├── compose_3d_scenes.py
│       └── create_camera_path.py
├── utils/                      # Reusable utility modules
│   ├── depth_rendering.py     # Depth map rendering
│   ├── frame_processing.py    # Frame processing
│   ├── navigation.py          # Navigation algorithms
│   ├── pathfinding.py         # Pathfinding
│   ├── visualization.py       # Viewer setup
│   ├── config.py              # Configuration
│   ├── io_utils.py            # File I/O
│   └── geometry.py            # 3D geometry
├── examples/                   # Example scripts
├── tests/                      # Test scripts
├── configs/                    # Configuration files
└── data/                       # Sample data

🎮 3D Viewer Controls

Rotate View:

  • Left click + drag

Pan/Move View:

  • Right click + drag (primary)
  • Middle mouse + drag
  • Shift + Left click + drag

Zoom:

  • Mouse wheel / Trackpad scroll

Reset View:

  • Double click anywhere

Tips:

  • Both 3D windows in the complete viewer work independently
  • Use the timeline at the bottom to scrub through video frames
  • You can pan, zoom, and rotate each window separately

🛠️ Utility Modules

The utils/ module provides reusable components:

  • depth_rendering.py: Render depth maps from 3D points
  • frame_processing.py: Load and process PLY files
  • navigation.py: Ground detection, obstacle detection, occupancy grids
  • pathfinding.py: A* pathfinding algorithm
  • visualization.py: Set up Rerun viewers
  • config.py: Configuration classes
  • io_utils.py: File I/O helpers
  • geometry.py: 3D transformations

Usage Example:

from utils import (
    load_gaussian_data,
    render_depth_map,
    extract_ground_plane,
    setup_complete_viewer_blueprint
)

# Load PLY file
data = load_gaussian_data("scene.ply")

# Render depth map
depth_map, depth_colored = render_depth_map(
    data['positions'],
    data['colors'],
    resolution=(1280, 720)
)

See examples/ directory for complete examples.
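
For background on the pathfinding utility mentioned above, A* on a boolean occupancy grid can be sketched in a few lines. This is a generic textbook version, not the project's pathfinding.py:

import heapq

def astar(grid, start, goal):
    """Generic A* on a boolean grid (True = blocked). Returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])

    def h(a, b):                         # Manhattan-distance heuristic
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    open_set = [(h(start, goal), start)]
    came_from, g = {}, {start: 0}
    while open_set:
        _, node = heapq.heappop(open_set)
        if node == goal:                 # reconstruct the path back to start
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                new_g = g[node] + 1
                if new_g < g.get(nb, float("inf")):
                    g[nb] = new_g
                    came_from[nb] = node
                    heapq.heappush(open_set, (new_g + h(nb, goal), nb))
    return None                          # no route found

grid = [[False] * 5 for _ in range(5)]
grid[2][1] = grid[2][2] = grid[2][3] = True   # a wall with gaps at both ends
print(astar(grid, (0, 0), (4, 4)))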


📊 Output Structure

When converting videos, the output structure is:

output_<video_name>/
├── frames/          # Extracted video frames (PNG)
│   ├── frame_000000.png
│   ├── frame_000001.png
│   └── ...
├── gaussians/       # 3D Gaussian Splat files (PLY)
│   ├── frame_000000.ply
│   ├── frame_000001.ply
│   └── ...
└── json/            # Metadata (optional)
    └── ...

🔧 Configuration

Default configurations are in utils/config.py:

  • ViewerConfig: Point sizes, opacity thresholds, rotations
  • NavigationConfig: Obstacle heights, grid resolution
  • DepthConfig: Depth rendering settings
  • ConversionConfig: Video conversion settings

You can customize these:

from utils import ViewerConfig, NavigationConfig

viewer_cfg = ViewerConfig(point_size_multiplier=2.0, opacity_threshold=0.2)
nav_cfg = NavigationConfig(obstacle_height=0.7, grid_resolution=0.3)

🆘 Troubleshooting

Common Issues

Import Errors

Problem: ModuleNotFoundError: No module named 'rerun' or similar

Solution:

pip install -r requirements.txt
pip install sharp

ML-SHARP Not Found

Problem: ModuleNotFoundError: No module named 'sharp'

Solution:

pip install sharp

If that doesn't work, install from the official GitHub repository:

git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
pip install -r requirements.txt
pip install -e .

Or use conda (recommended by ML-SHARP):

git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
pip install -e .

Model Download Issues

Problem: Model download fails or is slow

Solution:

  • Check your internet connection
  • The model is ~2.5GB, ensure you have enough disk space
  • Model URL: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
  • You can manually download and place it in a cache directory
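
If the automatic download keeps failing, you can fetch the weights yourself with a standard download tool. The command below is ordinary curl usage, not a project-specific helper; the cache directory to place the file in depends on your setup (the conversion script reports where it looks):

curl -L -o sharp_2572gikvuh.pt https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt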

GPU Not Detected

Problem: CUDA/MPS errors or GPU not detected

Solution:

  • NVIDIA GPU (CUDA):
    • Install PyTorch with CUDA: pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    • Check CUDA: python -c "import torch; print(torch.cuda.is_available())"
  • Apple Silicon (MPS):
    • MPS is automatically available on Apple Silicon Macs
    • Use mps device: python scripts/converters/video_to_3d_high_quality.py video.mp4 mps
  • CPU Fallback:
    • Use cpu: python scripts/converters/video_to_3d_high_quality.py video.mp4 cpu
    • Note: CPU is much slower

File Not Found

Problem: FileNotFoundError when running scripts

Solution:

  • Run scripts from the project root directory
  • Use absolute paths if relative paths don't work
  • Check that input files exist

Memory Errors

Problem: Out of memory during processing

Solution:

  • Process fewer frames: --max-frames 10
  • Use lower resolution videos
  • Close other applications
  • Use CPU instead of GPU if GPU memory is limited

Python Version Issues

Problem: Scripts don't work with your Python version

Solution:

  • Make sure you have Python 3.8 or higher: python --version
  • Use python3 instead of python if needed
  • Consider using a virtual environment
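
A virtual environment keeps this project's packages separate from the rest of your system. The standard commands (shown for macOS/Linux; on Windows the activate script is .venv\Scripts\activate):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install sharp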

Rerun Version Mismatch - Blank Viewer

Problem: Rerun viewer is blank/nothing displays, with errors like:

  • "Rerun Viewer: v0.23.1 vs Rerun SDK: v0.27.0"
  • "dropping LogMsg due to failed decode"
  • "transport error"

Solution: This is caused by version mismatch. The viewer cannot decode messages from a newer SDK.
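
To confirm which SDK version you actually have installed (a standard Python check, not specific to this project):

python -c "import rerun; print(rerun.__version__)"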

Option 1: Downgrade SDK to match viewer (Recommended)

pip install rerun-sdk==0.23.1

This matches your viewer version (v0.23.1) and should fix the blank viewer.

Option 2: Update the viewer to match the SDK. Follow the error message to update the viewer to v0.27.0, or:

# Using cargo (if you have Rust installed)
cargo binstall --force rerun-cli@0.27.0

# Or download from: https://github.com/rerun-io/rerun/releases/0.27.0/

Note: The decode errors mean the viewer can't display data - this must be fixed for visualization to work.


📝 Notes

  • The ML-SHARP library (sharp) is required for 2D-to-3D video conversion
  • The ML-SHARP model weights (~2.5GB) download automatically on first use
  • Output directories are created automatically
  • PLY files should be in ML-SHARP Gaussian Splatting format
  • Rerun viewer runs independently - close the window to exit

🙏 Credits & Acknowledgments

This project uses these open-source technologies:

Core Technologies

  • Rerun - Visualize Everything Fast

    • SDK for logging, storing, querying, and visualizing multimodal data
    • Built in Rust using egui
    • Licensed under Apache-2.0
    • Created by the team at rerun.io
    • GitHub | Documentation
  • Apple ML-SHARP - Sharp Monocular View Synthesis in Less Than a Second

    • Apple's model for converting 2D images/videos to 3D Gaussian Splatting scenes
    • Official GitHub repository: apple/ml-sharp
    • Project page: apple.github.io/ml-sharp
    • Research paper: arXiv:2512.10685
    • Model weights provided by Apple
    • Model hosted at: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
    • Installation: pip install sharp or install from source

Additional Dependencies

  • PyTorch - Deep learning framework
  • NumPy - Numerical computing
  • OpenCV - Computer vision library
  • Pillow - Image processing
  • SciPy - Scientific computing
  • tqdm - Progress bars

Special Thanks

  • Rerun Team (@rerun-io) for creating the visualization tool
  • Apple Research for developing ML-SHARP and making it available
  • All open-source contributors

🔗 Related Links

ML-SHARP Resources

Rerun Resources

Documentation


📄 License

This project is part of the Apple ML-SHARP ecosystem and visualization tools.

Third-party licenses:

  • Rerun: Apache-2.0 License
  • PyTorch: BSD-style License
  • Other dependencies: See their respective licenses

Made with ❤️ using Rerun and Apple ML-SHARP
