Convert 2D videos and photos into interactive 3D scenes using ML-SHARP and Rerun. Explore your videos in 3D space with depth maps, navigation tools, and creative effects.
Click here to download the full demo video | View thumbnail
New to coding? No problem! Follow these steps to turn your videos and photos into 3D scenes.
- A computer (Mac, Windows, or Linux)
- Python installed (download from python.org)
- Your video file (MP4, MOV, AVI) or photo (JPG, PNG)
1. Open Terminal (Mac/Linux) or Command Prompt (Windows)
2. Navigate to this folder (where you downloaded this project):

   ```bash
   cd /path/to/Apple-ml-sharp-rerun
   ```

3. Install the required software (run these commands):

   ```bash
   pip install rerun-sdk numpy pillow opencv-python scipy torch tqdm
   pip install sharp
   ```

4. Convert your video to 3D:

   ```bash
   python scripts/converters/video_to_3d_high_quality.py your_video.mp4 mps
   ```

   Replace `your_video.mp4` with your actual video filename.

5. Wait for processing - this takes a few minutes depending on video length. The ML-SHARP model (~2.5GB) downloads automatically on first use.

6. View your 3D scene:

   ```bash
   python scripts/visualizers/video_complete_viewer.py -i output_your_video/gaussians/
   ```
If you already have a .ply file:
- Open Terminal/Command Prompt
- Navigate to this folder
- View the 3D file:

  ```bash
  python scripts/visualizers/visualize_with_rerun.py -i path/to/your/file.ply
  ```
- Left Click + Drag: Rotate the view
- Right Click + Drag: Pan/move the view
- Scroll Wheel: Zoom in/out
- Double Click: Reset view
- Start with short videos (10-30 seconds) for faster processing
- Make sure your video has good lighting and clear objects
- The first run downloads the ML-SHARP model (~2.5GB)
Check the Troubleshooting section below.
- Python 3.8 or higher - check with `python --version` or `python3 --version`
- pip (Python package manager) - usually comes with Python
- Git (optional, for cloning the repository)

Install everything from the requirements file:

```bash
pip install -r requirements.txt
```

Or install individually:

```bash
pip install rerun-sdk numpy pillow opencv-python scipy torch tqdm
```

ML-SHARP is Apple's model for converting 2D images/videos into 3D Gaussian Splatting scenes. Official repository: apple/ml-sharp

```bash
pip install sharp
```

The ML-SHARP model weights (~2.5GB) will be downloaded automatically when you run the video conversion script. The model is hosted at:

- Model URL: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
If you want to install from the official GitHub repository:

1. Clone the ML-SHARP repository:

   ```bash
   git clone https://github.com/apple/ml-sharp.git
   cd ml-sharp
   ```

2. Create a conda environment (recommended by ML-SHARP):

   ```bash
   conda create -n sharp python=3.13
   conda activate sharp
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install the package:

   ```bash
   pip install -e .
   ```
Note: The ML-SHARP Python package (`sharp`) provides:

- `sharp.models` - Model definitions and predictor creation (`PredictorParams`, `create_predictor`)
- `sharp.utils.gaussians` - Gaussian Splatting utilities (`load_ply`, `save_ply`, `unproject_gaussians`)
- `sharp.utils.io` - Image I/O utilities
- `sharp.utils.color_space` - Color space conversions

ML-SHARP CLI: You can also use the official ML-SHARP CLI:

```bash
# Convert images to 3D Gaussian Splats
sharp predict -i /path/to/input/images -o /path/to/output/gaussians

# Test installation
sharp --help
```

Test that everything is installed correctly:

```bash
python -c "import rerun; import numpy; import torch; import sharp; print('✓ All dependencies installed!')"
```

You should see: ✓ All dependencies installed!

If you have sample .ply files in output_test/, try visualizing one:

```bash
python scripts/visualizers/visualize_with_rerun.py -i output_test/IMG_4707.ply --size 2.0
```

- Operating System: macOS, Linux, or Windows
- RAM: 8GB minimum, 16GB recommended
- GPU: Optional but recommended for faster processing
  - CUDA (NVIDIA GPUs on Windows/Linux)
  - MPS (Apple Silicon Macs - M1, M2, M3, etc.)
- CPU: Works but will be slower
- Disk Space: At least 5GB free space for models and outputs
When converting videos, the script automatically detects the device:
- CUDA: NVIDIA GPUs (fastest)
- MPS: Apple Silicon GPUs (fast)
- CPU: Fallback (slower)
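As a rough illustration, the auto-detection order above can be sketched in Python. This is hypothetical logic, not the script's actual code:

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring the fastest available."""
    try:
        import torch
        if torch.cuda.is_available():  # NVIDIA GPU
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():  # Apple Silicon GPU
            return "mps"
    except ImportError:
        pass  # PyTorch not installed: fall back to CPU
    return "cpu"

print(pick_device())
```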
You can also specify manually:
```bash
python scripts/converters/video_to_3d_high_quality.py video.mp4 cuda   # For NVIDIA GPU
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps    # For Apple Silicon
python scripts/converters/video_to_3d_high_quality.py video.mp4 cpu    # For CPU only
```

High Quality (Recommended):

```bash
python scripts/converters/video_to_3d_high_quality.py your_video.mp4 mps
```

This creates an output_your_video/ directory with:

- `frames/` - Extracted video frames (PNG)
- `gaussians/` - 3D Gaussian Splat files (PLY), one per frame

Process Every Nth Frame (Faster):

```bash
# Process every 2nd frame (2x faster)
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps 2

# Process every 5th frame (5x faster)
python scripts/converters/video_to_3d_high_quality.py video.mp4 mps 5
```

Standard Quality:

```bash
python scripts/converters/video_to_3d.py your_video.mp4
```

Quick Preview:

```bash
python scripts/converters/video_to_3d_simple.py your_video.mp4
```

Single PLY File:

```bash
python scripts/visualizers/visualize_with_rerun.py -i output_test/IMG_4707.ply --size 2.0
```

Complete 3D Video Viewer: Shows the original video, depth maps, 3D point cloud, and navigation data side by side:

```bash
python scripts/visualizers/video_complete_viewer.py \
    -i output_your_video/gaussians/ \
    --max-frames 30 \
    --size 2.0
```

Options:

- `-i, --input`: Directory containing PLY files (gaussians folder)
- `--max-frames`: Maximum frames to process (default: all)
- `--skip`: Process every Nth frame (default: 1)
- `--resolution`: Occupancy grid resolution in meters (default: 0.5)
- `--obstacle-height`: Obstacle height threshold in meters (default: 0.5)
- `--size`: Point size multiplier (default: 1.0)
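To make `--resolution` and `--obstacle-height` concrete, here is a toy occupancy-grid sketch. `occupancy_grid` is a hypothetical helper for illustration, not the project's API:

```python
def occupancy_grid(points, resolution=0.5, obstacle_height=0.5):
    """Mark a grid cell occupied if any point in it rises above the threshold.

    points: iterable of (x, y, z) in meters; resolution: cell size in meters.
    """
    occupied = set()
    for x, y, z in points:
        if z > obstacle_height:
            # Bin the point into a (col, row) cell of size `resolution`
            occupied.add((int(x // resolution), int(y // resolution)))
    return occupied

print(occupancy_grid([(0.2, 0.2, 0.1), (1.3, 0.4, 0.9)]))  # -> {(2, 0)}
```

A coarser `--resolution` means fewer, larger cells (faster but less detailed); a lower `--obstacle-height` flags more points as obstacles.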
Video Navigation Analysis:

```bash
python scripts/visualizers/video_navigation.py \
    -i output_your_video/gaussians/ \
    --max-frames 30
```

Extract navigation data from 3D scenes:

```bash
python scripts/navigation/build_navigation_map.py \
    -i output_test/IMG_4707.ply \
    --resolution 0.5
```

With Path Planning:

```bash
python scripts/navigation/build_navigation_map.py \
    -i output_test/IMG_4707.ply \
    --resolution 0.5 \
    --plan-path \
    --start 0 0 \
    --goal 50 50 \
    -o navigation_map.json
```

Depth-based Fog Effect:

```bash
python scripts/creative/apply_depth_effects.py -i scene.ply --effect fog
```

Create Camera Path (Orbit):

```bash
python scripts/creative/create_camera_path.py -i scene.ply --path orbit
```

```
Apple-ml-sharp-rerun/
├── scripts/
│   ├── converters/                       # 2D video to 3D conversion
│   │   ├── video_to_3d_high_quality.py   ⭐ Recommended
│   │   ├── video_to_3d.py
│   │   └── video_to_3d_simple.py
│   ├── visualizers/                      # 3D visualization viewers
│   │   ├── video_complete_viewer.py      ⭐ Complete viewer with dual windows
│   │   ├── video_navigation.py
│   │   └── visualize_with_rerun.py
│   ├── navigation/                       # Navigation & SLAM tools
│   │   ├── build_navigation_map.py
│   │   ├── extract_slam_data.py
│   │   └── demo_navigation.py
│   └── creative/                         # Creative effects
│       ├── apply_depth_effects.py
│       ├── compose_3d_scenes.py
│       └── create_camera_path.py
├── utils/                                # Reusable utility modules
│   ├── depth_rendering.py                # Depth map rendering
│   ├── frame_processing.py               # Frame processing
│   ├── navigation.py                     # Navigation algorithms
│   ├── pathfinding.py                    # Pathfinding
│   ├── visualization.py                  # Viewer setup
│   ├── config.py                         # Configuration
│   ├── io_utils.py                       # File I/O
│   └── geometry.py                       # 3D geometry
├── examples/                             # Example scripts
├── tests/                                # Test scripts
├── configs/                              # Configuration files
└── data/                                 # Sample data
```
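For intuition, the orbit produced by `scripts/creative/create_camera_path.py --path orbit` can be pictured as camera positions evenly spaced on a circle around the scene. This is a minimal sketch; the real script's parameters and output format may differ:

```python
import math

def orbit_path(radius=5.0, height=2.0, steps=8):
    """Camera positions evenly spaced on a circle around the scene origin."""
    return [
        (radius * math.cos(2 * math.pi * i / steps),
         radius * math.sin(2 * math.pi * i / steps),
         height)
        for i in range(steps)
    ]

print(orbit_path()[0])  # first camera sits at (5.0, 0.0, 2.0)
```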
Rotate View:
- Left click + drag
Pan/Move View:
- Right click + drag (primary)
- Middle mouse + drag
- Shift + Left click + drag
Zoom:
- Mouse wheel / Trackpad scroll
Reset View:
- Double click anywhere
Tips:
- Both 3D windows in the complete viewer work independently
- Use the timeline at the bottom to scrub through video frames
- You can pan, zoom, and rotate each window separately
The utils/ module provides reusable components:
- `depth_rendering.py`: Render depth maps from 3D points
- `frame_processing.py`: Load and process PLY files
- `navigation.py`: Ground detection, obstacle detection, occupancy grids
- `pathfinding.py`: A* pathfinding algorithm
- `visualization.py`: Set up Rerun viewers
- `config.py`: Configuration classes
- `io_utils.py`: File I/O helpers
- `geometry.py`: 3D transformations
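For readers curious how A* (as in `pathfinding.py`) works in principle, here is a minimal, self-contained A* on a 4-connected grid. This is an independent sketch, not the module's actual code or API:

```python
import heapq

def astar(start, goal, blocked, size):
    """A* search over a size x size grid with 4-connected moves."""
    def h(p):  # Manhattan distance: an admissible heuristic on this grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    visited = set()
    while open_heap:
        _, g, cur, path = heapq.heappop(open_heap)
        if cur == goal:
            return path
        if cur in visited:
            continue
        visited.add(cur)
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in visited):
                heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # goal unreachable

print(astar((0, 0), (2, 2), blocked={(1, 0), (1, 1)}, size=3))
```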
Usage Example:

```python
from utils import (
    load_gaussian_data,
    render_depth_map,
    extract_ground_plane,
    setup_complete_viewer_blueprint
)

# Load PLY file
data = load_gaussian_data("scene.ply")

# Render depth map
depth_map, depth_colored = render_depth_map(
    data['positions'],
    data['colors'],
    resolution=(1280, 720)
)
```

See the examples/ directory for complete examples.
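The core idea behind depth-map rendering can be shown with a toy z-buffer: for every projected point, keep the smallest depth per pixel. This is a simplified stand-in, not the actual `utils.depth_rendering` implementation:

```python
def render_depth(points, width, height):
    """Toy z-buffer: for each (x, y, z) point, keep the nearest z per pixel."""
    INF = float("inf")
    depth = [[INF] * width for _ in range(height)]
    for x, y, z in points:
        px, py = int(x), int(y)  # assume points are already in pixel coords
        if 0 <= px < width and 0 <= py < height and z < depth[py][px]:
            depth[py][px] = z  # nearest point wins
    return depth

d = render_depth([(1, 1, 5.0), (1, 1, 2.0), (3, 2, 7.0)], width=4, height=4)
print(d[1][1], d[2][3])  # -> 2.0 7.0
```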
When converting videos, the output structure is:
```
output_<video_name>/
├── frames/              # Extracted video frames (PNG)
│   ├── frame_000000.png
│   ├── frame_000001.png
│   └── ...
├── gaussians/           # 3D Gaussian Splat files (PLY)
│   ├── frame_000000.ply
│   ├── frame_000001.ply
│   └── ...
└── json/                # Metadata (optional)
    └── ...
```
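Because the per-frame filenames are zero-padded, downstream scripts can recover playback order with a plain string sort. A small sketch (the filenames below are examples):

```python
# Zero-padded frame names sort correctly as plain strings.
names = ["frame_000010.ply", "frame_000002.ply", "frame_000000.ply"]
for name in sorted(names):
    print(name)
# In practice: sorted(glob.glob("output_your_video/gaussians/*.ply"))
```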
Default configurations are in utils/config.py:

- `ViewerConfig`: Point sizes, opacity thresholds, rotations
- `NavigationConfig`: Obstacle heights, grid resolution
- `DepthConfig`: Depth rendering settings
- `ConversionConfig`: Video conversion settings

You can customize these:

```python
from utils import ViewerConfig, NavigationConfig

viewer_cfg = ViewerConfig(point_size_multiplier=2.0, opacity_threshold=0.2)
nav_cfg = NavigationConfig(obstacle_height=0.7, grid_resolution=0.3)
```

Problem: ModuleNotFoundError: No module named 'rerun' or similar
Solution:

```bash
pip install -r requirements.txt
pip install sharp
```

Problem: ModuleNotFoundError: No module named 'sharp'
Solution:

```bash
pip install sharp
```

If that doesn't work, install from the official GitHub repository:

```bash
git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
pip install -r requirements.txt
pip install -e .
```

Or use conda (recommended by ML-SHARP):
```bash
git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
pip install -e .
```

Problem: Model download fails or is slow
Solution:

- Check your internet connection
- The model is ~2.5GB; ensure you have enough disk space
- Model URL: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
- You can manually download it and place it in a cache directory
Problem: CUDA/MPS errors or GPU not detected

Solution:

- NVIDIA GPU (CUDA):
  - Install PyTorch with CUDA: `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118`
  - Check CUDA: `python -c "import torch; print(torch.cuda.is_available())"`
- Apple Silicon (MPS):
  - MPS is automatically available on Apple Silicon Macs
  - Use the `mps` device: `python scripts/converters/video_to_3d_high_quality.py video.mp4 mps`
- CPU Fallback:
  - Use `cpu`: `python scripts/converters/video_to_3d_high_quality.py video.mp4 cpu`
  - Note: CPU is much slower
Problem: FileNotFoundError when running scripts
Solution:
- Run scripts from the project root directory
- Use absolute paths if relative paths don't work
- Check that input files exist
Problem: Out of memory during processing
Solution:
- Process fewer frames: `--max-frames 10`
- Use lower-resolution videos
- Close other applications
- Use CPU instead of GPU if GPU memory is limited
Problem: Scripts don't work with your Python version
Solution:
- Make sure you have Python 3.8 or higher: `python --version`
- Use `python3` instead of `python` if needed
- Consider using a virtual environment
Problem: Rerun viewer is blank/nothing displays, with errors like:
- "Rerun Viewer: v0.23.1 vs Rerun SDK: v0.27.0"
- "dropping LogMsg due to failed decode"
- "transport error"
Solution: This is caused by version mismatch. The viewer cannot decode messages from a newer SDK.
Option 1: Downgrade SDK to match viewer (Recommended)

```bash
pip install rerun-sdk==0.23.1
```

This matches your viewer version (v0.23.1) and should fix the blank viewer.
Option 2: Update viewer to match SDK

Follow the error message to update the viewer to v0.27.0, or:

```bash
# Using cargo (if you have Rust installed)
cargo binstall --force rerun-cli@0.27.0

# Or download from: https://github.com/rerun-io/rerun/releases/0.27.0/
```

Note: The decode errors mean the viewer can't display data - this must be fixed for visualization to work.
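A quick way to spot such a mismatch programmatically is to compare the two version strings numerically. This sketch only assumes versions of the form `0.27.0`:

```python
def parse_ver(v: str):
    """Turn "0.27.0" into (0, 27, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

sdk, viewer = "0.27.0", "0.23.1"  # example values from the error above
if parse_ver(sdk) != parse_ver(viewer):
    print(f"Version mismatch: SDK {sdk} vs viewer {viewer}")
    print(f"Fix: pip install rerun-sdk=={viewer}  (or update the viewer)")
```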
- The ML-SHARP library (`sharp`) is required for 2D-to-3D video conversion
- The ML-SHARP model weights (~2.5GB) download automatically on first use
- Output directories are created automatically
- PLY files should be in ML-SHARP Gaussian Splatting format
- Rerun viewer runs independently - close the window to exit
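PLY files are self-describing: an ASCII header lists elements and properties before the data. The generic header parser below is an illustration only; the sample header is made up and does not show ML-SHARP's exact Gaussian property layout:

```python
# Parse the element counts and property names from a PLY header.
SAMPLE = b"""ply
format binary_little_endian 1.0
element vertex 3
property float x
property float y
property float z
end_header
"""

def parse_header(data: bytes):
    counts, props = {}, []
    for line in data.decode("ascii").splitlines():
        parts = line.split()
        if parts[:1] == ["element"]:
            counts[parts[1]] = int(parts[2])   # e.g. 3 vertices
        elif parts[:1] == ["property"]:
            props.append(parts[-1])            # property name
        elif line == "end_header":
            break
    return counts, props

print(parse_header(SAMPLE))  # -> ({'vertex': 3}, ['x', 'y', 'z'])
```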
This project uses these open-source technologies:

- Rerun - Visualize Everything Fast
  - SDK for logging, storing, querying, and visualizing multimodal data
  - Built in Rust using egui
  - Licensed under Apache-2.0
  - Created by the team at rerun.io
  - GitHub | Documentation
- Apple ML-SHARP - Sharp Monocular View Synthesis in Less Than a Second
  - Apple's model for converting 2D images/videos to 3D Gaussian Splatting scenes
  - Official GitHub repository: apple/ml-sharp
  - Project page: apple.github.io/ml-sharp
  - Research paper: arXiv:2512.10685
  - Model weights provided by Apple
  - Model hosted at: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
  - Installation: `pip install sharp` or install from source
- PyTorch - Deep learning framework
- NumPy - Numerical computing
- OpenCV - Computer vision library
- Pillow - Image processing
- SciPy - Scientific computing
- tqdm - Progress bars
- Rerun Team (@rerun-io) for creating the visualization tool
- Apple Research for developing ML-SHARP and making it available
- All open-source contributors
- ML-SHARP GitHub: https://github.com/apple/ml-sharp
- ML-SHARP Project Page: https://apple.github.io/ml-sharp/
- Research Paper: https://arxiv.org/abs/2512.10685
- ML-SHARP Model: https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
- Installation: `pip install sharp` or install from GitHub
- ML-SHARP CLI: `sharp predict -i <input> -o <output>` (see official docs)
- Rerun GitHub: https://github.com/rerun-io/rerun
- Rerun Documentation: https://www.rerun.io/docs
- Rerun Website: https://rerun.io
- Rerun Discord: Join for community support
- Quick Start Guide: See QUICKSTART.md
- Project Structure: See STRUCTURE.md (if available)
- Examples: See examples/README.md
This project is part of the Apple ML-SHARP ecosystem and visualization tools.
Third-party licenses:
- Rerun: Apache-2.0 License
- PyTorch: BSD-style License
- Other dependencies: See their respective licenses
Made with ❤️ using Rerun and Apple ML-SHARP
