Convert 2D videos to 3D VR format using AI depth estimation.
Depth Surge 3D transforms flat videos into stereoscopic 3D for VR headsets using Depth Anything V3 and Video-Depth-Anything V2 neural networks. It analyzes video frames with temporal consistency to predict depth, then generates left and right eye views for immersive stereoscopic viewing.
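Under the hood, the stereo pair comes from depth-image-based rendering: each pixel is shifted horizontally by a disparity derived from its predicted depth, once per eye. A minimal NumPy sketch of the idea follows; the function, parameter values, and depth convention are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def depth_to_stereo(frame, depth, max_disparity_px=24):
    """Illustrative depth-image-based rendering (not the project's code).

    frame: (H, W, 3) image; depth: (H, W) map where larger values are assumed
    to mean "closer". max_disparity_px is a hypothetical tuning knob.
    """
    h, w = depth.shape
    # Normalize depth to [0, 1] so disparity scales with relative nearness
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
    disparity = (d * max_disparity_px).astype(np.int32)

    cols = np.arange(w)
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    for y in range(h):
        # Shift each row by half the disparity, in opposite directions per eye
        lx = np.clip(cols - disparity[y] // 2, 0, w - 1)
        rx = np.clip(cols + disparity[y] // 2, 0, w - 1)
        left[y, lx] = frame[y, cols]
        right[y, rx] = frame[y, cols]
    # Occlusion holes are left unfilled in this sketch; a production pipeline
    # would fill or inpaint them
    return left, right
```

In practice, the shift direction and magnitude depend on the chosen stereo geometry and output resolution.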
- Dual Depth Models:
  - Depth Anything V3 (default): 50% lower VRAM, faster processing, optimized for modern GPUs
  - Video-Depth-Anything V2: Superior temporal consistency with 32-frame sliding windows
- AI Upscaling: Optional Real-ESRGAN enhancement (2x/4x) for higher output resolution
- CUDA Hardware Acceleration: NVENC H.265 encoding and GPU-accelerated frame decoding
- Configurable Depth Quality: Adjustable depth map resolution (518px to 4K) for quality vs. speed
- Multiple VR Formats: Side-by-side and over-under stereoscopic formats (see the sketch after this list)
- Flexible Resolutions: Square (VR-optimized), 16:9 (standard), cinema, and custom resolutions up to 8K
- Resume Capability: Intelligent step-level resume for interrupted processing
- Audio Preservation: Maintains original audio synchronization with lossless FLAC extraction
- Web Interface: Modern browser-based UI with real-time progress tracking and live previews
- Wide Format Support: Cinema, ultra-wide, and standard aspect ratios
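The side-by-side and over-under formats listed above differ only in how the two per-eye frames are packed into a single output frame. A minimal sketch, assuming the per-eye frames are available as NumPy arrays (not the project's actual code):

```python
import numpy as np

def pack_stereo(left, right, layout="side_by_side"):
    """Pack per-eye frames into one stereoscopic frame (illustrative)."""
    if layout == "side_by_side":
        return np.hstack([left, right])  # left eye on the left half
    if layout == "over_under":
        return np.vstack([left, right])  # left eye on the top half
    raise ValueError(f"unknown layout: {layout}")
```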
git clone https://github.com/Tok/depth-surge-3d.git depth-surge-3d
cd depth-surge-3d
chmod +x setup.sh
./setup.sh

The setup script automatically installs all dependencies, downloads the Video-Depth-Anything model (~1.3GB), and verifies your system.
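The depth models run on PyTorch, so if you want to double-check GPU support yourself after setup, a quick check looks like this (assuming PyTorch was installed into your environment by the setup script):

```python
import torch

# Reports whether a CUDA device is visible and which CUDA build PyTorch ships with
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```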
See Installation Guide for detailed setup instructions.
Web UI (Recommended):
./run_ui.sh
# Opens http://localhost:5000 in your browser

Command Line:
# Basic usage
python depth_surge_3d.py input_video.mp4
# Process specific time range with custom settings
python depth_surge_3d.py input_video.mp4 -s 01:30 -e 03:45 -f over_under --resolution 4k

UV Command Line:
Under WSL, you need to export a UV environment variable first:
export UV_LINK_MODE=copy

# Basic usage
uv run python depth_surge_3d.py input_video.mp4

Quick Start Script:
# Process a clip with optimized settings
./start.sh 1:11 2:22

See Usage Guide for comprehensive usage examples.
- Python 3.9, 3.10, 3.11, or 3.12 (Python 3.13+ not yet supported due to dependency limitations)
- FFmpeg
- CUDA 13.0+ (required for GPU acceleration)
- CUDA-compatible GPU (optional but strongly recommended)
User Guides:
- Installation Guide - Detailed setup instructions and troubleshooting
- Usage Guide - Complete usage examples and workflows
- Parameters Reference - All command-line options and settings explained
- VR Headset Compatibility - Specs and optimal settings for the top 10 VR devices
- Performance Benchmarks - GPU benchmarks, VRAM usage, and optimization guide
- Troubleshooting - Common issues and performance tips
Technical Documentation:
- Architecture - Technical details and processing pipeline
- Contributing Guide - Development workflow and CI/CD setup
- Coding Standards - Code quality requirements and best practices
Development:
- Development Notes - Quick reference for development
- Project Roadmap - Planned features and improvements
Each processing session creates a self-contained timestamped directory:
output/
└── timestamp_videoname_timestamp/
├── original_video.mp4 # Source video
├── original_audio.flac # Pre-extracted audio
├── frames/ # Extracted frames
├── vr_frames/ # Final VR frames
└── videoname_3D_side_by_side.mp4 # Final 3D video
Generated videos work with:
- VR headsets (Meta Quest, HTC Vive, etc.)
- Cardboard VR viewers
- 3D video players supporting side-by-side or over-under formats
- GPU Processing: ~2-4 seconds per output frame (RTX 4070+ class)
- CPU Processing: ~30-60 seconds per output frame
- Typical 1-minute clip: ~2-4 hours on modern GPU at 60fps output
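For example, a 60-second clip rendered at 60 fps is 3,600 output frames; at 2-4 seconds per frame, that works out to roughly 2-4 hours of GPU time, which is where the estimate above comes from.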
This project uses state-of-the-art depth estimation models:
- Depth Anything V3 - Default model with improved memory efficiency and performance
- Video-Depth-Anything V2 - Temporal-consistent depth estimation with 32-frame sliding windows
Both models are based on vision transformer architectures optimized for monocular depth prediction.
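To illustrate the sliding-window approach (not the model's actual inference code): frames are processed in overlapping windows, and predictions in the overlap are blended so depth stays stable across window boundaries. The `predict_depth` callable, window size, and overlap below are assumptions made for the sketch.

```python
import numpy as np

def sliding_window_depth(frames, predict_depth, window=32, overlap=8):
    """Illustrative overlapping-window inference for temporally stable depth.

    predict_depth is assumed to map a list of frames to per-frame depth maps.
    """
    n = len(frames)
    depth_sum = [None] * n
    counts = np.zeros(n)
    step = window - overlap
    for start in range(0, n, step):
        preds = predict_depth(frames[start:start + window])
        for i, d in enumerate(preds):
            idx = start + i
            # Accumulate overlapping predictions so they can be averaged later
            depth_sum[idx] = d if depth_sum[idx] is None else depth_sum[idx] + d
            counts[idx] += 1
        if start + window >= n:
            break
    # Average wherever windows overlapped to smooth boundary transitions
    return [depth_sum[i] / counts[i] for i in range(n)]
```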
MIT License - see LICENSE file for details.
Third-Party Components: Please review the Video-Depth-Anything license for terms before commercial use.
This tool converts monocular video to pseudo-stereo using AI depth estimation. Results can be compelling for many types of content but will never match true stereo cameras or specialized VR equipment.
Best results with:
- Clear depth variation (landscapes, interiors, people)
- Good lighting and detail
- Source resolution 1080p or higher
- Steady camera movement
May struggle with:
- Mirrors, glass, water reflections
- Very dark or low-contrast scenes
- Fast motion or rapid camera movements
Embracing artifacts: Expect algorithmic stereo divergence, synthetic depth layers, and monocular hallucinations. These AI-generated depth discontinuities create a unique aesthetic; depth drift, disparity shimmer, and temporal judder may become part of the experience.
See the Troubleshooting Guide for detailed quality expectations and optimization tips.
Full Application Walkthrough (v0.9.1) - Complete web interface screenshot showing all processing steps (large file: 2.1 MB, 1371x8733px)