Depth Surge 3D

CI codecov Python 3.9-3.12 License: MIT

Convert 2D videos to 3D VR format using AI depth estimation.

Depth Surge 3D transforms flat videos into stereoscopic 3D for VR headsets using Depth Anything V3 and Video-Depth-Anything V2 neural networks. It analyzes video frames with temporal consistency to predict depth, then generates left and right eye views for immersive stereoscopic viewing.
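
As a rough illustration of the underlying idea (not the project's actual code; the function name, parameters, and shift convention here are invented for this sketch), a per-pixel depth map can drive a horizontal, depth-dependent shift that synthesizes the two eye views:

import numpy as np

def stereo_from_depth(frame, depth, max_disparity=20):
    # frame: (H, W, 3) image, depth: (H, W) normalized to [0, 1] with 1 = near
    h, w = depth.shape
    disparity = (depth * max_disparity).astype(np.int32)  # nearer pixels shift more
    cols = np.arange(w)
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    for y in range(h):
        # Naive forward warp; real pipelines also fill the disocclusion holes this leaves
        left[y, np.clip(cols + disparity[y] // 2, 0, w - 1)] = frame[y]
        right[y, np.clip(cols - disparity[y] // 2, 0, w - 1)] = frame[y]
    return left, right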

Key Features

  • Dual Depth Models:
    • Depth Anything V3 (default): about 50% lower VRAM usage than V2, faster processing, optimized for modern GPUs
    • Video-Depth-Anything V2: Superior temporal consistency with 32-frame sliding windows
  • AI Upscaling: Optional Real-ESRGAN enhancement (2x/4x) for higher output resolution
  • CUDA Hardware Acceleration: NVENC H.265 encoding and GPU-accelerated frame decoding
  • Configurable Depth Quality: Adjustable depth map resolution (518px to 4K) to balance quality against processing speed
  • Multiple VR Formats: Side-by-side and over-under stereoscopic formats (see the packing sketch after this list)
  • Flexible Resolutions: Square (VR-optimized), 16:9 (standard), cinema, and custom resolutions up to 8K
  • Resume Capability: Intelligent step-level resume for interrupted processing
  • Audio Preservation: Maintains original audio synchronization with lossless FLAC extraction
  • Web Interface: Modern browser-based UI with real-time progress tracking and live previews
  • Wide Format Support: Cinema, ultra-wide, and standard aspect ratios
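
For the two output layouts, a rough sketch (illustrative only, not the project's code) of how a finished left/right pair is packed into a single VR frame:

import numpy as np

def pack_stereo(left, right, mode="side_by_side"):
    # left/right: (H, W, 3) arrays for the two eyes
    if mode == "side_by_side":
        return np.concatenate([left, right], axis=1)  # one frame, twice the width
    if mode == "over_under":
        return np.concatenate([left, right], axis=0)  # one frame, twice the height
    raise ValueError(f"unknown mode: {mode}")

Many players also accept "half" variants where each eye is squeezed to half width or half height so the packed frame keeps the source resolution.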

Quick Start

Installation

git clone https://github.com/Tok/depth-surge-3d.git depth-surge-3d
cd depth-surge-3d
chmod +x setup.sh
./setup.sh

The setup script automatically installs all dependencies, downloads the Video-Depth-Anything model (~1.3GB), and verifies your system.

See Installation Guide for detailed setup instructions.

Usage

Web UI (Recommended):

./run_ui.sh
# Opens http://localhost:5000 in your browser

Command Line:

# Basic usage
python depth_surge_3d.py input_video.mp4

# Process specific time range with custom settings
python depth_surge_3d.py input_video.mp4 -s 01:30 -e 03:45 -f over_under --resolution 4k

UV Command Line:

On WSL, export a uv variable first so uv copies packages instead of hard-linking them across filesystems:

export UV_LINK_MODE=copy
# Basic usage
uv run python depth_surge_3d.py input_video.mp4

Quick Start Script:

# Process a clip with optimized settings
./start.sh 1:11 2:22

See Usage Guide for comprehensive usage examples.

Requirements

  • Python 3.9, 3.10, 3.11, or 3.12 (Python 3.13+ not yet supported due to dependency limitations)
  • FFmpeg
  • CUDA 13.0+ (required for GPU acceleration)
  • CUDA-compatible GPU (optional but strongly recommended)

Documentation

User Guides:

Technical Documentation:

Development:

Output Structure

Each processing session creates a self-contained timestamped directory:

output/
└── timestamp_videoname_timestamp/
    ├── original_video.mp4      # Source video
    ├── original_audio.flac     # Pre-extracted audio
    ├── frames/                 # Extracted frames
    ├── vr_frames/              # Final VR frames
    └── videoname_3D_side_by_side.mp4  # Final 3D video
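
For illustration, a minimal sketch (assuming only the layout shown above; this is not the project's actual resume logic) of how a step-level resume check could decide which stages of a session are already done:

from pathlib import Path

def completed_stages(session_dir):
    root = Path(session_dir)

    def has_files(p):
        # A stage counts as done only if its directory exists and is non-empty
        return p.is_dir() and any(p.iterdir())

    return {
        "audio_extracted": (root / "original_audio.flac").exists(),
        "frames_extracted": has_files(root / "frames"),
        "vr_frames_rendered": has_files(root / "vr_frames"),
        "final_video_encoded": any(root.glob("*_3D_*.mp4")),
    }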

VR Viewing

Generated videos work with:

  • VR headsets (Meta Quest, HTC Vive, etc.)
  • Cardboard VR viewers
  • 3D video players supporting side-by-side or over-under formats

Performance

  • GPU Processing: ~2-4 seconds per output frame (RTX 4070+ class)
  • CPU Processing: ~30-60 seconds per output frame
  • Typical 1-minute clip: ~2-4 hours on modern GPU at 60fps output
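
The clip estimate follows directly from the per-frame numbers: a 1-minute clip at 60 fps output is 3,600 frames, and 3,600 frames at 2-4 seconds each works out to roughly 2-4 hours.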

Attribution

This project uses state-of-the-art depth estimation models: Depth Anything V3 and Video-Depth-Anything V2.

Both models are based on vision transformer architectures optimized for monocular depth prediction.

License

MIT License - see LICENSE file for details.

Third-Party Components: Please review the Video-Depth-Anything license for terms before commercial use.

Quality Expectations, Parallax-Glitchwave and Z-Collapse Slopcore Aesthetics

This tool converts monocular video to pseudo-stereo using AI depth estimation. Results can be compelling for many types of content but will never match true stereo cameras or specialized VR equipment.

Best results with:

  • Clear depth variation (landscapes, interiors, people)
  • Good lighting and detail
  • Source resolution 1080p or higher
  • Steady camera movement

May struggle with:

  • Mirrors, glass, water reflections
  • Very dark or low-contrast scenes
  • Fast motion or rapid camera movements

Artifact Embracement: Expect algorithmic stereo divergence, synthetic depth layers, and monocular hallucinations. These AI-generated depth discontinuities create a unique aesthetic - depth-drift, disparity shimmer, and temporal judder may become part of the experience.

See the Troubleshooting Guide for detailed quality expectations and optimization tips.

Screenshots

Full Application Walkthrough (v0.9.1) - Complete web interface screenshot showing all processing steps (large file: 2.1 MB, 1371x8733px)