Spotlight is an innovative AI system that automatically creates polished, narrated demo videos from just a URL and description. Perfect for product demos, tutorials, and automated documentation.
Spotlight transforms a simple web URL and task description into a professional demo video with:
- Automated browser interaction via AI agents
- Intelligent video editing with stillness detection and trimming
- AI-generated voiceover with voice cloning capabilities
- Dynamic background music from Suno AI
- Professional captions and subtitles
Here's a real example of Spotlight in action - watch how it automatically creates a polished demo video from a simple GitHub URL:
This demo shows Spotlight automatically navigating GitHub, signing in, and creating a new repository, narrated in a cloned voice over background music.
The diagram below shows our complete AI-powered pipeline from URL input to final video output:
The system follows a sophisticated pipeline:
- Browser Agent - Uses Anthropic Claude with browser automation to interact with websites
- Screen Recording - Captures full desktop activity during automation
- Stillness Detection - Identifies and removes boring static sections
- Video Trimming - Creates tight, engaging cuts with timestamp remapping
- Transcript Generation - AI creates natural voiceover scripts from agent actions
- Voice Synthesis - ElevenLabs clones your voice or uses professional voices
- Audio Mixing - Combines voiceover with background music and original audio
- Final Assembly - Produces polished MP4 with burned-in subtitles
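The control flow of the pipeline above can be sketched in a few lines. Every function here is a hypothetical stub standing in for the real modules (`main.py`, `trim_from_history.py`, `elevenlabs_tts.py`, ...); only the order of the stages reflects the actual system:

```python
# Runnable sketch of the pipeline's control flow. All stage functions are
# hypothetical stubs, not the project's actual API.

def run_browser_agent(url: str, task: str) -> list[str]:
    # Real version: Claude-driven browser automation; here we fake the history.
    return [f"open {url}", f"perform: {task}"]

def trim_stillness(recording: str) -> tuple[str, list[tuple[float, float]]]:
    # Real version: frame differencing; returns trimmed video + kept spans.
    return recording + ".trimmed", [(0.0, 12.5), (15.0, 30.0)]

def generate_transcript(history: list[str]) -> str:
    # Real version: AI-written voiceover script from the agent's actions.
    return ", then ".join(history)

def assemble(video: str, transcript: str) -> str:
    # Real version: mixes voiceover + music, burns subtitles into an MP4.
    return f"{video} + voiceover({transcript!r}) -> final.mp4"

def run_pipeline(url: str, task: str) -> str:
    history = run_browser_agent(url, task)
    trimmed, _spans = trim_stillness("recording.mp4")
    script = generate_transcript(history)
    return assemble(trimmed, script)

print(run_pipeline("https://github.com", "create a repo"))
```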
- Product Demos: Showcase your web application features
- Onboarding Videos: Create user guides automatically
- Documentation: Visual tutorials for complex workflows
- Marketing Content: Professional demo videos at scale
- Training Materials: Consistent, reproducible instructional content
- Python 3.11+
- Node.js 18+
- FFmpeg (for video processing)
- OpenCV (for computer vision)
- Anthropic API Key (for Claude AI agent)
- ElevenLabs API Key (for voice synthesis)
- Suno API Key (for background music generation)
```bash
# Clone the repository
git clone <repository-url>
cd hackmit_2025

# Set up the Python backend
cd backend
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your API keys:
# ANTHROPIC_API_KEY=your_anthropic_key
# ELEVENLABS_API_KEY=your_elevenlabs_key
# SUNO_API_KEY=your_suno_key

# Set up and start the frontend
cd ../frontend
npm install
npm run dev

# In another terminal, start the backend server
cd ../backend
uvicorn api:app --reload
```

- Enter URL: Provide the website you want to demo
- Describe Task: Tell the AI what actions to perform
- Record Voice: Provide a short voice sample for cloning
- Generate: Watch the magic happen! Play Snake while you wait
```bash
# Basic GitHub demo generation
python3 run_github_demo.py
```

Modify settings in `backend/trim_from_history.py`:

- `still_min_seconds`: Minimum duration to consider "still" (default: 2.5 s)
- `diff_threshold`: Sensitivity for detecting motion (default: 1.2)
- `frame_step`: Frame sampling rate for analysis (default: 3)
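To show how the three knobs interact, here is a self-contained sketch of threshold-based stillness detection on synthetic 1-D "frames". It mirrors the idea (mean absolute frame difference vs. a threshold, with a minimum run length), not the project's actual OpenCV implementation:

```python
# Illustrative stillness detector; not the trim_from_history.py code.

def find_still_spans(frames, fps=30.0, frame_step=3,
                     diff_threshold=1.2, still_min_seconds=2.5):
    """Return (start_s, end_s) spans where sampled frames barely change."""
    spans, run_start = [], None
    for i in range(0, len(frames) - frame_step, frame_step):
        a, b = frames[i], frames[i + frame_step]
        # Mean absolute per-pixel difference between sampled frames.
        diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        if diff < diff_threshold:
            if run_start is None:
                run_start = i           # a still run begins
        else:
            if run_start is not None:
                start_s, end_s = run_start / fps, i / fps
                if end_s - start_s >= still_min_seconds:
                    spans.append((start_s, end_s))
                run_start = None
    if run_start is not None:           # run extends to end of video
        start_s, end_s = run_start / fps, len(frames) / fps
        if end_s - start_s >= still_min_seconds:
            spans.append((start_s, end_s))
    return spans

# 5 s of motion, then 5 s frozen (at 30 fps, "frames" of 2 pixels).
moving = [[i % 10, (i * 7) % 10] for i in range(150)]
frozen = [[3, 3]] * 150
print(find_still_spans(moving + frozen))  # → [(5.0, 10.0)]
```

Raising `diff_threshold` or lowering `still_min_seconds` cuts more aggressively; a larger `frame_step` trades accuracy for speed by sampling fewer frames.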
Adjust in `backend/api.py`:

- Voice cloning vs. default voice: `USE_VOICE` flag
- Background music volume: -18 dB reduction
- Audio ducking: -15 dB during voiceover
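The dB figures above are conventional decibel attenuations; before they can be applied to audio samples they must be converted to linear gain multipliers. This conversion is standard DSP, independent of the project's code:

```python
# Convert a decibel change to a linear amplitude multiplier.

def db_to_gain(db: float) -> float:
    """Amplitude ratio for a given dB change: gain = 10^(dB/20)."""
    return 10 ** (db / 20)

music_gain = db_to_gain(-18)  # background music reduction (~0.126x)
duck_gain = db_to_gain(-15)   # extra attenuation during voiceover (~0.178x)

print(round(music_gain, 3), round(duck_gain, 3))  # → 0.126 0.178
```

So -18 dB means the music's sample amplitudes are scaled to about 12.6% of the original, and ducking multiplies in a further ~0.178 while the voiceover is speaking.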
Configure the browser agent in `backend/main.py`:

- `max_actions_per_step`: Actions per thinking cycle (default: 3)
- `max_failures`: Retry attempts (default: 3)
- `max_steps`: Total automation steps (default: 40)
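A sketch of how these three limits might bound an agent run: steps are capped, each step executes at most a few planned actions, and repeated failures abort the run. This is an illustrative stand-in, not the code in `backend/main.py`:

```python
# Hypothetical agent control loop showing max_steps / max_failures /
# max_actions_per_step as safety bounds.

def run_agent(plan_actions, execute, max_steps=40, max_failures=3,
              max_actions_per_step=3):
    failures = 0
    for step in range(max_steps):
        actions = plan_actions(step)[:max_actions_per_step]  # cap per cycle
        for action in actions:
            try:
                result = execute(action)
            except Exception:
                failures += 1
                if failures >= max_failures:
                    return "aborted"     # too many errors: give up
                break                    # replan after a failure
            if result == "done":
                return "success"
    return "step-limit"                  # budget exhausted

# Toy run: the agent "finishes" on its fifth executed action.
script = iter(["click", "type", "click", "scroll", "done"])
outcome = run_agent(lambda step: ["act"] * 3,
                    lambda action: next(script))
print(outcome)  # → success
```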
```
hackmit_2025/
├── backend/                   # Python backend services
│   ├── api.py                 # FastAPI server with streaming endpoints
│   ├── main.py                # Core agent automation logic
│   ├── trim_from_history.py   # Video editing and stillness detection
│   ├── fit_transcript.py      # AI transcript generation and timing
│   ├── elevenlabs_tts.py      # Voice synthesis integration
│   ├── audio_generation.py    # Suno music generation
│   ├── screen_record.py       # Cross-platform screen recording
│   └── artifacts/             # Generated videos, audio, logs
├── frontend/                  # React TypeScript frontend
│   ├── src/
│   │   ├── App.tsx            # Main application component
│   │   ├── components/        # UI components
│   │   │   ├── AudioRecorder.tsx  # Voice recording widget
│   │   │   └── Snake.tsx      # Entertainment during processing
│   │   └── styles.css         # Application styling
│   └── public/                # Static assets
└── README.md                  # This file
```
- Multi-step reasoning: Claude-powered agent plans and executes complex workflows
- Error recovery: Automatic retry with failure analysis
- Context awareness: Understands page state and user intent
- Still-frame detection: Flags static scenes for removal
- Timeline remapping: Maintains accurate timestamps after cuts
- Adaptive trimming: Preserves important action sequences
- Voice cloning: Personal voice synthesis from short samples
- Background scoring: AI-generated ambient music
- Audio mixing: Professional multi-layer composition
- Streaming API: Real-time progress updates
- Error handling: Graceful fallbacks and recovery
- Scalable: Designed for high-volume processing
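The streaming-progress idea in miniature: a generator yields Server-Sent-Events-formatted lines that a FastAPI `StreamingResponse` could return. The event schema here is illustrative, not the project's actual payload format:

```python
# Hypothetical SSE progress stream for real-time updates.
import json

def progress_events(stages):
    """Yield text/event-stream lines for each (percent, stage) update."""
    for pct, stage in stages:
        payload = json.dumps({"stage": stage, "percent": pct})
        yield f"data: {payload}\n\n"   # SSE frame: "data: ..." + blank line

events = list(progress_events([(10, "recording"), (60, "trimming"), (100, "done")]))
print(events[0])
```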
```python
# Upload custom voice
voice_id = upload_reference_voice("path/to/voice.wav", "My Voice")

# Use in synthesis
synthesize_to_file(voice_id, "Hello world", "output.wav")
```

```python
# Generate custom background music
music_path = generate_background_music(
    "upbeat tech demo music",
    target_duration_seconds=120,
)
```

```python
# Manual video processing
jumpcut_video(
    input_path="raw_recording.mp4",
    output_path="trimmed.mp4",
    mode="cut",
    still_min_seconds=3.0,
)
```

Built for HackMIT 2025
Transform any website into a professional demo video with the power of AI!
