Automatic viral clip generator for TikTok, YouTube Shorts and Kwai.
Transform long videos (podcasts, interviews, lives) into short, viral-ready clips with AI-powered moment detection, auto-generated hooks, and TikTok-style captions.
- Download videos directly from YouTube URLs
- AI-powered viral moment detection (heuristics + GPT-4.1)
- Automatic hook generation for the first 2-3 seconds
- Dynamic word-by-word captions (TikTok style)
- Multiple caption styles with color and scale effects
- Eye-catching hook overlays with customizable styles
- Vertical 9:16 output ready for upload
- LLM response caching to save costs
- Python 3.11+
- FFmpeg installed on your system
- OpenAI API key
# Clone the repository
git clone https://github.com/yourusername/clip.git
cd clip
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .Create a .env file in the project root:
OPENAI_API_KEY=sk-your-api-key-here# Create a new run from YouTube video
clip new "https://www.youtube.com/watch?v=VIDEO_ID" --clips 5
# Example output:
# URL detected: https://www.youtube.com/watch?v=VIDEO_ID
# Title: Video Title Here
# Duration: 45m 30s
# Channel: Channel Name
#
# Downloading video...
# Download ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
#
# Run created: 20250127_143022_a3f8
# Video: Video Title Here
# Clips: 5
#
# To process: clip process 20250127_143022_a3f8# Create a new run from local video
clip new /path/to/video.mp4 --clips 5# Run the full pipeline
clip process <run_id>
# Run until a specific stage
clip process <run_id> --until segment# List all runs
clip list
# Show detailed status of a run
clip status <run_id>
# Show generated segments/clips
clip show <run_id>
# Delete a run
clip delete <run_id>
# Delete without confirmation
clip delete <run_id> --force
# Show version
clip version- ingest - Extract audio and metadata from video
- transcribe - Transcribe audio using OpenAI Whisper API
- segment - Identify viral moments using heuristics + GPT-4
- hooks - Generate hook text for each segment
- captions - Generate word-by-word captions (ASS format)
- render - Render final vertical clips with captions and hooks
Choose from multiple TikTok-style caption presets:
| Style | Description |
|---|---|
simple |
Basic white text, no uppercase |
bold |
Large white text with shadow, uppercase |
yellow |
Vibrant yellow with shadow |
fire |
Orange/red fire style |
rainbow |
3 words at a time, each with different color |
highlight_rainbow |
3 words with karaoke fill effect |
pop |
1 word at a time with pop animation (yellow) |
pop_rainbow |
1 word at a time, colorful with pop effect |
scale |
3 words, active word scales up (yellow) |
scale_rainbow |
3 words, colorful, active word scales up ✨ |
Recommended: scale_rainbow - Shows 3 colorful words at a time with the currently spoken word scaling up (140%) for maximum engagement.
Customize the hook overlay that appears in the first seconds:
| Style | Description |
|---|---|
simple |
Basic white text with black border |
bold |
Large yellow text with shadow |
neon |
White text with magenta neon border |
boxed |
Semi-transparent black box background |
fire |
Orange text with dark red border |
gradient_box |
Pink/magenta box background ✨ |
Features:
- Automatic text wrapping for long hooks (max 32 chars per line)
- Configurable duration (default: 2.5 seconds)
- Centered positioning at top of video
You can customize styles when running stages directly:
from pathlib import Path
from clip.storage.local import LocalStorage
from clip.stages.captions import CaptionsStage
from clip.stages.render import RenderStage
storage = LocalStorage(base_path=Path('.'))
# Generate captions with specific style
captions = CaptionsStage(storage)
captions.run('run_id', style='scale_rainbow')
# Render with default styles
render = RenderStage(storage)
render.run('run_id')The rainbow styles cycle through these colors:
- Magenta/Pink
- Yellow
- Green
- Cyan
- Blue
- Orange
Generated clips are saved in:
runs/<run_id>/output/
├── clip_seg_001.mp4
├── clip_seg_002.mp4
├── clip_seg_003.mp4
└── manifest.json
For a 1-hour video:
| Service | Usage | Cost |
|---|---|---|
| Whisper API | 60 min | ~$0.36 |
| GPT-4.1-mini | ~10k tokens | ~$0.02 |
| Total | ~$0.38 |
Costs are reduced on subsequent runs due to LLM response caching.
Thanks to yt-dlp, you can download from:
- YouTube (videos, shorts, playlists)
- TikTok
- Twitter/X
- Twitch
- And 1000+ more sites
clip/
├── src/clip/
│ ├── cli/ # Command-line interface
│ ├── core/ # Run management, pipeline orchestration
│ ├── stages/ # Pipeline stages (ingest, transcribe, etc.)
│ ├── llm/ # LLM client and caching
│ ├── storage/ # Storage backends (local, S3 future)
│ └── schemas/ # Pydantic models
├── runs/ # Generated runs and artifacts
├── pyproject.toml
└── README.md
MIT