# ytcapture

Extract video frames and transcripts into Obsidian-compatible markdown notes.

This package provides two CLI tools:

- `ytcapture` - Process YouTube videos (with transcripts)
- `vidcapture` - Process local video files
Watching a lecture, tutorial, or presentation? ytcapture and vidcapture turn any video into a searchable, skimmable markdown note with:
- Embedded frame images at regular intervals so you can see what's on screen
- Timestamped transcript segments aligned to each frame
- Obsidian-ready format with YAML frontmatter and `![[wikilink]]` embeds
- Smart deduplication that removes redundant frames (great for slide-based content)
No more scrubbing through hour-long videos to find that one slide. Your notes become a visual index of the entire video.
## Installation

On macOS:

```bash
brew install ffmpeg yt-dlp
```

```bash
# Clone the repository
git clone https://github.com/jdmonaco/ytcapture.git
cd ytcapture

# Install as a CLI tool with uv (recommended)
uv tool install -e .

# Or install with pip
pip install -e .
```

## Quick Start

### ytcapture

```bash
# Basic usage - outputs to vault root (or current directory)
ytcapture "https://www.youtube.com/watch?v=VIDEO_ID"
# Multiple videos at once
ytcapture URL1 URL2 URL3
# Process an entire playlist (auto-expands)
ytcapture "https://www.youtube.com/playlist?list=PLAYLIST_ID"
# On macOS, just copy a YouTube URL (or playlist) and run without arguments
ytcapture
# Skip confirmation for large playlists (>10 videos)
ytcapture "https://www.youtube.com/playlist?list=PLAYLIST_ID" -y
# Specify output directory (vault-relative unless absolute path)
ytcapture URL -o my-notes/
# Adjust frame interval (default: 15 seconds)
ytcapture URL --interval 30
# Extract more frames with aggressive deduplication
ytcapture URL --interval 5 --dedup-threshold 0.80
```

### vidcapture

```bash
# Basic usage
vidcapture meeting.mp4
# Multiple files
vidcapture video1.mp4 video2.mkv -o notes/
# Fast mode for long videos (uses keyframe seeking, less accurate timestamps)
vidcapture long-workshop.mp4 --fast --interval 60
# JSON output for scripting
vidcapture video.mp4 --json
```

## Output Structure

```
./
├── images/
│   └── VIDEO_ID/
│       ├── frame-0000.jpg
│       ├── frame-0001.jpg
│       └── ...
├── transcripts/
│   └── raw-transcript-VIDEO_ID.json
└── Video Title (Channel Name) 20241120.md
```
Assets are organized by video ID to support multiple video captures in the same directory.
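For example, capturing two videos into the same output directory keeps their frames and transcripts separate (the IDs `abc123` and `xyz789` below are illustrative):

```
./
├── images/
│   ├── abc123/
│   └── xyz789/
└── transcripts/
    ├── raw-transcript-abc123.json
    └── raw-transcript-xyz789.json
```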
The generated markdown looks like this:

```markdown
---
title: Understanding Neural Networks
source: https://www.youtube.com/watch?v=abc123
author:
  - Deep Learning Channel
created: '2024-12-15'
published: '2024-11-20'
description: An introduction to neural networks and deep learning fundamentals...
tags:
  - youtube
---

# Understanding Neural Networks

> An introduction to neural networks and deep learning fundamentals.

## 00:00:00

![[images/abc123/frame-0000.jpg]]

Welcome to this tutorial on neural networks. Today we'll cover the basics.

## 00:00:15

![[images/abc123/frame-0001.jpg]]

Let's start by understanding what a neuron is and how it processes information.
```

## Configuration

Both tools use a shared config file at `~/.ytcapture.yml` (auto-created on first run):

```yaml
# Vault root directory - relative paths in --output are relative to this
vault: ~/Documents/Obsidian/Notes
# Default output directory (vault-relative)
# output: Inbox/VideoCaptures
# Frame extraction defaults
interval: 15 # Seconds between frames
frame_format: jpg # jpg or png
dedup_threshold: 0.85 # 0.0-1.0, lower = more aggressive
# ytcapture-specific
language: en
prefer_manual: false
keep_video: false
# vidcapture-specific
fast: false # Use fast keyframe seeking
```

CLI options override config values. The `--help` output shows your current defaults from config.
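As a concrete sketch of that precedence (the URL is a placeholder), a flag overrides the config default for a single run:

```bash
# Config sets interval: 15; this run uses 60-second spacing instead
ytcapture "https://www.youtube.com/watch?v=VIDEO_ID" --interval 60
```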
## Shell Completion

Bash completion is available for both commands:

```bash
# Install completions
ytcapture completion bash --install
vidcapture completion bash --install

# Restart your shell or source your bashrc
source ~/.bashrc
```

Tab completion for `-o`/`--output` is vault-aware (it completes directories relative to your vault).
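For example, assuming your vault contains an `Inbox/` directory (the name here is illustrative):

```bash
# With vault: ~/Documents/Obsidian/Notes
ytcapture URL -o In<TAB>
# ...expands to the vault-relative directory:
ytcapture URL -o Inbox/
```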
## Options

### ytcapture

| Option | Default | Description |
|---|---|---|
| `-o`, `--output` | vault root | Output directory (vault-relative unless absolute) |
| `--interval` | 15 | Frame extraction interval in seconds |
| `--max-frames` | None | Maximum number of frames to extract |
| `--frame-format` | jpg | Frame format: jpg or png |
| `--language` | en | Transcript language code |
| `--dedup-threshold` | 0.85 | Similarity threshold for removing duplicate frames (0.0-1.0) |
| `--no-dedup` | - | Disable frame deduplication |
| `--prefer-manual` | - | Only use manual transcripts |
| `--keep-video` | - | Keep downloaded video file after frame extraction |
| `-y`, `--yes` | - | Skip confirmation prompt for large batches (>10 videos) |
| `-v`, `--verbose` | - | Verbose output |
### vidcapture

| Option | Default | Description |
|---|---|---|
| `-o`, `--output` | vault root | Output directory (vault-relative unless absolute) |
| `--interval` | 15 | Frame extraction interval in seconds |
| `--max-frames` | None | Maximum number of frames to extract |
| `--frame-format` | jpg | Frame format: jpg or png |
| `--dedup-threshold` | 0.85 | Similarity threshold for removing duplicate frames (0.0-1.0) |
| `--no-dedup` | - | Disable frame deduplication |
| `--fast` | - | Fast extraction using keyframe seeking (recommended for long videos) |
| `--json` | - | Output JSON instead of console output (for scripting) |
| `-v`, `--verbose` | - | Verbose output |
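With `--json`, vidcapture's output can feed other scripts. The exact schema isn't documented here, so as a minimal sketch (assuming the JSON is written to stdout), pretty-print it to inspect the fields:

```bash
# Inspect the machine-readable output
vidcapture video.mp4 --json | python -m json.tool
```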
## Tips

Use a shorter interval with deduplication to catch slide transitions:

```bash
ytcapture URL --interval 5 --dedup-threshold 0.90
```

Disable deduplication to keep all frames:

```bash
ytcapture URL --interval 10 --no-dedup
```

Limit the number of frames to avoid huge output:

```bash
ytcapture URL --max-frames 50
```

If you have mdformat installed, ytcapture will automatically format the output markdown:

```bash
pip install mdformat mdformat-gfm mdformat-frontmatter
```

## License

MIT