Self-hosted short-form video platform with a transparent, user-controllable algorithm.
- Architecture
- Stack
- Quick Start
- Content Pipeline
- Algorithm
- Ingestion Limits vs User Preferences
- Backup & Restore
- Storage Lifecycle
- Alternate Database (Postgres)
- Frontend Configuration
- PWA Installation
- Admin Dashboard
- API Endpoints
- Development
- LLM Provider Configuration
- Roadmap
```
nginx :80
├── React PWA (web/)
└── Go API :8080 (api/)
    ├── SQLite WAL (single connection, schema embedded)
    ├── MinIO :9000 (S3-compatible object storage)
    └── jobs table → Python Worker (ingestion/)
        └── LLM :11434 ← Scout * (scout/)

* ai profile only
```
| Component | Tech | Purpose |
|---|---|---|
| API | Go + Chi | REST API, auth, feed algorithm, search |
| Frontend | React + Vite PWA | Mobile-first swipe feed |
| Worker | Python | Video download, scene-split, transcode, transcribe |
| Score Updater | Python | Periodic content score recalculation |
| Scout | Python (ai profile) | LLM-backed content discovery and scoring |
| LLM | Ollama or hosted API (ai profile) | Local or hosted inference (summaries, scoring, scouting) |
| Database | SQLite (WAL) | Single-connection WAL: users, clips, interactions, jobs, topic graph, embeddings |
| Storage | MinIO | S3-compatible object storage for video and thumbnail files |
| Search | SQLite FTS5 | Full-text search across clip titles, transcripts, channels |
| Proxy | nginx | Reverse proxy, SPA routing, streaming optimization |
```sh
# Clone the repo, then run the interactive setup script
./setup.sh
```

`setup.sh` generates all cryptographic secrets, prompts for key settings (profiles, processing mode, LLM provider, storage limits), and writes a ready-to-use `.env`. It requires `openssl` on your `PATH`. If a `.env` already exists, the script offers to back it up before overwriting.
```sh
# Launch
make up

# Watch logs
make logs-worker
```

Choosing what to start: set `COMPOSE_PROFILES` in `.env` (prompted during setup):
| `.env` setting | Services started |
|---|---|
| `COMPOSE_PROFILES=` | Base stack (no AI) |
| `COMPOSE_PROFILES=ai` | + Scout (cloud LLM) |
| `COMPOSE_PROFILES=ai,ollama` | + Scout + local Ollama |
GPU acceleration (requires the NVIDIA Container Toolkit):

```sh
COMPOSE_FILE=docker-compose.yml:docker-compose.gpu.yml
```
HTTPS (automatic Let's Encrypt via Caddy):

```sh
SERVER_NAME=clipfeed.yourdomain.com docker compose -f docker-compose.yml -f docker-compose.caddy.yml up -d
```

The app will be available at http://localhost.
- Scouting (ai profile): Scout worker discovers and evaluates candidate videos via LLM scoring.
- Ingestion: User submits a URL (or Scout auto-approves above threshold). Supports YouTube, Vimeo, TikTok, Instagram, etc.
- Download: Worker fetches via yt-dlp (with optional platform cookies for authenticated access).
- Segmentation: ffmpeg detects scene changes and splits into 15–90s clips.
- Transcoding: Each clip is transcoded to mobile-optimized mp4.
- Transcription: faster-whisper transcribes audio for search and topic extraction.
- Embeddings & Topics: sentence-transformers generates embeddings; KeyBERT extracts topics and maps them into the topic graph.
- Storage: Clips and thumbnails upload to MinIO; metadata writes to SQLite.
- Scoring: Score Updater periodically recalculates `content_score` from aggregate interactions.
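Each stage above runs as a row claimed from the `jobs` table. A minimal Python sketch of that handoff, with an illustrative schema (the real schema is embedded in the Go API, and the claim is safe on the app's single WAL connection):

```python
import sqlite3

def claim_next_job(conn):
    """Claim the oldest queued job and mark it running (illustrative schema)."""
    row = conn.execute(
        "SELECT id, url FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue drained
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row

# Illustrative setup -- column names are assumptions, not the real schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, url TEXT, status TEXT DEFAULT 'queued')")
conn.execute("INSERT INTO jobs (url) VALUES ('https://example.com/video')")
job = claim_next_job(conn)  # (1, 'https://example.com/video')
```

The worker would then run download → segmentation → transcode → transcribe on the claimed URL and mark the job done or failed.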
The feed algorithm is fully transparent and user-controllable:
- Exploration Rate (0–100%): Balance between engagement-optimized and random discovery.
- Clip Duration Bounds: Minimum and maximum clip lengths.
- Topic Weights: Per-topic interest sliders to boost or suppress topics.
- Saved Filters: Reusable named filter presets.
The ranking pipeline:
- SQLite initial sort:
content_score * (1 - exploration_rate) + random * exploration_rate - Topic weight multipliers from user preferences
- Embedding-based L2R rescoring (cosine similarity of float32 blobs)
- 24-hour deduplication of recently seen clips
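A minimal Python sketch of the blend and rescoring above (the real implementation lives in the Go API; the topic-weight handling here is an illustrative guess, and the 24-hour dedup is just a filter on recently seen clip IDs, omitted for brevity):

```python
import math
import random

def initial_score(content_score, exploration_rate, rng=random.random):
    """Step 1: blend the engagement score with random discovery."""
    return content_score * (1 - exploration_rate) + rng() * exploration_rate

def apply_topic_weights(score, clip_topics, weights):
    """Step 2: multiply in the user's per-topic sliders (default weight 1.0)."""
    for topic in clip_topics:
        score *= weights.get(topic, 1.0)
    return score

def cosine(a, b):
    """Step 3: similarity between user and clip embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

At `exploration_rate = 0` the feed is purely engagement-ranked; at `1.0` it is fully random.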
- Env vars control the worker pipeline (clip duration, download limits, processing mode) -- these bound what enters the global pool.
- Settings page lets each user filter the global pool to match their current preferences.
Key env vars (in `.env`):

| Variable | Default | Description |
|---|---|---|
| `PROCESSING_MODE` | `transcode` | `transcode` (scale to 720p vertical) or `copy` (fast, keeps original) |
| `MIN_CLIP_SECONDS` | `15` | Minimum clip duration after scene-split |
| `MAX_CLIP_SECONDS` | `90` | Maximum clip duration after scene-split |
| `TARGET_CLIP_SECONDS` | `45` | Target clip length |
| `MAX_VIDEO_DURATION` | `3600` | Maximum source video length in seconds |
| `MAX_DOWNLOAD_SIZE_MB` | `2048` | Maximum download size in MB |
| `MAX_WORKERS` | `4` | Max concurrent ingestion jobs |
| `WHISPER_MODEL` | `medium` | faster-whisper model size |
| `CLIP_TTL_DAYS` | `30` | Days before unprotected clips expire |
| `SCORE_UPDATE_INTERVAL` | `900` | Seconds between score recalculation passes |
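The worker consumes these roughly as follows; `env_int` is a hypothetical helper, and the actual config loading in `ingestion/` may differ:

```python
import os

def env_int(name, default):
    """Integer setting from the environment, with the table's default as fallback."""
    raw = os.environ.get(name)
    return int(raw) if raw else default

MIN_CLIP_SECONDS = env_int("MIN_CLIP_SECONDS", 15)
MAX_CLIP_SECONDS = env_int("MAX_CLIP_SECONDS", 90)
TARGET_CLIP_SECONDS = env_int("TARGET_CLIP_SECONDS", 45)
PROCESSING_MODE = os.environ.get("PROCESSING_MODE", "transcode")
```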
```sh
make backup
# Creates backups/YYYYMMDD_HHMMSS/ containing clipfeed.db and minio.tar.gz

make restore BACKUP_DIR=backups/YYYYMMDD_HHMMSS
# Stops services, overwrites DB/storage from backup, restarts services
```

Clips auto-expire after `CLIP_TTL_DAYS` (default 30 days). Saving or favoriting a clip sets `is_protected = 1` (via a trigger), exempting it from eviction.
```sh
make lifecycle   # run manually
```

Or add to crontab:

```sh
0 3 * * * cd /path/to/clipfeed && make lifecycle
```
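The eviction pass boils down to deleting unprotected clips older than the TTL. A sketch with an illustrative schema (table and column names here are assumptions, aside from `is_protected` mentioned above):

```python
import sqlite3

def evict_expired(conn, ttl_days=30):
    """Delete clips past CLIP_TTL_DAYS unless a save/favorite set is_protected = 1."""
    cur = conn.execute(
        "DELETE FROM clips WHERE is_protected = 0 "
        "AND created_at < datetime('now', ?)",
        (f"-{ttl_days} days",),
    )
    return cur.rowcount  # number of clips evicted

# Illustrative data: one stale unprotected clip, one stale but saved clip
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clips (id INTEGER PRIMARY KEY, created_at TEXT, is_protected INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO clips (created_at) VALUES (datetime('now', '-40 days'))")
conn.execute("INSERT INTO clips (created_at, is_protected) VALUES (datetime('now', '-40 days'), 1)")
evicted = evict_expired(conn)  # deletes only the unprotected clip
```

The corresponding object deletions in MinIO would follow the database pass.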
ClipFeed defaults to SQLite (WAL mode), which comfortably handles ~30–50 concurrent active users.
For heavier load (100+ concurrent users) or multi-tenant deployments, switch to PostgreSQL:
- Set `DB_DRIVER=postgres` and `DB_URL=postgres://user:pass@host:5432/clipfeed` in `.env`.
- Restart the API -- the backend initializes the schema on boot automatically.
The React frontend reads `window.__CONFIG__` at runtime, so the same build can be pointed at any backend. To deploy the UI on Vercel/Netlify/Pages, edit `web/index.html`:

```html
<script>
  window.__CONFIG__ = {
    API_BASE: 'https://api.yourdomain.com/api',
    STORAGE_BASE: 'https://api.yourdomain.com'
  };
</script>
```

When run behind nginx (default), backend routing is controlled via the environment variables `API_UPSTREAM`, `WEB_UPSTREAM`, and `MINIO_UPSTREAM`.
The frontend is a Progressive Web App. On mobile:
- Android: Chrome menu → "Add to Home Screen"
- iOS: Safari share → "Add to Home Screen"
No app store needed.
ClipFeed includes a hidden system status dashboard for monitoring and maintenance.
- Access: Navigate to `/admin` in your browser.
- Authentication: Uses separate credentials defined in your `.env` file (`ADMIN_USERNAME` and `ADMIN_PASSWORD`).
- Features:
- System Health: Real-time monitoring of memory usage, goroutines, and Go runtime stats.
- Queue Management: Monitor the ingestion pipeline (running, queued, failed, and rejected jobs).
- Content Stats: Track the total number of ready clips, processing backlog, and storage usage (GB).
- Database Stats: View total users, aggregate interactions, and database file size.
- AI/LLM Monitoring: Detailed stats on Scout evaluations, approval rates, and clip summaries.
- LLM Logs: View raw prompts and responses for every LLM interaction to debug discovery and summarization.
- Failure Analysis: Review error messages for failed ingestion jobs and clear the queue.
- Activity Charts: Visual 7-day history of clip ingestion and user engagement.
- `GET /health` - Health check
- `GET /api/config` - Client configuration flags
- `POST /api/auth/register` - Create account
- `POST /api/auth/login` - Sign in
- `GET /api/feed` - Personalized feed (supports anonymous access)
- `GET /api/clips/:id` - Clip details
- `GET /api/clips/:id/stream` - Presigned streaming URL
- `GET /api/clips/:id/similar` - Similar clips (embedding-based)
- `GET /api/clips/:id/summary` - LLM-generated clip summary
- `GET /api/search` - Full-text search (FTS5)
- `GET /api/topics` - Top topics
- `GET /api/topics/tree` - Hierarchical topic graph
- `POST /api/clips/:id/interact` - Record interaction (view, like, skip, etc.)
- `POST /api/clips/:id/save` - Save/favorite clip
- `DELETE /api/clips/:id/save` - Unsave clip
- `POST /api/ingest` - Submit URL for processing
- `GET /api/jobs` - List processing jobs
- `GET /api/jobs/:id` - Job details
- `GET /api/me` - Profile with preferences and topic weights
- `PUT /api/me/preferences` - Update algorithm preferences
- `GET /api/me/saved` - Saved clips
- `GET /api/me/history` - Watch history
- `GET /api/me/cookies` - List cookie status per platform
- `PUT /api/me/cookies/:platform` - Set platform cookie (for yt-dlp auth)
- `DELETE /api/me/cookies/:platform` - Remove platform cookie
- `POST /api/collections` - Create collection
- `GET /api/collections` - List collections
- `GET /api/collections/:id/clips` - List clips in collection
- `POST /api/collections/:id/clips` - Add clip to collection
- `DELETE /api/collections/:id/clips/:clipId` - Remove clip from collection
- `DELETE /api/collections/:id` - Delete collection
- `POST /api/filters` - Create saved filter
- `GET /api/filters` - List saved filters
- `PUT /api/filters/:id` - Update filter
- `DELETE /api/filters/:id` - Delete filter
- `POST /api/scout/sources` - Add scout source (channel/playlist)
- `GET /api/scout/sources` - List scout sources
- `PATCH /api/scout/sources/:id` - Update scout source
- `DELETE /api/scout/sources/:id` - Delete scout source
- `POST /api/scout/sources/:id/trigger` - Force immediate check
- `GET /api/scout/candidates` - List discovered candidates
- `POST /api/scout/candidates/:id/approve` - Approve candidate for ingestion
- `GET /api/scout/profile` - User's interest profile (what Scout optimizes for)
- `POST /api/admin/login` - Admin login (returns distinct admin JWT)
- `GET /api/admin/status` - System status, database, and queue metrics
- `GET /api/admin/llm_logs` - Recent LLM prompts and responses
- `POST /api/admin/clear-failed` - Purge failed/rejected jobs from the queue
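A hedged sketch of calling these endpoints with the stdlib, assuming bearer-token auth (the API issues JWTs) and an illustrative payload shape -- the actual field names may differ. Requests are only built here, not sent:

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost/api"  # assumption: default nginx deployment

def api_request(method: str, path: str, token: Optional[str] = None,
                payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request against one of the endpoints listed above."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)

# Submit a URL for processing, then record a like on clip 42
ingest = api_request("POST", "/ingest", token="<jwt>", payload={"url": "https://example.com/v"})
interact = api_request("POST", "/clips/42/interact", token="<jwt>", payload={"type": "like"})
```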
All builds and tests run inside Docker -- do not run Go, npm, or Python on the host.
```sh
# Rebuild a single service after changes
docker compose up -d --build api     # Go API
docker compose up -d --build web     # frontend
docker compose up -d --build worker  # ingestion worker
docker compose up -d --build scout   # scout worker

# Run API tests
make test-api-docker

# Useful make targets
make up            # start all services (respects .env profiles)
make down          # stop all services
make logs-api      # tail API logs
make logs-worker   # tail worker logs
make shell-db      # sqlite3 shell into the database
make lifecycle     # expire old clips
make score         # trigger score update
make clean         # stop + remove volumes
```

Scout, clip summaries, and AI-assisted features require an LLM. Two modes:
| Setting | Local (Ollama) | Hosted API |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | `openai` or `anthropic` |
| `LLM_BASE_URL` | (auto: internal llm service) | API endpoint URL |
| `LLM_API_KEY` | (not needed) | Your API key |
| `LLM_MODEL` | (uses `OLLAMA_MODEL`) | Model name |
- Set `COMPOSE_PROFILES=ai` (add `ollama` for local inference).
- Python workers route calls through LiteLLM; any OpenAI-compatible endpoint works.
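A tiny sanity-check sketch of the table above -- the rules are read off the table, not an official validator:

```python
def missing_llm_settings(env: dict) -> list:
    """Return the settings a given LLM_PROVIDER still needs, per the table above."""
    provider = env.get("LLM_PROVIDER", "ollama")
    if provider == "ollama":
        # Base URL resolves to the internal llm service; model comes from OLLAMA_MODEL
        return []
    required = ["LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL"]
    return [name for name in required if not env.get(name)]
```

Running it against a partial hosted-API config highlights what setup still has to prompt for.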
Using Claude (Anthropic) as the hosted LLM:
Option A -- native Anthropic provider:

```sh
LLM_PROVIDER=anthropic
LLM_BASE_URL=https://api.anthropic.com/v1
LLM_API_KEY=<your Anthropic API key>
LLM_MODEL=claude-sonnet-4-6
ANTHROPIC_VERSION=2023-06-01
```

Option B -- via Anthropic's OpenAI-compatible endpoint:

```sh
LLM_PROVIDER=openai
LLM_BASE_URL=https://api.anthropic.com/v1/
LLM_API_KEY=<your Anthropic API key>
LLM_MODEL=claude-sonnet-4-6
```
Both options work. Option B uses Anthropic's OpenAI SDK compatibility layer, which accepts standard OpenAI-format requests and translates them to the Claude API. Note that some advanced Claude features (prompt caching, extended thinking output, citations) are only available through the native Anthropic API (Option A).
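Because Option B (and LiteLLM generally) speaks the OpenAI chat-completions format, the request the workers end up sending looks roughly like this. A stdlib-only sketch under that assumption; the prompt text is illustrative and nothing is actually sent:

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format /chat/completions request from the .env settings."""
    base = os.environ.get("LLM_BASE_URL", "https://api.anthropic.com/v1").rstrip("/")
    body = {
        "model": os.environ.get("LLM_MODEL", "claude-sonnet-4-6"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('LLM_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("Score this video for the user's interests.")
```

Any backend that accepts this payload shape will work with the `openai` provider setting.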
- Phase 1: Core pipeline -- ingest, split, serve, basic feed
- Phase 2: Algorithm engine -- topic graph, L2R embeddings, preference UI, collections
- Phase 3: Search (FTS5), saved filters, platform cookies, clip summaries, Scout
- Phase 4: Multi-user (auth, per-user preferences/embeddings/collections)
Possible future directions: sharing, federation, public collections, and direct feed ingestion via TikTokApi (Playwright-based) or instagrapi.