YATSEE -- Yet Another Tool for Speech Extraction & Enrichment
YATSEE is a local-first, end-to-end data pipeline designed to systematically refine raw meeting audio into a clean, searchable, and auditable intelligence layer. It automates the tedious work of downloading, transcribing, and normalizing unstructured conversations.
It is a privacy-respecting toolkit for anyone who wants to turn public noise into actionable intelligence.
Public records are often public in name only. Civic business is frequently buried in four-hour livestreams and jargon-filled transcripts that are technically accessible but functionally opaque. For an interested citizen, the barrier to entry is hours of viewing time and a thicket of procedural jargon.
YATSEE lowers that barrier by using a carefully tuned local LLM to transform that wall of text into a high-signal summary: extracting the specific votes, contracts, and policy debates that matter. It's a tool for creating the clarity and accountability that modern civic discourse requires, with or without the government's help.
A modular pipeline for extracting, enriching, and summarizing civic meeting audio data.
Scripts are modular, with clear input/output expectations.
- Script: `yatsee_download_audio.sh`
  - Input: YouTube URL (bestaudio)
  - Output: `.mp4` or `.webm` to `downloads/`
  - Tool: `yt-dlp`
  - Purpose: Archive livestream audio for local processing
- Script: `yatsee_format_audio.sh`
  - Input: `.mp4` or `.webm` from `downloads/`
  - Output: `.wav` or `.flac` to `audio/`
  - Tool: `ffmpeg`
  - Format settings (see the ffmpeg sketch after this list):
    - WAV: `-ar 16000 -ac 1 -sample_fmt s16 -c:a pcm_s16le`
    - FLAC: `-ar 16000 -ac 1 -sample_fmt s16 -c:a flac`
- Script: `yatsee_transcribe_audio.py`
  - Input: `.flac` from `audio/`
  - Output: `.vtt` to `transcripts_<model>/`
  - Tool: `whisper` or `faster-whisper`
  - Notes:
    - Supports `faster-whisper` if installed
    - Accepts model selection: `small`, `medium`, `large`, etc.
    - Outputs to `transcripts_<model>/` by default
- Script: `yatsee_slice_vtt.py`
  - Input: `.vtt` from `transcripts_<model>/`
  - Output: `.txt` (timestamp-free) to the same folder
- Script: `yatsee_slice_vtt.py`
  - Input: `.vtt` from `transcripts_<model>/`
  - Output: `.jsonl` to the same folder
  - Purpose: JSONL segments (sliced transcript for embeddings/search)
- Script: `yatsee_polish_transcript.py`
  - Input: `.txt` from `transcripts_<model>/`
  - Output: `.punct.txt` to `normalized/`
  - Tool: deep multilingual punctuation model
  - Purpose: Deep-learning punctuation restoration
- Script: `yatsee_normalize_structure.py`
  - Input: `.punct.txt` from `normalized/`
  - Output: `.txt` to `normalized/`
  - Tool: `spaCy`
  - Purpose: Segment text into readable sentences and normalize punctuation/spacing
- Script: `yatsee_summarize_transcripts.py`
  - Input: `.out` or `.txt` from `normalized/`
  - Output: `.md` or `.yaml` to `summary/`
  - Tool: `ollama`
  - Notes:
    - Supports short and long-form summaries
    - Optional YAML output (e.g., vote logs, action items, discussion summaries)
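The format settings above map directly onto an `ffmpeg` invocation. Here is a minimal Python sketch of the FLAC path (the shipped `yatsee_format_audio.sh` is a shell script with its own batching and error handling; this helper is illustrative only):

```python
import subprocess
from pathlib import Path

def convert_to_flac(src: Path, out_dir: Path = Path("audio")) -> Path:
    """Convert one downloaded file to 16 kHz mono 16-bit FLAC."""
    out_dir.mkdir(exist_ok=True)
    dst = out_dir / (src.stem + ".flac")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-ar", "16000",        # 16 kHz sample rate
         "-ac", "1",            # mono
         "-sample_fmt", "s16",  # 16-bit samples
         "-c:a", "flac",        # FLAC codec
         str(dst)],
        check=True,
    )
    return dst

if __name__ == "__main__":
    for pattern in ("*.mp4", "*.webm"):
        for f in Path("downloads").glob(pattern):
            convert_to_flac(f)
```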
All scripts are modular and can be run independently or as part of an automated workflow.
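As an example of running one stage on its own, here is a sketch of the punctuation step, assuming the `deepmultilingualpunctuation` package (one common "deep multilingual punctuation model"; the model actually used by `yatsee_polish_transcript.py` may differ):

```python
from deepmultilingualpunctuation import PunctuationModel

# Restore punctuation in a flat, timestamp-free transcript
model = PunctuationModel()
raw = "the motion carries five to two next item is the road repair contract"
print(model.restore_punctuation(raw))
# e.g. "the motion carries five to two. next item is the road repair contract."
```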
- Goal: Turn the generated summaries and raw transcripts into a searchable civic intelligence database.
- Vector Search (Semantic): Use ChromaDB with the `nomic-embed-text` model to allow fuzzy, concept-based queries (e.g., "Find discussions about road repairs"). See the sketch after this list.
- Graph Search (Relational): Extract structured data (votes, contracts, appointments) into a knowledge graph to trace connections between people and money.
- UI: A simple web interface built with Streamlit to provide an overview of the city's operations.
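A sketch of what that vector layer could look like, assuming ollama is serving `nomic-embed-text` locally and that the sliced `.jsonl` segments carry a `text` field (the field name, paths, and collection name are assumptions, not the shipped implementation):

```python
import json

import chromadb
import requests

def embed(text: str) -> list[float]:
    # Assumes a local ollama server after `ollama pull nomic-embed-text`
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

client = chromadb.PersistentClient(path="chroma_db")
col = client.get_or_create_collection("meetings")

# Index sliced transcript segments
with open("transcripts_medium/meeting.jsonl") as fh:
    for i, line in enumerate(fh):
        seg = json.loads(line)
        col.add(ids=[f"meeting-{i}"],
                documents=[seg["text"]],
                embeddings=[embed(seg["text"])])

# Fuzzy, concept-based query
hits = col.query(query_embeddings=[embed("road repairs")], n_results=3)
print(hits["documents"][0])
```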
```
transcripts_pipeline/
│
├── downloads/            ← raw input (audio/video)
├── audio/                ← post-conversion (.wav/.flac) files
├── transcripts_<model>/  ← VTTs + initial flat .txt files
│   ├── meeting.vtt
│   └── meeting.txt       ← basic timestamp removal only
│
├── normalized/           ← cleaned + structured output
│   └── meeting.txt       ← structure normalization only (spaCy)
│
└── summary/              ← generated meeting summaries (.yaml/.md)
    └── summary.md
```
This pipeline was developed and tested on the following setup:
- CPU: Intel Core i7-10750H (6 cores / 12 threads, up to 5.0 GHz)
- RAM: 32 GB DDR4
- GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM, CUDA 12.8)
- Storage: NVMe SSD
- OS: Fedora Linux
- Shell: Bash
- Python: 3.8 or newer
Additional testing was performed on Apple Silicon (macOS):
- Model: Mac Mini (M4 Base)
- CPU: Apple M4 (10 cores: 4 performance + 6 efficiency; up to 120 GB/s memory bandwidth)
- RAM: 16 GB
- Storage: NVMe SSD
- OS: macOS Sonoma / Sequoia
- Shell: ZSH
- Python: 3.9 or newer
GPU acceleration was enabled for Whisper / faster-whisper using CUDA 12.8 and NVIDIA driver 570.144 on Linux. Note, however, that faster-whisper has limited or no support for MPS on Apple Silicon.
Note: Audio transcription was much slower on the Mac than on the Linux machine; it works, but expect significantly longer runtimes.
Note: The pipeline also runs on CPU-only systems without a GPU, but transcription (especially with Whisper or faster-whisper) will be much slower than with CUDA-enabled GPU acceleration or MPS. A sketch of the device selection follows.
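A minimal sketch of that device choice with faster-whisper (model size and path are illustrative):

```python
import torch
from faster_whisper import WhisperModel

# MPS is not a supported faster-whisper device, so Apple Silicon
# machines fall back to CPU with int8 quantization here.
if torch.cuda.is_available():
    model = WhisperModel("medium", device="cuda", compute_type="float16")
else:
    model = WhisperModel("medium", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio/meeting.flac")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```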
⚠️ Not tested on Windows. Use at your own risk on Windows platforms.
- `yt-dlp`: Download livestream audio from YouTube
- `ffmpeg`: Convert audio to `.flac` or `.wav` format
- `toml`: Needed for reading the TOML config
- `requests`: Needed for interacting with the ollama API, if installed
- `torch`: Required for Whisper and model inference (with or without CUDA)
- `pyyaml`: YAML output support (for summaries)
- `whisper`: Audio transcription (standard)
- `spacy`: Sentence segmentation + text cleanup
  - Model: `en_core_web_sm` (or larger)
- `faster-whisper`: Audio transcription (optional)
- `ollama`: Run local LLMs for summarization
macOS (Homebrew):

```bash
brew install yt-dlp ffmpeg
```

Fedora:

```bash
sudo dnf install yt-dlp ffmpeg
```

Debian/Ubuntu:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
```

You can use pip to install the core requirements:
Install:

```bash
pip install torch torchaudio tqdm
pip install --upgrade git+https://github.com/openai/whisper.git
pip install toml pyyaml spacy
python -m spacy download en_core_web_sm
```

Install (optional, for better performance):

```bash
pip install faster-whisper
```

On first run, Whisper will download a model (e.g., base, medium). Ensure you have enough RAM.
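In essence, the transcription script wraps a call like the following (a sketch only; `yatsee_transcribe_audio.py` adds VTT output and model selection). The weights land in `~/.cache/whisper` by default:

```python
import whisper

# The first call downloads the model weights (e.g. ~1.4 GB for "medium")
model = whisper.load_model("medium")
result = model.transcribe("audio/meeting.flac")
print(result["text"])
```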
Used for generating markdown or YAML summaries from transcripts.

Install:

```bash
pip install requests
curl -fsSL https://ollama.com/install.sh | sh
```

See https://ollama.com for supported models and system requirements.
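Under the hood, summarization only needs ollama's local HTTP API. A sketch using `requests` (the model name and prompt are placeholders, not the script's actual prompt):

```python
import requests

def summarize(transcript: str, model: str = "your_pulled_model") -> str:
    # Assumes `ollama serve` is running and the model has been pulled
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": ("Summarize the key votes, contracts, and action items:\n\n"
                       + transcript),
            "stream": False,
        },
        timeout=600,
    )
    r.raise_for_status()
    return r.json()["response"]

with open("normalized/meeting.txt") as fh:
    print(summarize(fh.read()))
```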
Run each script in sequence or independently as needed.

Download the audio:

```bash
./yatsee_download_audio.sh https://youtube.com/some-url
```

Convert it for transcription:

```bash
./yatsee_format_audio.sh
```

Transcribe:

```bash
python yatsee_transcribe_audio.py --audio_input ./audio --model medium --faster
```

Slice and segment `.vtt` files for embeddings/search, or strip timestamps for plain `.txt`:

```bash
python yatsee_slice_vtt.py --vtt-input transcripts_<model> --output-dir transcripts_<model> --window 30
```
Normalize structure:

```bash
# Install the spaCy model first
python -m spacy download en_core_web_sm
python yatsee_normalize_structure.py -i transcripts_medium/
```
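The heart of this step is spaCy sentence segmentation. A stripped-down sketch (the real script also normalizes punctuation and spacing; the input path is illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def to_sentences(flat_text: str) -> list[str]:
    """Split a punctuated-but-flat transcript into one sentence per line."""
    # Very long transcripts may need chunking; spaCy's default
    # max_length is 1,000,000 characters.
    doc = nlp(flat_text)
    return [sent.text.strip() for sent in doc.sents]

with open("normalized/meeting.punct.txt") as fh:
    print("\n".join(to_sentences(fh.read())))
```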
YATSEE includes an optional script for generating summaries using a local LLM via the ollama tool. This script demonstrates one possible downstream use of the normalized transcripts:

```bash
# Requires ollama and a pulled model
python3 yatsee_summarize_transcripts.py -i normalized/ -m your_pulled_model
```

| Script | Purpose |
|---|---|
| `yatsee_download_audio.sh` | Download audio from YouTube URLs |
| `yatsee_format_audio.sh` | Convert downloaded files to `.flac` or `.wav` |
| `yatsee_transcribe_audio.py` | Transcribe audio files to `.vtt` |
| `yatsee_slice_vtt.py` | Slice and segment `.vtt` files |
| `yatsee_polish_transcript.py` | Restore punctuation with a deep learning model |
| `yatsee_normalize_structure.py` | Clean and normalize text structure |
| `yatsee_summarize_transcripts.py` | Generate summaries from cleaned transcripts |