Scriptotic

Turn YouTube videos into text transcripts with Voxtral AI

Scriptotic is a transcription tool that downloads YouTube videos, extracts the audio, and creates text transcripts using Voxtral Mini 3B (mistralai/Voxtral-Mini-3B-2507), a speech-to-text model with a reported 5.1% word error rate. It offers both a web interface (Playwright-compatible) and command-line access.

Features

YouTube Video Download: Automatically downloads audio from YouTube videos
High-Quality Transcription: Uses Voxtral Mini 3B (5.1% WER, about 14% lower than Whisper large-v3's 5.9%)
Speaker Identification: Identifies different speakers and labels their dialogue
Web Interface: Modern web interface accessible via browser and Playwright automation
Command Line Support: CLI for automation and scripting
Multilingual Support: Automatic language detection and superior multilingual accuracy
Multiple Output Formats: Text, JSON, or SRT subtitle formats

What You Need

System Requirements

Windows 10/11 with WSL2 (Ubuntu distribution), or 64-bit Linux
NVIDIA GPU (RTX 20-series or newer, 12GB+ VRAM recommended)
16GB+ RAM (for optimal performance)
15GB+ free disk space (for Voxtral model and temporary files)
Stable internet connection (for first-time setup and for downloading audio from YouTube)

Accounts You'll Need

HuggingFace Account: Free account at huggingface.co for Voxtral model access

Quick Start

1. Setup (First Time Only)

For Windows Users:

# Run in WSL2 Ubuntu terminal
bash scripts/setup_wsl2_voxtral.sh

For Linux Users:

# Run in your Linux terminal
bash scripts/setup_wsl2_voxtral.sh

2. Start the Web Interface

For Windows Users (Recommended):

# Double-click this file in Windows Explorer:
start_scriptotic.bat

# Or run in Command Prompt:
start_scriptotic.bat

For WSL/Linux Users:

# Run in terminal:
bash start_scriptotic.sh

3. Access the Interface

Open your browser and go to: http://localhost:5000

The web interface provides:

✅ Form-based transcription - Enter YouTube URL, speaker names, choose format
✅ Real-time progress tracking - Live updates as transcription proceeds
✅ Server status monitoring - Shows Voxtral server startup progress
✅ File downloads - Download completed transcripts
✅ Full Playwright compatibility - Perfect for browser automation

Playwright Automation Example

// Navigate to Scriptotic web interface
await page.goto('http://localhost:5000');

// Fill in the YouTube URL
await page.getByRole('textbox', { name: 'YouTube URL:' }).fill('https://www.youtube.com/watch?v=dQw4w9WgXcQ');

// Set speaker names
await page.getByRole('textbox', { name: 'Speaker Names:' }).fill('Rick, Audience');

// Choose JSON output format
await page.getByLabel('Output Format:').selectOption('JSON');

// Start transcription (when server is ready)
await page.getByRole('button', { name: 'Generate Transcript' }).click();

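// Wait for completion and download the result. Transcription can take many
// minutes, so raise Playwright's default 30-second timeout (30 minutes here is
// an assumed upper bound; this also assumes the button triggers a browser download).
const timeout = 30 * 60 * 1000;
const downloadPromise = page.waitForEvent('download', { timeout });
await page.getByRole('button', { name: 'Download Transcript' }).click({ timeout });
const download = await downloadPromise;
await download.saveAs('transcript.json');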

How to Use

Web Interface (Recommended)

Start the web server using start_scriptotic.bat (Windows) or bash start_scriptotic.sh (WSL/Linux)
Open http://localhost:5000 in your browser
First time: You'll be prompted to enter your HuggingFace token
Enter the YouTube video URL
Enter speaker names (comma-separated, e.g., "Alice, Bob, Charlie") - optional
Choose output format:
- text: Human-readable transcript with speaker labels
- json: Structured data with timestamps
- srt: Subtitle file format
Click "Generate Transcript"
Wait for processing (progress bar shows status)
Download the completed transcript
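
For readers unfamiliar with the srt option: SubRip files are a sequence of numbered cues, each with a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm followed by the text. The sketch below only illustrates that layout. It is not Scriptotic's own code, and the (start, end, text) segment tuples and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    The tuple layout is an assumption made for this example.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "[Alice] Hello everyone"), (2.5, 4.0, "[Bob] Thanks")]))
```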

Command Line Usage

# Direct CLI usage (after starting the web server)
python src/core/scriptotic.py "https://www.youtube.com/watch?v=VIDEO_ID" --names "Alice,Bob,Charlie" --output "transcript.txt"

# With JSON output
python src/core/scriptotic.py "https://www.youtube.com/watch?v=VIDEO_ID" --format json --output "transcript.json"
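
For scripting several videos, the CLI calls above can be wrapped in a small Python helper. This is a minimal sketch rather than part of Scriptotic: build_command and transcribe are hypothetical names, and only the flags shown above (--names, --format, --output) are used.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "src/core/scriptotic.py"  # CLI entry point from the examples above


def build_command(url: str, output: str, fmt: Optional[str] = None,
                  names: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for one CLI call (hypothetical helper)."""
    # sys.executable runs the script with the current interpreter,
    # e.g. the one from the project's virtual environment.
    cmd = [sys.executable, SCRIPT, url]
    if names:
        cmd += ["--names", ",".join(names)]
    if fmt:
        cmd += ["--format", fmt]
    cmd += ["--output", output]
    return cmd


def transcribe(url: str, output: str, fmt: Optional[str] = None,
               names: Optional[List[str]] = None) -> int:
    """Run the CLI for one video and return its exit code."""
    return subprocess.run(build_command(url, output, fmt, names)).returncode
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.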

HuggingFace Token Setup (First Run Only)

Go to huggingface.co and create a free account
Go to Settings → Access Tokens
Click "New token" and create a token with Read permissions
Accept the license agreement for the Voxtral model:
- mistralai/Voxtral-Mini-3B-2507
Save your token - you'll need it when you first run Scriptotic

Output Format

The transcript will include:

Video title and metadata
Model information (Voxtral Mini 3B)

Speaker-labeled dialogue:

[Alice] Hello everyone, welcome to today's discussion.
[Bob] Thanks for having me, Alice. I'm excited to talk about this topic.
[Alice] Let's start with the basics...
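
If you want to post-process the text output, the [Speaker] prefix shown above is easy to split off. A minimal sketch, assuming every dialogue line starts with a bracketed name as in the sample. Lines that do not match (for example the title and metadata header) are skipped. parse_transcript is a hypothetical helper, not a Scriptotic function.

```python
import re
from typing import List, Tuple

# Matches lines such as "[Alice] Hello everyone"
LINE_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")


def parse_transcript(text: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs for every speaker-labeled line."""
    turns = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            turns.append((match.group("speaker"), match.group("text")))
    return turns


sample = """[Alice] Hello everyone, welcome to today's discussion.
[Bob] Thanks for having me, Alice. I'm excited to talk about this topic.
[Alice] Let's start with the basics..."""

turns = parse_transcript(sample)
for speaker, utterance in turns:
    print(f"{speaker}: {utterance}")
```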

Technical Details

Architecture

WSL2-Based Design:

Web Server: Flask server running in WSL2
Backend: vLLM server with Voxtral Mini 3B model
Communication: HTTP API between web interface and transcription engine
Access: Browser-based interface accessible from Windows and WSL2

What's Happening Under the Hood

Web Interface: Modern HTML/JavaScript interface with real-time updates
Audio Download: yt-dlp downloads audio from YouTube in WSL2 environment
Server Management: Automatic Voxtral server startup and health monitoring
Transcription: Voxtral Mini 3B transcribes audio via vLLM HTTP API
Speaker Mapping: Optional speaker name assignment
Output: Generates formatted transcript in chosen format
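
The server startup and health monitoring step can be pictured as a polling loop: check whether the backend responds, and retry until a deadline passes. The sketch below illustrates that pattern only. It is not Scriptotic's implementation, and wait_until_ready and its probe argument are hypothetical. The probe is any callable that returns True once the server is ready.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout: float = 300.0,
                     interval: float = 2.0) -> bool:
    """Poll probe() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example with a fake probe that succeeds on the third attempt.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), timeout=5, interval=0))
```

In practice the probe would wrap an HTTP request to the transcription server. The address and endpoint depend on your configuration and are deliberately left out here.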

Model Information

Engine: Voxtral Mini 3B (mistralai/Voxtral-Mini-3B-2507)
Accuracy: 5.1% WER (about 14% lower, in relative terms, than Whisper large-v3's 5.9% WER)
Languages: Superior multilingual support with automatic detection
Requirements: ~9.5GB VRAM, which fits on a 12GB card such as the RTX 4080 12GB
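
The 14% figure is a relative reduction in word error rate, not a difference in percentage points. A short worked check using the two WER values quoted above:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Whisper large-v3: 5.9% WER, Voxtral Mini 3B: 5.1% WER (figures from this README)
reduction = relative_wer_reduction(5.9, 5.1)
print(f"{reduction:.1%}")  # prints 13.6%, which rounds to the quoted 14%
```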

Privacy

All transcription runs locally - no audio or transcripts are sent to external servers
Internet access is needed only for the one-time model download and for fetching audio from YouTube
Your HuggingFace token is stored locally in WSL2 environment
Audio files are automatically cleaned up after processing

Troubleshooting

Common Issues

"Web server won't start"

Make sure WSL2 is installed and Ubuntu distribution is available
Run the setup script: bash scripts/setup_wsl2_voxtral.sh
Check that all dependencies are installed in the virtual environment

"Can't access http://localhost:5000"

Ensure the web server is running (check terminal output)
Try http://127.0.0.1:5000 instead
Check Windows firewall settings for port 5000
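
To tell a firewall or startup problem apart from a browser problem, you can test whether anything is listening on port 5000. A minimal sketch using only the Python standard library. port_is_open is a hypothetical helper, and the default host and port are the ones used in this README.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("web server reachable:", port_is_open())
```

A False result means the connection attempt failed, which points to the server not running or the port being blocked, rather than to the browser.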

"Server failed to start"

Check that your NVIDIA driver is version 555 or newer
Ensure you have enough disk space (~15GB free)
Check the server logs: cat wsl2/voxtral.log

"HuggingFace token invalid"

Make sure you accepted the license agreement for the Voxtral model
Generate a new token with Read permissions
The token should start with hf_
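
A quick way to catch copy-paste mistakes before entering the token is to check the prefix. This sketch checks the format only. It cannot tell whether the token is valid or whether the model license has been accepted. looks_like_hf_token is a hypothetical helper name.

```python
def looks_like_hf_token(token: str) -> bool:
    """Return True if the string has the hf_ prefix followed by more characters."""
    token = token.strip()  # ignore stray whitespace from copy-paste
    return token.startswith("hf_") and len(token) > len("hf_")


print(looks_like_hf_token("  hf_exampleExampleExample "))  # True
print(looks_like_hf_token("exampleExampleExample"))        # False
```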

"Very slow transcription"

This is expected - Voxtral prioritizes accuracy over speed
A 1-hour video typically takes 10-15 minutes to process (roughly 4-6x faster than real time)
The trade-off is the lower word error rate (5.1% vs 5.9% for Whisper large-v3)
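
For rough planning, the 10-15 minutes per hour of video quoted above can be scaled to other lengths. Linear scaling is an assumption. Actual time depends on your GPU and on the audio, so treat the result as a ballpark only.

```python
from typing import Tuple


def estimate_minutes(video_minutes: float) -> Tuple[float, float]:
    """Return a (low, high) processing-time estimate in minutes.

    Assumes the 10-15 minutes per 60 minutes of video figure scales linearly.
    """
    return (video_minutes * 10 / 60, video_minutes * 15 / 60)


low, high = estimate_minutes(24)
print(f"A 24-minute video: roughly {low:.0f}-{high:.0f} minutes")
```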

"WSL2 using too much memory" (Windows)

WSL2 may keep 8-12GB allocated even after servers stop
Solution: Run cleanup_memory.bat to free WSL2 memory
Or: Close the server terminal window (automatically runs cleanup)
This frees the vmmemWSL process you see in Task Manager

Getting Help

If you encounter issues:

Check the web interface status indicator
Look at server logs: cat wsl2/voxtral.log
Verify WSL2 is working: run wsl -d Ubuntu ls in Command Prompt
Make sure all setup steps were completed
Try with a shorter video first

Files Overview

Main Launchers (Choose one):

start_scriptotic.bat - Windows launcher with visible server console
start_scriptotic.sh - Linux/WSL launcher with visible server console

Utility Scripts:

cleanup_memory.bat - Windows utility to free WSL2 memory (use if vmmemWSL is using too much RAM)

Core Components:

src/core/web_server.py - Flask web server
src/core/templates/index.html - Web interface
src/core/scriptotic.py - Core transcription logic and CLI
src/core/voxtral_engine.py - Voxtral transcription engine

Setup:

scripts/setup_wsl2_voxtral.sh - One-time environment setup

Version Information

Current Version: 2.0.0
Status: Production ready - Web interface with Playwright compatibility
Architecture: Flask web server + WSL2 vLLM backend

License

This project is open source. See individual model licenses for AI models used.

Contributing

Found a bug or want to improve Scriptotic? Please open an issue or submit a pull request!

Made with ❤️ by brinedew
