Skip to content

Cadence: immersive reading pipeline from EPUB to audiobook with synchronized reader playback.

Notifications You must be signed in to change notification settings

mateogon/Cadence

Repository files navigation

Cadence

Cadence

Cadence is an immersive reading pipeline: book file -> chapter text -> audiobook audio -> word-level synced reader data.

Overview

  • Imports EPUB/MOBI/AZW3 files (MOBI/AZW3 are converted to EPUB via Calibre) and extracts ordered chapter text.
  • Synthesizes chapter audio with Supertonic TTS.
  • Aligns audio to text with WhisperX for word timestamps.
  • Plays back in a synced reader/player UI.
  • Uses a streaming chapter pipeline so reading can start before full import finishes.

Pipeline

  1. Source normalization: MOBI/AZW3 -> EPUB (Calibre, when needed)
  2. EPUB extraction (Calibre) -> library/<book>/content/ch_XXX.txt
  3. Per chapter: TTS synthesis (Supertonic) -> library/<book>/audio/ch_XXX.wav
  4. Per chapter: Alignment (WhisperX) -> library/<book>/content/ch_XXX.json
  5. Player can read chapters as soon as each chapter has audio + alignment.

Import Behavior

  • Cadence now runs chapter-by-chapter interleaved processing:
    • If a chapter already has .wav, Cadence skips synthesis and aligns it.
    • If .wav is missing, Cadence synthesizes first, then aligns.
    • If .wav and .json both exist, Cadence skips that chapter.
  • This makes resume robust after interruptions and enables immediate reading while import is still running.
  • Library cards update live with ready counts (Audio x/y, Alignment x/y) during import.

Read While Importing

Cadence processes books one chapter at a time, not as one long batch.

  • As soon as a chapter finishes synthesis + alignment, it is immediately readable.
  • You can open the reader and start from available chapters while the rest of the book continues importing.
  • If import is interrupted, re-import resumes from existing chapter outputs instead of starting over.

UI Preview

Library While Importing RSVP View Context View

Requirements

  • Windows 10/11
  • Python 3.12
  • Calibre (ebook-convert.exe) installed at:
    • C:\Program Files\Calibre2\ebook-convert.exe
  • FFmpeg (ffmpeg.exe) available in PATH (required for Qt player speed control)
  • NVIDIA GPU recommended for faster TTS/ASR

Install

python -m venv venv
.\venv\Scripts\Activate.ps1
python -m pip install --upgrade pip

Install one runtime profile (fresh venv recommended):

GPU profile (default):

pip install -r requirements-gpu.txt

CPU profile:

pip install -r requirements-cpu.txt

Backward-compatible default (requirements.txt) points to GPU profile. Do not install both CPU and GPU ONNX Runtime packages in the same environment.

If you use a separate WhisperX venv, point Cadence to it:

$env:CADENCE_WHISPERX_PYTHON="C:\Users\mateo\Desktop\Cadence\venv_whisperx\Scripts\python.exe"

Run

.\venv\Scripts\Activate.ps1
python main.py

Configuration

Cadence uses cadence_settings.json (managed from the UI settings cog next to Import Book).

  • Settings are persisted automatically when you click Apply.
  • Settings are applied immediately to the current app process.
  • CADENCE_* environment variables are still usable for one-off CLI/script runs.

Useful keys:

  • CADENCE_EXTRACT_WORKERS
  • CADENCE_SYNTH_WORKERS
  • CADENCE_TTS_MAX_CHARS
  • CADENCE_FORCE_CPU
  • CADENCE_CUDA_ONLY
  • CADENCE_WHISPERX_MODEL
  • CADENCE_WHISPERX_BATCH_SIZE
  • CADENCE_WHISPERX_COMPUTE_TYPE
  • CADENCE_WHISPERX_DEVICE
  • CADENCE_WHISPERX_PYTHON
  • CADENCE_CALIBRE_PATH

Project Docs

About

Cadence: immersive reading pipeline from EPUB to audiobook with synchronized reader playback.

Topics

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published