Skip to content

A powerful, GUI automated video generation platform designed to create educational tech tutorials from PDF slides.

License

Notifications You must be signed in to change notification settings

techcow2/pdf2tutorial

Repository files navigation

PDF2Tutorial

A powerful, automated video generation platform designed to create educational tech tutorials from PDF slides. This project leverages AI for script refinement, high-quality Text-to-Speech (TTS), and programmatic video rendering.

Features

  • PDF to Presentation: Upload PDF slides and automatically extract them into a sequence of video scenes.
  • AI-Powered Scripting: Integrated with Google Gemini AI to transform fragmented slide notes into coherent, professional scripts.
  • High-Quality TTS: Supports local and cloud-based Text-to-Speech using Kokoro-js with customizable voices and quantization.
  • Rich Media Support: Insert MP4 videos and GIFs seamlessly between slides.
  • Programmatic Video Rendering: Built on Remotion, allowing for precise, frame-perfect video assembly and export.
  • Interactive Slide Editor: Drag-and-drop slide reordering, real-time audio generation, and script editing.
  • Background Music: Add and mix background tracks with custom volume controls.

Roadmap & TODO

  • YouTube Metadata Generator: Automatically generate optimized titles and descriptions using Gemini.
  • Thumbnail Generator: Create custom YouTube thumbnails based on slide content.
  • Voiceover Recording: Support for recording custom voiceovers directly within the app using a microphone.
  • Header Layout Optimization: Refactor and organize the application header for better aesthetics and usability.

Tech Stack

  • Frontend: React 19, Vite, Tailwind CSS (v4)
  • Video Engine: Remotion (v4)
  • AI: Google Gemini API (gemini-2.0-flash-lite)
  • TTS: Kokoro (FastAPI / Web Worker)
  • Backend: Express.js (serving as a rendering orchestration layer)
  • Utilities: Lucide React (icons), dnd-kit (drag & drop), pdfjs-dist (PDF processing)

Getting Started

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/techcow2/pdf2tutorial.git
    cd pdf2tutorial
  2. Install dependencies:

    npm install
  3. Start the development server (runs both Vite and the rendering server):

    npm run dev

The application will be available at http://localhost:5173.

Configuration

Open the Settings Modal (Gear Icon) in the application to configure:

  • API Keys: Add your Google AI Studio API Key for script refinement.
  • TTS Settings: Choose between internal Web Worker TTS or a local Dockerized Kokoro FastAPI instance.
  • Audio Defaults: Set default voice models and quantization levels (q4/q8).

Project Structure

  • src/video/: Remotion compositions and video components.
  • src/components/: React UI components (Slide Editor, Modals, Uploaders).
  • src/services/: Core logic for AI, TTS, PDF processing, and local storage.
  • server.ts: Express server handling the @remotion/renderer logic.

Rendering

Videos are rendered server-side using Remotion. When you click "Download Video", the sequence is bundled and rendered via an Express endpoint, then served back as an MP4 file.

License

MIT

About

A powerful, GUI automated video generation platform designed to create educational tech tutorials from PDF slides.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published