Lecture Notes Generator

This project generates structured lecture notes from slide decks (PPTX or PDF) and lecture videos using RAG (Retrieval-Augmented Generation) with Ollama.

Overview

The pipeline consists of five main stages:

Converting PPTX to PDF: Converts PowerPoint presentations to PDF format.
Audio Extraction: Extracts audio from video lectures.
Audio Transcription: Transcribes audio to text using OpenAI's Whisper model.
PDF Processing: Extracts text and images from PDFs with automatic caption generation.
RAG Notes Generation: Creates comprehensive, well-structured lecture notes using RAG with Ollama.
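
The first two stages are thin wrappers around external tools, so they can be sketched as command builders. This is a minimal illustration, not the code in pptx_to_pdf.py or audio.py: the function names are hypothetical, and it assumes the LibreOffice binary is available as soffice (some installs expose it as libreoffice instead).

```python
from pathlib import Path
from typing import List


def pptx_to_pdf_command(pptx: Path, out_dir: Path) -> List[str]:
    """Build a headless LibreOffice command that converts one PPTX to PDF."""
    # Assumption: the binary is called "soffice" on this machine.
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(pptx),
    ]


def extract_audio_command(video: Path, audio: Path) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes MP3."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(video),
        "-vn",                     # ignore the video stream
        "-codec:a", "libmp3lame",  # encode the audio as MP3
        str(audio),
    ]
```

Either list can be passed to subprocess.run(cmd, check=True) so that a failing tool raises an exception instead of being silently ignored.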

Requirements

Python 3.8+ with required packages (listed in requirements.txt)
LibreOffice (for PPTX to PDF conversion)
FFmpeg (for audio extraction)
Ollama with the mistral:latest model

NOTE: The pipeline works best on a native Linux system.

Installation

Install the required Python packages:

pip install -r requirements.txt

Install external dependencies:

# On Debian/Ubuntu
sudo apt-get update
sudo apt-get install ffmpeg libreoffice

# On macOS
brew install ffmpeg libreoffice

Install Ollama from https://ollama.ai/ and pull the Mistral model:

ollama pull mistral:latest

Usage

End-to-End Pipeline

Run the entire pipeline with:

python scripts/generate_all_notes.py

This will:

Convert all PPTX files to PDF
Extract audio from all video files
Transcribe all audio files
Process all PDF files
Generate lecture notes for all topics

Process Specific Topics

To process only specific topics:

python scripts/generate_all_notes.py --topics BST binary_tree_traversal

Skip Specific Steps

You can skip specific pipeline steps:

# Skip PPTX conversion
python scripts/generate_all_notes.py --skip-pptx

# Skip audio extraction and transcription
python scripts/generate_all_notes.py --skip-audio

# Skip PDF processing
python scripts/generate_all_notes.py --skip-pdf

# Only run the notes generator
python scripts/generate_all_notes.py --only-notes
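
One way the flags above could map onto pipeline stages is sketched below with argparse. The flag names come from this README; the stage names and the stages_to_run helper are assumptions made for illustration, and the real scripts/generate_all_notes.py may be organized differently.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags documented in this README."""
    parser = argparse.ArgumentParser(description="Generate lecture notes")
    parser.add_argument("--topics", nargs="+", default=None,
                        help="only process these topic folders")
    parser.add_argument("--skip-pptx", action="store_true")
    parser.add_argument("--skip-audio", action="store_true")
    parser.add_argument("--skip-pdf", action="store_true")
    parser.add_argument("--only-notes", action="store_true")
    return parser


def stages_to_run(args: argparse.Namespace) -> List[str]:
    """Translate parsed flags into an ordered list of (hypothetical) stage names."""
    if args.only_notes:
        return ["notes"]
    stages: List[str] = []
    if not args.skip_pptx:
        stages.append("pptx")
    if not args.skip_audio:
        # --skip-audio covers both extraction and transcription.
        stages += ["audio", "transcribe"]
    if not args.skip_pdf:
        stages.append("pdf")
    stages.append("notes")
    return stages
```

For example, stages_to_run(build_parser().parse_args(["--skip-audio"])) yields the PPTX, PDF, and notes stages only.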

Pipeline Components

The project consists of the following main components:

pptx_to_pdf.py: Converts PPTX files to PDF
audio.py: Extracts audio from video files
transcribe.py: Transcribes audio files to text
extraction.py: Extracts content from PDF files
simple_rag/: Contains RAG-related components
- vector_store.py: TF-IDF based vector store
- rag_application.py: RAG query system using Ollama
- lecture_notes_generator.py: Generates lecture notes
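
To show what a TF-IDF based vector store does during retrieval, here is a minimal standard-library sketch. It is not the implementation in simple_rag/vector_store.py: the class name, the tokenizer, and the smoothed IDF formula are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidfStore:
    """Hypothetical in-memory store that ranks documents by cosine similarity."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = documents
        tokenized = [tokenize(d) for d in documents]
        n = len(documents)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed IDF, so terms found in every document keep a nonzero weight.
        self.idf: Dict[str, float] = {
            t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()
        }
        self.vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens: List[str]) -> Dict[str, float]:
        """Build an L2-normalized sparse TF-IDF vector; unknown terms are dropped."""
        counts = Counter(tokens)
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k best (document, score) pairs for the query."""
        q = self._vectorize(tokenize(query))
        scored = [
            (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
        return [(self.documents[i], score) for score, i in scored[:k]]
```

In a RAG setup, the top-ranked chunks returned by search would be placed into the prompt that is sent to the language model.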

Output

The generated notes will be saved in:

Markdown: simple_rag/lecture_notes/markdown/
PDF: simple_rag/lecture_notes/

Data Structure

Place your lecture materials in LLM_DATASET/ with one folder per topic:

LLM_DATASET/
└── topic_name/
    ├── presentation.pptx
    ├── presentation.pdf
    ├── lecture.mp4
    ├── mp3/
    │   └── lecture.mp3
    ├── lecture.mp3.txt
    ├── presentation_gen.txt
    ├── presentation_captions.csv
    └── images/
        └── img_1_1.png
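
The layout above can be inspected with a few lines of pathlib. This is a sketch under the assumption that every direct subfolder of LLM_DATASET/ is a topic; discover_topics and the keys of the returned dictionary are hypothetical names, not part of the project.

```python
from pathlib import Path
from typing import Dict, List


def discover_topics(root: Path) -> Dict[str, Dict[str, List[str]]]:
    """Map each topic folder name to the source files found directly inside it."""
    topics: Dict[str, Dict[str, List[str]]] = {}
    for topic_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        topics[topic_dir.name] = {
            "slides": sorted(p.name for p in topic_dir.glob("*.pptx")),
            "pdfs": sorted(p.name for p in topic_dir.glob("*.pdf")),
            "videos": sorted(p.name for p in topic_dir.glob("*.mp4")),
            "transcripts": sorted(p.name for p in topic_dir.glob("*.mp3.txt")),
        }
    return topics
```

Calling discover_topics(Path("LLM_DATASET")) before a run is a quick way to confirm that each topic has the slides or video it is expected to have.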

Troubleshooting

Ollama Connection: Ensure the Ollama service is running and reachable at http://localhost:11434.
PDF Extraction: If PDF extraction fails, check that the PDF is not password-protected.
LibreOffice: If PPTX conversion fails, ensure LibreOffice is installed and available on your PATH.
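
The Ollama check can be scripted against the server's /api/tags endpoint, which lists the locally pulled models. The helper below is a hypothetical sketch using only the standard library; its name and the choice to return None on failure are assumptions for illustration.

```python
import json
import urllib.request
from typing import List, Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 3.0) -> Optional[List[str]]:
    """Return the names of locally available models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

If the function returns None, the service is not reachable at that address. If it returns a list that does not contain mistral:latest, the model still has to be pulled.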
