PDF Chat

Overview

PDF Chat is an advanced tool designed to interactively query and analyze the contents of PDF documents using the power of OpenAI's models. It processes PDFs, extracts and indexes their text, and provides an intuitive interface for users to ask questions about the document contents. This application is especially useful for research, legal documents, academic papers, and any other scenario where in-depth document analysis is required.

Key Features

Text Extraction and Chunking: Efficiently extracts text from PDFs and splits them into manageable chunks for processing.
Similarity Search: Utilizes FAISS (Facebook AI Similarity Search) for quick and efficient similarity search on document chunks.
AI-Powered Querying: Integrates with OpenAI's GPT-3.5-Turbo, GPT-4o, GPT-4-Turbo models to provide accurate and context-aware answers to user queries based on the document content.
User-Friendly Interface: Leverages Gradio to offer a clean and easy-to-use web interface for interacting with the system.
Settings Management: Supports loading and saving configuration settings, including the OpenAI API key and model preferences.
History and Visualization: Maintains a history of questions and answers, and generates word clouds from the question history to visualize frequently asked topics.

Installation

Clone the Repository:

git clone https://github.com/ethan-haas/PDFChat.git
cd PDFChat

Install Dependencies:
```
pip install -r requirements.txt
```

Usage

Run the Application:
```
python PDFChat.py
```
Open the Web Interface:
- The web interface will automatically open in your default browser. If not, open http://127.0.0.1:7860.
Upload PDF Files:
- Use the "Home" tab to upload PDF files.
- Once uploaded, you can ask questions about their contents.
Query PDF Files:
- Enter your question in the input box and click "Ask".
- The answer will be displayed below.
Manage Settings:
- Use the "Settings" tab to change the model, chunk size, prompts, and API key.
- Click "Save Settings" to apply changes.
View and Export Query History:
- Use the "Question History" tab to view past queries and their answers.
- Export history to a CSV file or clear the history as needed.
Generate Word Cloud:
- Use the "Visualization" tab to generate a word cloud of the query history.

File Structure

PDFSearcher.py: Main script to run the application.
settings.json: Configuration file for settings.
qa_history.db: SQLite database for storing query history.
requirements.txt: List of required Python packages.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.github		.github
LICENSE		LICENSE
PDFChat.ipynb		PDFChat.ipynb
PDFChat.py		PDFChat.py
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PDF Chat

Overview

Key Features

Installation

Usage

File Structure

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

ethan-haas/PDFChat

Folders and files

Latest commit

History

Repository files navigation

PDF Chat

Overview

Key Features

Installation

Usage

File Structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages