AskTheManual – A Multimodal RAG-PoC

AskTheManual is a Multimodal Retrieval-Augmented Generation (RAG) Proof of Concept designed to read, see, and explain manuals to your customers. It transforms static PDF manuals into an interactive chatbot that understands both text and images.

New: Now features a Guided User Interface (GUI) to make the data ingestion process simple and intuitive!

What it is

This project follows a multi-stage pipeline to create a searchable knowledge base:

Extraction: Converts detailed PDF manuals into Markdown.
Human-in-the-Loop Review: A GUI allows you to filter out "junk" images (icons, decorative elements) and keep only relevant diagrams.
Vision Enrichment: Uses AI (or human input) to describe screenshots, turning visual information into searchable text.
Vector Indexing: Stores the enriched content in a local vector database.
Local Chat: A Streamlit dashboard to query your manual.

Advantages

Local Control & Privacy: Uses local Ollama and FAISS for the core "brain". Your proprietary manual text stays local.
No "Black Box": You control the extraction. You decide which images strictly belong in the knowledge base.
Multimodal: The bot understands what's inside screenshots (e.g., "The default IP is 127.0.0.1") because the ingestion pipeline explicitly captures it.

🛠️ Installation

1. Requirements

Ensure you have Python 3.10+ installed.

pip install streamlit docling langchain-huggingface langchain-community faiss-cpu sentence-transformers requests ttkbootstrap openai

(Note: ttkbootstrap is required for the new GUI.)

docling-tools models download

2. Setup Ollama (Local LLM)

Install Ollama from ollama.com.
Pull a model (e.g., Qwen 2.5):
```
ollama pull qwen2.5:7b
```
Make sure the Ollama server is running.

3. Setup Embedding Model (Prevents GUI Freeze)

The project uses sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. To avoid the GUI freezing while downloading this model on the first run, execute this one-liner:

python -c "from langchain_huggingface import HuggingFaceEmbeddings; HuggingFaceEmbeddings(model_name='sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')"

4. OpenAI API Key (Optional but Recommended)

For automatic image description (Vision AI), you need an OpenAI API key.

Export it: export OPENAI_API_KEY="sk-..."
Or paste it into image_to_information.py (not recommended for production).

📂 Usage Workflow

Step 1: Ingest & Process (The GUI)

We have replaced the complex script chain with a single App.

Run the GUI:

python AskTheManual_GUI.py

The App will guide you through 3 stages:

Extraction & Review:
- Select your PDF. (it has to be in the main directory of the projectfolder)
- The app extracts all images.
- Interactive Review: A gallery appears. Use Arrow Keys to navigate. Press 'DELETE' to Delete junk images, 'KEEP' to Keep valid ones.
Enrichment:
- Vision AI (Auto but maybe not every description is correct): Sends kept images to OpenAI to generate detailed technical descriptions.
- Human Description (recommended - if you want to be sure everything is correct): If you don't have an API key, you can manually type descriptions for each image in the GUI.
Indexing:
- Click "Update Vector Index" to finalize the database.

Step 2: Chat with your Manual

Once indexing is complete, launch the chat interface:

streamlit run chatbot_dashboard.py

⚠️ Disclaimer

This PoC is intended for internal testing and demonstration. It serves as a blueprint for how technical documentation can be made "intelligent" by combining text parsers, Vision AI, and Vector Search.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
version_with_no_ui		version_with_no_ui
.gitignore		.gitignore
AskTheManual_GUI.py		AskTheManual_GUI.py
GUI.png		GUI.png
LICENSE		LICENSE
README.md		README.md
README_Deutsch.md		README_Deutsch.md
chatbot_dashboard_cloud_gpt.py		chatbot_dashboard_cloud_gpt.py
chatbot_dashboard_local_model.py		chatbot_dashboard_local_model.py
image_to_information.py		image_to_information.py
unified_extraction_review.py		unified_extraction_review.py
vector_transformer.py		vector_transformer.py
workflow_DE.svg		workflow_DE.svg
workflow_ENG.svg		workflow_ENG.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AskTheManual – A Multimodal RAG-PoC

What it is

Advantages

🛠️ Installation

1. Requirements

2. Setup Ollama (Local LLM)

3. Setup Embedding Model (Prevents GUI Freeze)

4. OpenAI API Key (Optional but Recommended)

📂 Usage Workflow

Step 1: Ingest & Process (The GUI)

Step 2: Chat with your Manual

⚠️ Disclaimer

About

Uh oh!

Releases

Packages

Languages

License

KalaINC/AskTheManual

Folders and files

Latest commit

History

Repository files navigation

AskTheManual – A Multimodal RAG-PoC

What it is

Advantages

🛠️ Installation

1. Requirements

2. Setup Ollama (Local LLM)

3. Setup Embedding Model (Prevents GUI Freeze)

4. OpenAI API Key (Optional but Recommended)

📂 Usage Workflow

Step 1: Ingest & Process (The GUI)

Step 2: Chat with your Manual

⚠️ Disclaimer

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages