Paperclips Have Organized Your Documents Since 1867.
Now They Understand Them.
A text-centric multimodal RAG system with knowledge graph reasoning for sub-4B LLMs
Klippy is a local-first AI assistant that remembers everything. It combines:
- **Multimodal RAG**: Search across text, images, and audio with one query
- **Knowledge Graph Reasoning**: Neo4j-powered ontological reasoning for complex queries
- **Prompt Chaining**: Google ADK agents orchestrate multi-step reasoning pipelines
- **Personal Memory**: Your data stays local, your assistant gets smarter
**Architecture Philosophy: "Graph-as-Brain, LLM-as-Mouth"** - All reasoning is pre-computed via deterministic graph logic. The LLM only narrates the answer. This allows sub-4B models like Qwen2.5:3b to perform like much larger models.
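The pattern can be sketched in a few lines: the graph layer answers the question deterministically, and the LLM only turns those facts into prose. This is an illustrative toy, not Klippy's actual API — the adjacency dict stands in for a real Neo4j query, and the function names are hypothetical:

```python
def reason_over_graph(entity: str, graph: dict) -> list[str]:
    """Deterministic 'brain': follow edges in a toy adjacency map.
    A real implementation would run a Cypher query against Neo4j."""
    return [f"{entity} {relation} {target}" for relation, target in graph.get(entity, [])]

def build_narration_prompt(question: str, facts: list[str]) -> str:
    """The LLM is only the 'mouth': it narrates pre-computed facts,
    so a 3B model never has to perform the multi-hop reasoning itself."""
    fact_block = "\n".join(f"- {f}" for f in facts)
    return (
        "Answer the question using ONLY these verified facts:\n"
        f"{fact_block}\n\nQuestion: {question}\nAnswer:"
    )

# Toy knowledge graph: entity -> [(relation, target), ...]
graph = {"invoice_2024.pdf": [("MENTIONS", "Acme Corp"), ("CREATED_ON", "2024-03-01")]}
facts = reason_over_graph("invoice_2024.pdf", graph)
prompt = build_narration_prompt("Who is invoice_2024.pdf from?", facts)
```

Because the facts are computed before the model sees the prompt, the answer's correctness depends on the graph, not on the model's size.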
- Python 3.12+
- Docker Desktop (for Qdrant & Neo4j)
- Ollama (for local LLM)
- Node.js 18+ (for frontend)
- uv package manager (recommended)
```bash
git clone https://github.com/Rayen-Hamza/Klippy.git
cd Klippy

# Install Python dependencies
uv sync

# Download spaCy model
uv run python -m spacy download en_core_web_sm
```

```bash
docker compose up -d
```

This starts:

- Qdrant on `localhost:6333` (vector DB)
- Neo4j on `localhost:7474` (graph DB, password: `changeme`)
```bash
# In a separate terminal (or it may already be running)
ollama serve

# Pull the model (first time only)
ollama pull qwen2.5:3b
```

```bash
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
```

API docs: http://localhost:8000/docs
```bash
cd frontend
npm install
npm start
```

**Ingestion**

| Endpoint | Method | Description |
|---|---|---|
| `/ingest/text` | POST | Upload text/PDF/markdown |
| `/ingest/image` | POST | Upload image → caption + OCR |
| `/ingest/audio` | POST | Upload audio → transcribe |
| `/ingest/directory` | POST | Batch ingest from path |
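A document can be pushed to the backend with a short script. This is a sketch only — the request body's field names (`content`, `source`) are assumptions, so check the live schema at http://localhost:8000/docs before relying on them:

```python
import json
from urllib.request import Request

API_BASE = "http://localhost:8000"

def build_ingest_request(text: str, source: str) -> Request:
    """Build a POST /ingest/text request. The JSON field names here
    are assumed; verify them against the backend's OpenAPI docs."""
    body = json.dumps({"content": text, "source": source}).encode()
    return Request(
        f"{API_BASE}/ingest/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ingest_request("Meeting notes from Monday.", "notes.md")
# urllib.request.urlopen(req) would send it once the backend is running.
```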
**Search**

| Endpoint | Method | Description |
|---|---|---|
| `/search` | POST | Unified search across all modalities |
| `/search/by-type/{type}` | POST | Filter by text/image/audio |
| `/search/filters/by-entity` | GET | Find content by entity |
**Agents**

| Endpoint | Method | Description |
|---|---|---|
| `/agent/chat` | POST | Chat with orchestrator agent |
| `/agent/sessions` | GET | List active sessions |
| `/agent/agents` | GET | List available agents |
**Reasoning**

| Endpoint | Method | Description |
|---|---|---|
| `/reasoning/query` | POST | Graph reasoning → LLM prompt |
| `/reasoning/ingest` | POST | Ingest document to graph |
```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=app

# Run specific test
uv run pytest tests/test_embeddings.py -v
```

```
Klippy/
├── app/                      # FastAPI backend
│   ├── agents/               # Google ADK agents
│   │   ├── orchestrator.py   # Root agent (router)
│   │   ├── qdrant_agent.py   # Vector search specialist
│   │   ├── neo4j_agent.py    # Knowledge graph specialist
│   │   └── prompt_chain.py   # Prompt chaining pipeline
│   ├── models/               # Pydantic models
│   ├── routes/               # API endpoints
│   ├── services/             # Business logic
│   │   ├── embeddings/       # Embedding strategies
│   │   ├── processing/       # Text/Image/Audio processors
│   │   └── storage/          # Qdrant manager
│   └── config.py             # Settings
├── frontend/                 # Electron desktop app
│   └── src/
│       ├── main/             # Electron main process
│       └── renderer/         # React UI
├── tests/                    # pytest tests
├── docker-compose.yml        # Qdrant + Neo4j
└── pyproject.toml            # Python dependencies
```
Create a `.env` file:

```env
# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=changeme

# Ollama (Local LLM)
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:3b
LLM_BASE_URL=http://localhost:11434/v1

# Processing
TEXT_CHUNK_SIZE=512
TEXT_CHUNK_OVERLAP=50

LOG_LEVEL=INFO
```
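These variables are read by `app/config.py`. A minimal sketch of how such settings resolve with fallbacks to the defaults above (plain `os.getenv` here, not Klippy's actual Pydantic settings class; the field names are illustrative):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Illustrative settings loader: each field falls back to the
    documented default when the env var is unset."""
    qdrant_host: str = field(default_factory=lambda: os.getenv("QDRANT_HOST", "localhost"))
    qdrant_port: int = field(default_factory=lambda: int(os.getenv("QDRANT_PORT", "6333")))
    neo4j_uri: str = field(default_factory=lambda: os.getenv("NEO4J_URI", "bolt://localhost:7687"))
    llm_model: str = field(default_factory=lambda: os.getenv("LLM_MODEL", "qwen2.5:3b"))
    chunk_size: int = field(default_factory=lambda: int(os.getenv("TEXT_CHUNK_SIZE", "512")))
    chunk_overlap: int = field(default_factory=lambda: int(os.getenv("TEXT_CHUNK_OVERLAP", "50")))

settings = Settings()
```

Note that the chunk overlap must stay smaller than the chunk size, or adjacent chunks would be duplicates.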
Built by CrèmeTartinéDangereuse.
MIT License; see LICENSE for details.
Star this repo if you find it useful!

