Semantic search microservice for PDF documents using PostgreSQL pgvector and transformer embeddings. Built with FastAPI and Docker.

devfurkankizmaz/neuronquery

NeuronQuery — Semantic Search Microservice

NeuronQuery is a semantic search microservice that extracts text from PDF documents, generates multilingual vector embeddings with transformer models, and stores them in a PostgreSQL database using the pgvector extension. It exposes a REST API, built on FastAPI, for loading documents and running semantic queries over the embedded content.


Features

  • PDF text extraction with chunking for optimized embedding
  • Multilingual embedding generation using Hugging Face Transformers
  • Vector similarity search using PostgreSQL and the pgvector extension
  • FastAPI REST API for document ingestion and semantic querying
  • Docker Compose orchestration for PostgreSQL with pgvector support
  • Suitable for local development or containerized deployment
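
The chunking mentioned in the first bullet can be illustrated with a minimal sketch. This is not NeuronQuery's actual implementation: the function name `chunk_text`, the word-based splitting, and the `chunk_size` and `overlap` parameters are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper: the real service may chunk by characters,
    sentences, or tokens instead of whitespace-separated words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows keep a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.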

Technology Stack

  • Python 3.9+
  • FastAPI framework for API development
  • Hugging Face Transformers for embedding computation
  • PostgreSQL with pgvector extension for vector search
  • psycopg2 PostgreSQL adapter for Python
  • Docker & Docker Compose for environment setup
  • Uvicorn as the ASGI web server
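
As a rough idea of how psycopg2 and pgvector fit together for vector search, the sketch below builds a parameterized nearest-neighbour query. The table name `document_chunks`, the columns `content` and `embedding`, and both helper functions are hypothetical; the choice of cosine distance (pgvector's `<=>` operator) is also an assumption, since the project may use a different distance.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_query(embedding, top_k=5):
    """Return (sql, params) for a cosine-distance search.

    Table and column names are illustrative assumptions, not the
    project's actual schema.
    """
    sql = (
        "SELECT content, embedding <=> %s::vector AS distance "
        "FROM document_chunks "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (to_vector_literal(embedding), top_k)


# With an open psycopg2 cursor, the query could be run as:
#   cursor.execute(*build_search_query(query_embedding, top_k=3))
#   rows = cursor.fetchall()
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string keeps the query safe from injection and lets psycopg2 handle quoting.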

Setup and Usage

Prerequisites

Ensure you have the following installed:

  • Docker and Docker Compose
  • Python 3.9 or later
  • pip package manager

Start Commands

Run the following commands to get the environment up and the API server running:

# 1. Start PostgreSQL with pgvector extension via Docker
docker-compose up -d pgvector

# 2. Create and activate Python virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install required Python packages
pip install --upgrade pip
pip install -r requirements.txt

# 4. Start the FastAPI server
uvicorn src.api:app --host 0.0.0.0 --port 8000 --reload

Usage Examples with Curl

Load and embed a PDF document

Send a POST request to the /load-pdf endpoint with the path to your PDF file (adjust the file path as needed):

curl -X POST http://localhost:8000/load-pdf \
  -H "Content-Type: application/json" \
  -d '{"file_path": "data/raw/darksouls_guide.pdf"}'

Query the embedded content

Send a POST request to the /query endpoint with your question:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How to defeat Ornstein and Smough?"}'
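
The same requests can be sent from Python. The sketch below uses only the standard library and mirrors the curl calls above; the helper names `build_json_post` and `post_json` are hypothetical, and it assumes the endpoints reply with JSON (FastAPI's default), since the response format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_json_post(path, payload):
    """Build a POST request with a JSON body for the given endpoint path."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload):
    """Send the request and decode the reply, assuming a JSON response."""
    with urllib.request.urlopen(build_json_post(path, payload)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls (require the API server to be running):
#   post_json("/load-pdf", {"file_path": "data/raw/darksouls_guide.pdf"})
#   post_json("/query", {"question": "How to defeat Ornstein and Smough?"})
```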
