NeuronQuery is a semantic search microservice. It extracts text from PDF documents, generates multilingual vector embeddings with transformer models, and stores them in a PostgreSQL database using the pgvector extension. A REST API built on FastAPI handles document loading and semantic queries over the embedded content.
Key features:
- PDF text extraction with chunking for optimized embedding
- Multilingual embedding generation using Hugging Face Transformers
- Vector similarity search using PostgreSQL and the pgvector extension
- FastAPI REST API for document ingestion and semantic querying
- Docker Compose orchestration for PostgreSQL with pgvector support
- Suitable for local development or containerized deployment
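The chunking step listed above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `chunk_text`, the word-based window, and the default `chunk_size` and `overlap` values are all assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows (hypothetical helper).

    chunk_size and overlap are counted in words; both defaults are
    illustrative assumptions, not values taken from NeuronQuery.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.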
Technology stack:
- Python 3.9+
- FastAPI framework for API development
- Hugging Face Transformers for embedding computation
- PostgreSQL with pgvector extension for vector search
- psycopg2 PostgreSQL adapter for Python
- Docker & Docker Compose for environment setup
- Uvicorn as the ASGI web server
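To show how pgvector and psycopg2 can fit together for similarity search, here is a hedged sketch that builds a parameterized nearest-neighbour query. The table name `chunks` and the columns `content` and `embedding` are hypothetical; the project's real schema is not shown in this document. The `<=>` operator is pgvector's cosine distance operator.

```python
def to_vector_literal(embedding) -> str:
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: a table `chunks` with `content` (text) and
# `embedding` (vector) columns. Smaller cosine distance means more similar.
SEARCH_SQL = """
SELECT content, embedding <=> %s::vector AS distance
FROM chunks
ORDER BY distance
LIMIT %s
"""


def build_search(embedding, top_k: int = 5):
    """Return the SQL text and the parameter tuple for a top-k search."""
    return SEARCH_SQL, (to_vector_literal(embedding), top_k)


# Usage with an open psycopg2 cursor `cur` (not executed here):
#   sql, params = build_search(query_embedding, top_k=5)
#   cur.execute(sql, params)
#   rows = cur.fetchall()
```

Passing the vector as a query parameter and casting it with `::vector` avoids interpolating values into the SQL string by hand.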
Ensure you have the following installed:
- Docker and Docker Compose
- Python 3.9 or later
- pip package manager
Run the following commands to get the environment up and the API server running:
# 1. Start PostgreSQL with pgvector extension via Docker
docker-compose up -d pgvector
# 2. Create and activate Python virtual environment
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install required Python packages
pip install --upgrade pip
pip install -r requirements.txt
# 4. Start the FastAPI server
uvicorn src.api:app --host 0.0.0.0 --port 8000 --reload
Send a POST request to the /load-pdf endpoint with the path to your PDF file (adjust the file path as needed):
curl -X POST http://localhost:8000/load-pdf \
-H "Content-Type: application/json" \
-d '{"file_path": "data/raw/darksouls_guide.pdf"}'
Then send a POST request to the /query endpoint with your question:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "How to defeat Ornstein and Smough?"}'
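The same two requests can also be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_request`, `post_json`, and `demo` are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def demo() -> None:
    """Load a PDF, then query it. Requires the API server to be running."""
    print(post_json("/load-pdf", {"file_path": "data/raw/darksouls_guide.pdf"}))
    print(post_json("/query", {"question": "How to defeat Ornstein and Smough?"}))
```

Call `demo()` once the server is up. Separating `build_request` from `post_json` makes it possible to inspect the request without a network connection.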