GeoAI Assistant Project

This is the Python backend for the GeoAI Assistant project. To run the full project, you also need to run the frontend web app written in Go: go-webserver

Project Overview

The GeoAI Assistant is an AI-powered tool designed to help users query and interact with GIS data stored in De Vlaamse Waterweg nv's SQL Server environment. It additionally uses documents from the company's intranet to provide context and additional information to the end user.

The assistant's multi-agent architecture combines Retrieval-Augmented Generation (RAG) and text-to-SQL techniques so that business users can ask direct questions about geospatial data without technical expertise in SQL or GIS systems.

Structure of the Project

├── app/                          # Main application code
│   ├── core/                     # Core application logic
│   │   ├── agentic_rag/          # Agentic RAG for document search and preparation
│   │   ├── knowledge_setup/      # Knowledge base initialization for DB and document search
│   │   ├── llm_models/           # LLM model initialization
│   │   └── text_to_sql/          # Text-to-SQL conversion implementation
│   ├── models/                   # Data Transfer Objects (DTOs) for requests and responses
│   ├── routers/                  # API routers (WebSockets and REST endpoints)
│   └── services/                 # Application services
├── documents/                    # Documents for semantic search
├── queries/                      # SQL queries for database initialization
├── semantic_search_documents/    # Initial implementation of document search
├── text_to_sql/                  # Initial implementation of text-to-SQL conversion
├── docker-compose.yml            # Docker Compose configuration for infrastructure
├── fill_sql_db.ps1               # PowerShell script to prepare Microsoft SQL Server
├── fill_sql_db.sh                # Shell script to prepare Microsoft SQL Server
└── requirements.txt              # Python dependencies

Background

De Vlaamse Waterweg nv manages and develops inland waterways in Flanders, focusing on transport, water management, and recreational use. The organization has a GIS team responsible for maintaining a GEO Platform that consolidates all GIS applications and geodata. The platform includes data on infrastructure, sustainability, water management, and navigation.

As the geodata grows, business users often struggle to determine whether existing datasets can solve their specific problems. The GeoAI Assistant aims to bridge this gap by allowing users to interact with the data through natural language queries, eliminating the need for manual SQL translations.

Features

  • RAG (Retrieval-Augmented Generation): Combines the strengths of retrieval-based and generative models to provide accurate and contextually relevant answers.
  • Text-to-SQL Conversion: Users can input natural language questions, which are converted into SQL queries that are executed against the database.
  • Document Search: Allows users to search for documents based on their content.
  • Geospatial Analysis: Supports queries involving location-based relationships such as proximity, containment, and connectivity (an illustrative query follows this list).
  • Networked Data Navigation: Determines navigability based on ship dimensions and waterway network constraints.
  • Automated Report Generation: Provides clear, comprehensible answers and visualizations from GIS data.
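
As a purely illustrative example of the geospatial analysis feature, a proximity question could be translated into spatial T-SQL. The table and column names below are hypothetical, not taken from the actual GEO Platform schema; only the SQL Server spatial functions (geography::Point, STDistance) are real:

# Illustrative only: hypothetical table/column names, real SQL Server spatial API.
question = "Which locks lie within 5 km of the port of Antwerp?"

# The text-to-SQL agent could translate this into T-SQL such as:
generated_sql = """
SELECT l.name
FROM dbo.locks AS l
WHERE l.location.STDistance(
    geography::Point(51.2194, 4.4025, 4326)  -- lat, lon, SRID
) <= 5000;  -- STDistance on geography returns meters
"""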

Expected Outcomes

  • Improved accessibility and usability of GIS data and the underlying database for non-technical users.
  • Increased feedback and validation of frequently used geodata, enhancing data quality.
  • New insights through spatial and networked data relationships, enabling innovative decision-making.

How the text-to-SQL conversion works:

  • (START) The user inputs a question and selects the database search (a rough sketch of the rest of the pipeline follows below)
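
The flow then continues through SQL generation, execution, and answer formulation. Below is a rough sketch of such a pipeline; the ask_llm helper and the prompt are hypothetical (the real implementation lives in app/core/text_to_sql/), while the connection details match the development commands further down:

import pyodbc  # the ODBC driver requirement below suggests pyodbc or similar

def text_to_sql(question: str, schema_description: str, ask_llm) -> list:
    # 1. Ask the configured LLM to translate the question into read-only T-SQL,
    #    grounded in a description of the relevant schema.
    prompt = (
        "Given this schema:\n" + schema_description +
        "\nWrite a single read-only T-SQL query answering: " + question
    )
    sql = ask_llm(prompt)

    # 2. Execute the generated query against SQL Server and return the rows.
    connection = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=localhost,1433;DATABASE=GeoData;"
        "UID=SA;PWD=YourStrong!Passw0rd;TrustServerCertificate=yes;"
    )
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()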

How the document search works:

To see the flowchart, click here

In short, the workflow uses a multi-agent approach to perform the following steps (a code sketch follows the list):

  1. (START) The user inputs a question and selects the document search
  2. Keywords are extracted from the question with an LLM and compared with the keywords of all the documents in the database
  3. The most relevant documents are retrieved
  4. The question is embedded and a vector search is performed on all relevant documents
  5. The results of the search are ranked from most to least relevant, and the top 5 are used for the RAG (Retrieval-Augmented Generation) process
  6. The answer is generated by an LLM with the additional context of the top 5 documents
  7. The answer is validated by two LLMs for relevance and hallucinations
  8. (END) The answer is returned to the user
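
A minimal sketch of this flow, assuming cosine similarity for the vector search; every helper name here is a placeholder, not the project's real API (see app/core/agentic_rag/ for the actual implementation):

import numpy as np

def document_search(question, extract_keywords, embed, generate, validate, store):
    # Steps 2-3: keyword filter -- keep documents whose stored keywords
    # overlap with the keywords the LLM extracts from the question.
    keywords = set(extract_keywords(question))
    candidates = [d for d in store.documents if keywords & set(d.keywords)]

    # Step 4: vector search over the chunks of the candidate documents
    # (cosine similarity is an assumption).
    q_vec = np.asarray(embed(question))
    chunks = [c for d in candidates for c in d.chunks]
    scored = sorted(
        chunks,
        key=lambda c: float(np.dot(q_vec, c.embedding)
                            / (np.linalg.norm(q_vec) * np.linalg.norm(c.embedding))),
        reverse=True,
    )

    # Steps 5-6: generate the answer with the top-5 chunks as RAG context.
    context = "\n\n".join(c.text for c in scored[:5])
    answer = generate(question, context)

    # Step 7: two validator LLMs check relevance and hallucinations.
    if not validate(question, context, answer):
        raise ValueError("answer failed the relevance/hallucination check")
    return answer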

How the import of a group of documents in a project works:

To see the flowchart click here

It works like this (a code sketch follows the list):

  1. (START) The user uploads a group of documents
  2. For each document in the batch, it is checked whether the document is already in the database
  3. Keywords are extracted from the document with an LLM using a parallel batch-processing approach (one batch per 1,000 tokens) to improve performance, and afterwards they are saved in the database
    • This is needed for the relevance check in the document search
  4. The document is split into chunks, and the chunks are embedded and stored in the database
    • This is needed for the vector search later on (for the DB search and document search)
  5. (END) The user gets a confirmation that the documents have been imported
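
A minimal sketch of this import flow; the helper names and the content-hash duplicate check are assumptions, only the batching and storage steps come from the list above:

import hashlib
from concurrent.futures import ThreadPoolExecutor

def import_documents(docs, extract_keywords, embed, db, batch_tokens=1000):
    for doc in docs:
        # Step 2: skip documents already in the database (a content-hash
        # check is an assumption; the real dedup logic may differ).
        digest = hashlib.sha256(doc.text.encode()).hexdigest()
        if db.has_file(digest):
            continue

        # Step 3: extract keywords per ~1,000-token batch, in parallel,
        # then save the merged set to the database.
        batches = [doc.tokens[i:i + batch_tokens]
                   for i in range(0, len(doc.tokens), batch_tokens)]
        with ThreadPoolExecutor() as pool:
            keyword_sets = list(pool.map(extract_keywords, batches))
        db.save_keywords(digest, set().union(*keyword_sets))

        # Step 4: split into chunks, embed each chunk, and store both
        # for the later vector search.
        for chunk in doc.split_into_chunks():
            db.save_chunk(digest, chunk, embed(chunk))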

DB Schema of the project

This database is shared between the Python backend and the Go web application.

DB Schema

General explanation of the tables:

  • chat_states
    • contains the state of a chat (last message, last answer, etc.)
  • chats
    • contains the chat sessions for a user
  • chunks
    • contains the chunks of a document's content
  • columns
    • contains the columns of a table to be used in the database search (with a description of the content)
  • databases
    • contains the database connection details to be used in a project
  • files
    • contains the files imported into a project
  • files_projects
    • contains the link between a file and a project (many-to-many; see the sketch after this list)
  • messages
    • contains the messages of a chat
  • projects
    • contains the list of projects
  • rules
    • contains the rules for a database to be used in the database search for a project
  • rules_databases
    • contains the link between a rule and a database (many-to-many)
  • rules_schemas
    • contains the rules for a schema to be used in the database search for a project
  • rules_tables
    • contains the rules for a table to be used in the database search for a project
  • schemas
    • contains the schemas to be used in the database search for a project
  • tables
    • contains the tables to be used in the database search for a project
  • users
    • contains the list of users present in the application
  • users_projects
    • contains the link between a user and a project (many-to-many)
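
As a sketch of how one of the many-to-many links could be modelled in Python (SQLAlchemy and the column names are assumptions; only the table names files, projects, and files_projects come from the schema above):

from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Association table for the files <-> projects many-to-many link.
files_projects = Table(
    "files_projects", Base.metadata,
    Column("file_id", ForeignKey("files.id"), primary_key=True),
    Column("project_id", ForeignKey("projects.id"), primary_key=True),
)

class File(Base):
    __tablename__ = "files"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    projects = relationship("Project", secondary=files_projects, back_populates="files")

class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    files = relationship("File", secondary=files_projects, back_populates="projects")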

Requirements to run the project

  • Python ≥ 3.11
  • Docker
  • Docker Compose
  • Ollama (if you use it as the LLM provider)
    • Follow the tutorial here
  • ODBC driver for MSSQL
    • See this link for more information.
  • sqlcmd for MSSQL
    • See this link for more information.

Setup env variables:

Look into the .env.example file for more details.

When you are ready, rename the file to .env

If you are going to use an OpenAI model, you need to set the OPENAI_KEY variable to your OpenAI API key. If you are going to use a local model, you need to set the MODEL_PROVIDER variable to ollama.

Note: if you are going to use Ollama, you need to have Ollama installed and running.
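
A small sketch of how these two variables could drive provider selection at startup (the exact logic in app/core/llm_models/ may differ; the port is an assumption based on Ollama's default configuration):

import os

provider = os.getenv("MODEL_PROVIDER", "openai")  # "ollama" selects the local model

if provider == "ollama":
    # Requires a local Ollama server to be installed and running
    # (11434 is Ollama's default port).
    base_url = "http://localhost:11434"
else:
    api_key = os.environ["OPENAI_KEY"]  # fails loudly if the key is missing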

How to run this Python project

Run the Docker Compose file to pull and start the databases (source and metadata: PostgreSQL, MSSQL, Neo4j):

docker compose up -d

Create virtual environment

To get the most out of this project, use the uv package/project manager, or just use pip to install the requirements:

uv venv --python 3.11

or

python -m venv .venv

Activate it

source .venv/bin/activate

Install dependencies

uv sync

or

pip install -r requirements.txt

Run the project

If you are running the project for the first time, you need to run the following command to initialize the database:

python -m app.services.metadata_repository.metadata_setup

Then, to run the project:

uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload --log-level debug

After this, you can access the project at http://127.0.0.1:8000
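
Since the routers expose both WebSocket and REST endpoints, a quick way to talk to the backend is a WebSocket client; the /ws/chat path below is hypothetical, so check app/routers/ for the real routes:

import asyncio
import websockets  # pip install websockets

async def main():
    # Hypothetical endpoint path -- see app/routers/ for the actual routes.
    async with websockets.connect("ws://127.0.0.1:8000/ws/chat") as ws:
        await ws.send("Which waterways are navigable for a 110 m vessel?")
        print(await ws.recv())

asyncio.run(main())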

To run the full project, you also need to run this Go web app for the frontend:

go-webserver


For development only

Setup source DB

  1. Open Microsoft SQL Server Management Studio or Azure Data Studio to connect to the source database
  2. Create and fill the source server

Setup metadata repository

Note: to do this you will need to have sqlcmd installed on your machine.

A log file (fill_sql_db.log) will be created in the root of the project so you can follow the progress of the script.

Linux/MacOS

chmod +x fill_sql_db.sh
./fill_sql_db.sh

Windows

Set-ExecutionPolicy Unrestricted -Scope Process
.\fill_sql_db.ps1

How to prepare the Neo4j database

  python -m app.services.graph_repository.parse_schema_to_graph

How to export data from all the databases:

# For neo4j

# stop the db
docker exec -it neo4j neo4j-admin server stop

# dump database
docker exec -it neo4j neo4j-admin database dump neo4j --to-path=/backups --overwrite-destination=true

# restart database
docker exec -it neo4j neo4j-admin server start


# For postgres

docker exec -it postgres pg_dump -U devuser -d devdb -F c -f /backups/postgres_backup.dump


# For microsoft sql server

sqlcmd -S localhost,1433 -U SA -P 'YourStrong!Passw0rd' -Q "BACKUP DATABASE [GeoData] TO DISK = N'/backups/mssql_backup.bak' WITH FORMAT, INIT"

How to import data to all the databases:

# For neo4j

# stop the db
docker exec -it neo4j neo4j-admin server stop

# load the dump
docker exec -it neo4j neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true

# restart database
docker exec -it neo4j neo4j-admin server start


# For postgres
# drop old db
docker exec -it postgres psql -U devuser -d postgres -c "DROP DATABASE IF EXISTS devdb;"
docker exec -it postgres psql -U devuser -d postgres -c "CREATE DATABASE devdb;"
docker exec -it postgres pg_restore -U devuser -d devdb -F c /backups/postgres_backup.dump

# For microsoft sql server

# Step 1: Set the Database to Single-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;"

# Step 2: drop existing database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "DROP DATABASE IF EXISTS [GeoData];"

# Step 3: Create a new database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "CREATE DATABASE [GeoData];"

# Step 4: Restore the Database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "RESTORE DATABASE [GeoData] FROM DISK = N'/backups/mssql_backup.bak' WITH REPLACE;"

# Step 5: Set the Database to Multi-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET MULTI_USER;"

License

This project is licensed under the MIT License - see the LICENSE file for details.
