This is the Python backend for the GeoAI Assistant project. To run the project fully you also need to run the frontend web app written in Golang: go-webserver.
The GeoAI Assistant is an AI-powered tool designed to help users query and interact with GIS data stored in De Vlaamse Waterweg nv's SQL Server environment. It also uses documents from the company's intranet to provide context and additional information to the end user.
The assistant's multi-agent architecture combines Retrieval-Augmented Generation (RAG) and Text-to-SQL to enable business users to ask direct questions about geospatial data without requiring technical expertise in SQL or GIS systems.
├── app/ # Main application code
│ ├── core/ # Core application logic
│ │ ├── agentic_rag/ # Agentic RAG for document search and preparation
│ │ ├── knowledge_setup/ # Knowledge base initialization for DB and document search
│ │ ├── llm_models/ # LLM model initialization
│ │ └── text_to_sql/ # Text-to-SQL conversion implementation
│ ├── models/ # Data Transfer Objects (DTOs) for requests and responses
│ ├── routers/ # API routers (WebSockets and REST endpoints)
│ └── services/ # Application services
├── documents/ # Documents for semantic search
├── queries/ # SQL queries for database initialization
├── semantic_search_documents/ # Initial implementation of document search
├── text_to_sql/ # Initial implementation of text-to-SQL conversion
├── docker-compose.yml # Docker Compose configuration for infrastructure
├── fill_sql_db.ps1 # PowerShell script to prepare Microsoft SQL Server
├── fill_sql_db.sh # Shell script to prepare Microsoft SQL Server
└── requirements.txt # Python dependencies
De Vlaamse Waterweg nv manages and develops inland waterways in Flanders, focusing on transport, water management, and recreational use. The organization has a GIS team responsible for maintaining a GEO Platform that consolidates all GIS applications and geodata. The platform includes data on infrastructure, sustainability, water management, and navigation.
As the geodata grows, business users often struggle to determine whether existing datasets can solve their specific problems. The GeoAI Assistant aims to bridge this gap by allowing users to interact with the data through natural language queries, eliminating the need for manual SQL translations.
- RAG (Retrieval-Augmented Generation): Combines the strengths of retrieval-based and generative models to provide accurate and contextually relevant answers.
- Text-to-SQL Conversion: Users can input natural language questions, which are converted into SQL queries and executed against the database.
- Document Search: Allows users to search for documents based on their content.
- Geospatial Analysis: Supports queries involving location-based relationships such as proximity, containment, and connectivity.
- Networked Data Navigation: Determines navigability based on ship dimensions and waterway network constraints.
- Automated Report Generation: Provides clear, comprehensible answers and visualizations from GIS data.
- Improved accessibility and usability of GIS data and the database for non-technical users.
- Increased feedback and validation of frequently used geodata, enhancing data quality.
- New insights through spatial and networked data relationships, enabling innovative decision-making.
- (START) The user inputs a question and selects the database search
To see the flowchart click here
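As an illustration of what happens after that, here is a minimal text-to-SQL sketch, assuming a generic `llm(prompt) -> str` callable and a pyodbc connection string; the prompt and function names are hypothetical, not the project's actual implementation:

```python
# Minimal text-to-SQL sketch (hypothetical; not the project's actual code).
import pyodbc

def answer_database_question(question: str, llm, conn_str: str) -> list:
    # Ask the LLM to translate the question into T-SQL; the real workflow
    # also injects the relevant schema, table, and column descriptions.
    prompt = (
        "Translate this question into a single T-SQL query. "
        f"Return only SQL.\nQuestion: {question}"
    )
    sql = llm(prompt)

    # Execute the generated query and return the rows.
    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```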
The entire workflow uses a multi-agent approach to perform the following steps:
- (START) The user inputs a question and selects the document search
- Keywords are extracted from the question with an LLM and compared with the keywords of all the documents in the database
- The most relevant documents are retrieved
- The question is embedded and a vector search is performed on all relevant documents
- The results of the search are ranked from most to least relevant, and the top 5 are used for the RAG (Retrieval-Augmented Generation) process
- The answer gets generated by an LLM with the additional context of the top 5 documents
- The answer gets validated by two LLMs for relevance and hallucinations
- (END) The answer gets returned to the user
To see the flowchart click here
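In code, the retrieval-and-rank core of the document search could look like the sketch below; the `embed` and `llm` callables and the in-memory chunk list are assumptions, since the real implementation reads embeddings from the database:

```python
# Sketch of the document-search RAG core (in-memory store assumed;
# the real implementation keeps embeddings in the database).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rag_answer(question: str, chunks: list[str], embed, llm) -> str:
    # Embed the question and every candidate chunk.
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(c)), c) for c in chunks]

    # Rank from most to least relevant and keep the top 5 as context.
    top5 = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:5]]

    # Generate the answer with the retrieved chunks as additional context.
    context = "\n\n".join(top5)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```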
It works like this:
- (START) The user uploads a group of documents
- For each document in the batch, the system checks whether it is already in the database
- Keywords are extracted from the document with an LLM using a parallel batch-processing approach (one batch per 1000 tokens) to improve performance (see the sketch after this list), and are then saved in the database
- This is needed for the relevance check in the document search
- The document is split into chunks, and the chunks are embedded and stored in the database
- This is needed for the vector search later on (for the db search and document search)
- (END) The user gets a confirmation that the documents have been imported
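As a rough sketch of the batched keyword extraction described above, assuming an `llm` callable and a naive whitespace tokenizer (the real tokenization and persistence differ):

```python
# Sketch of parallel keyword extraction in ~1000-token batches
# (naive whitespace "tokens"; the real tokenizer and storage differ).
from concurrent.futures import ThreadPoolExecutor

def extract_keywords(text: str, llm, batch_tokens: int = 1000) -> set[str]:
    tokens = text.split()
    batches = [
        " ".join(tokens[i : i + batch_tokens])
        for i in range(0, len(tokens), batch_tokens)
    ]

    def keywords_for(batch: str) -> set[str]:
        reply = llm(f"List the keywords in this text, comma separated:\n{batch}")
        return {k.strip().lower() for k in reply.split(",") if k.strip()}

    # Process all batches in parallel, then merge the keyword sets.
    with ThreadPoolExecutor() as pool:
        return set().union(*pool.map(keywords_for, batches))
```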
This database is shared between the Python backend and the Golang web application.
- chat_states
  - contains the state of a chat (last message, last answer, etc.)
- chats
  - contains the chat sessions of a user
- chunks
  - contains the content chunks of a document
- columns
  - contains the columns of a table to be used in the database search (with a description of the content)
- databases
  - contains the database connection details to be used in a project
- files
  - contains the files imported in a project
- files_projects
  - contains the link between a file and a project (many-to-many)
- messages
  - contains the messages of a chat
- projects
  - contains the list of projects
- rules
  - contains the rules for a database to be used in the database search for a project
- rules_databases
  - contains the link between a rule and a database (many-to-many)
- rules_schemas
  - contains the rules for a schema to be used in the database search for a project
- rules_tables
  - contains the rules for a table to be used in the database search for a project
- schemas
  - contains the schemas to be used in the database search for a project
- tables
  - contains the tables to be used in the database search for a project
- users
  - contains the list of users present in the application
- users_projects
  - contains the link between a user and a project (many-to-many)
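As an illustration of the many-to-many link tables, here is how users_projects could be modelled with SQLAlchemy; the column names are guesses based on the descriptions above, not the actual schema:

```python
# Hypothetical SQLAlchemy model of the users <-> projects link table;
# column names are assumed from the descriptions above, not the real schema.
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

users_projects = Table(
    "users_projects", Base.metadata,
    Column("user_id", ForeignKey("users.id"), primary_key=True),
    Column("project_id", ForeignKey("projects.id"), primary_key=True),
)

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    projects = relationship("Project", secondary=users_projects, back_populates="users")

class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    users = relationship("User", secondary=users_projects, back_populates="projects")
```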
- Python > 3.11
- Docker
- Docker compose
- Ollama (if you use it as the LLM provider)
- Follow the tutorial here
- ODBC driver for mssql
- See this link for more information.
- sqlcmd for mssql
- See this link for more information.
Look into the .env.example file for more details.
When you are ready, rename the file to .env
If you are going to use an OpenAI model, you need to set the OPENAI_KEY variable with your OpenAI key.
If you are going to use a local model, you need to set the MODEL_PROVIDER variable to ollama.
Note: if you are going to use Ollama, you need to have Ollama installed and running.
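For reference, this is roughly how the backend might read those settings; a minimal sketch with an assumed default, not the actual configuration code:

```python
# Illustrative only: how the backend might read these settings.
# The default value for MODEL_PROVIDER is an assumption.
import os

MODEL_PROVIDER = os.getenv("MODEL_PROVIDER", "openai")  # "ollama" for local models
OPENAI_KEY = os.getenv("OPENAI_KEY")  # needed only for OpenAI models

if MODEL_PROVIDER == "openai" and not OPENAI_KEY:
    raise RuntimeError("OPENAI_KEY must be set when using an OpenAI model")
```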
docker compose up -d
To get the most out of this project, use the uv package/project manager, or just pip to install the requirements.
Create a virtual environment:

uv venv --python 3.11

or

python -m venv .venv
source .venv/bin/activate

Then install the dependencies:

uv sync

or

pip install -r requirements.txt
If you are running the project for the first time, you need to run the following command to initialize the database:

python -m app.services.metadata_repository.metadata_setup

Then start the server:

uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload --log-level debug

After this you can access the project at http://127.0.0.1:8000
To run the whole project you also need to run this Golang web app for the frontend:
- Open Microsoft SQL Server Management Studio or Azure Data Studio to connect to the source database
- Create and fill the source server
Note: to do this you will need to have sqlcmd installed on your machine.
A log file (fill_sql_db.log) will be created in the root of the project so you can follow the script's progress.
chmod +x fill_sql_db.sh
./fill_sql_db.sh
Set-ExecutionPolicy Unrestricted -Scope Process
.\fill_sql_db.ps1
python -m app.services.graph_repository.parse_schema_to_graph

# For neo4j
# stop the db
docker exec -it neo4j neo4j-admin server stop
# dump database
docker exec -it neo4j neo4j-admin database dump neo4j --to-path=/backups --overwrite-destination=true
# restart database
docker exec -it neo4j neo4j-admin server start
# For postgres
docker exec -it postgres pg_dump -U devuser -d devdb -F c -f /backups/postgres_backup.dump
# For microsoft sql server
sqlcmd -S localhost,1433 -U SA -P 'YourStrong!Passw0rd' -Q "BACKUP DATABASE [GeoData] TO DISK = N'/backups/mssql_backup.bak' WITH FORMAT, INIT"
# For neo4j
# stop the db
docker exec -it neo4j neo4j-admin server stop
# load the dump
docker exec -it neo4j neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true
# restart database
docker exec -it neo4j neo4j-admin server start
# For postgres
# drop old db
docker exec -it postgres psql -U devuser -d postgres -c "DROP DATABASE IF EXISTS devdb;"
docker exec -it postgres psql -U devuser -d postgres -c "CREATE DATABASE devdb;"
docker exec -it postgres pg_restore -U devuser -d devdb -F c /backups/postgres_backup.dump
# For microsoft sql server
# Step 1: Set the Database to Single-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;"
# Step 2: drop existing database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "DROP DATABASE IF EXISTS [GeoData];"
# Step 3: Create a new database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "CREATE DATABASE [GeoData];"
# Step 4: Restore the Database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "RESTORE DATABASE [GeoData] FROM DISK = N'/backups/mssql_backup.bak' WITH REPLACE;"
# Step 5: Set the Database to Multi-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET MULTI_USER;"
This project is licensed under the MIT License - see the LICENSE file for details.
