This is the Python backend for the GeoAI Assistant project. To run the project fully you also need to run the frontend web app written in Golang: go-webserver.
The GeoAI Assistant is an AI-powered tool designed to help users query and interact with GIS data stored in De Vlaamse Waterweg nv's SQL Server environment. It also uses documents from the company's intranet to provide context and additional information to the end user.
The assistant's multi-agent architecture combines Retrieval-Augmented Generation (RAG) and Text-to-SQL to enable business users to ask direct questions about geospatial data without requiring technical expertise in SQL or GIS systems.
├── app/ # Main application code
│ ├── core/ # Core application logic
│ │ ├── agentic_rag/ # Agentic RAG for document search and preparation
│ │ ├── knowledge_setup/ # Knowledge base initialization for DB and document search
│ │ ├── llm_models/ # LLM model initialization
│ │ └── text_to_sql/ # Text-to-SQL conversion implementation
│ ├── models/ # Data Transfer Objects (DTOs) for requests and responses
│ ├── routers/ # API routers (WebSockets and REST endpoints)
│ └── services/ # Application services
├── documents/ # Documents for semantic search
├── queries/ # SQL queries for database initialization
├── semantic_search_documents/ # Initial implementation of document search
├── text_to_sql/ # Initial implementation of text-to-SQL conversion
├── docker-compose.yml # Docker Compose configuration for infrastructure
├── fill_sql_db.ps1 # PowerShell script to prepare Microsoft SQL Server
├── fill_sql_db.sh # Shell script to prepare Microsoft SQL Server
└── requirements.txt # Python dependencies
De Vlaamse Waterweg nv manages and develops inland waterways in Flanders, focusing on transport, water management, and recreational use. The organization has a GIS team responsible for maintaining a GEO Platform that consolidates all GIS applications and geodata. The platform includes data on infrastructure, sustainability, water management, and navigation.
As the geodata grows, business users often struggle to determine whether existing datasets can solve their specific problems. The GeoAI Assistant aims to bridge this gap by allowing users to interact with the data through natural language queries, eliminating the need for manual SQL translations.
- RAG (Retrieval-Augmented Generation): Combines the strengths of retrieval-based and generative models to provide accurate and contextually relevant answers.
- Text-to-SQL Conversion: Users can input natural language questions, which are converted into SQL queries and executed against the database.
- Document Search: Allows users to search for documents based on their content.
- Geospatial Analysis: Supports queries involving location-based relationships such as proximity, containment, and connectivity.
- Networked Data Navigation: Determines navigability based on ship dimensions and waterway network constraints.
- Automated Report Generation: Provides clear, comprehensible answers and visualizations from GIS data.
- Improved accessibility and usability of GIS data and the database for non-technical users.
- Increased feedback and validation of frequently used geodata, enhancing data quality.
- New insights through spatial and networked data relationships, enabling innovative decision-making.
- (START) The user inputs a question and selects the database search
To see the flowchart click here
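As an illustration of what happens after that, here is a minimal text-to-SQL sketch, assuming a generic `llm(prompt) -> str` callable and a pyodbc connection string; the prompt and function names are hypothetical, not the project's actual implementation:

```python
# Minimal text-to-SQL sketch (hypothetical; not the project's actual code).
import pyodbc

def answer_database_question(question: str, llm, conn_str: str) -> list:
    # Ask the LLM to translate the question into T-SQL; the real workflow
    # also injects the relevant schema, table, and column descriptions.
    prompt = (
        "Translate this question into a single T-SQL query. "
        f"Return only SQL.\nQuestion: {question}"
    )
    sql = llm(prompt)

    # Execute the generated query and return the rows.
    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```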
The entire workflow uses a multi-agent approach to perform the following steps:
- (START) The user inputs a question and selects the document search
- Keywords are extracted from the question with an LLM and compared with the keywords of all the documents in the database
- The most relevant documents are retrieved
- The question is embedded and a vector search is performed on all relevant documents
- The results of the search are ranked from most to least relevant, and the top 5 are used for the RAG (Retrieval-Augmented Generation) process
- The answer gets generated by an LLM with the additional context of the top 5 documents
- The answer gets validated by two LLMs for relevance and hallucinations
- (END) The answer gets returned to the user
To see the flowchart click here
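In code, the retrieval-and-rank core of the document search could look like the sketch below; the `embed` and `llm` callables and the in-memory chunk list are assumptions, since the real implementation reads embeddings from the database:

```python
# Sketch of the document-search RAG core (in-memory store assumed;
# the real implementation keeps embeddings in the database).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rag_answer(question: str, chunks: list[str], embed, llm) -> str:
    # Embed the question and every candidate chunk.
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(c)), c) for c in chunks]

    # Rank from most to least relevant and keep the top 5 as context.
    top5 = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:5]]

    # Generate the answer with the retrieved chunks as additional context.
    context = "\n\n".join(top5)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```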
It works like this:
- (START) The user uploads a group of documents
- For each document in the batch, the system checks whether it is already in the database
- Keywords are extracted from the document with an LLM using a parallel batch-processing approach (one batch per 1000 tokens) to improve performance (see the sketch after this list), and are then saved in the database
- This is needed for the relevance check in the document search
- The document is split into chunks, and the chunks are embedded and stored in the database
- This is needed for the vector search later on (for the db search and document search)
- (END) The user gets a confirmation that the documents have been imported
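As a rough sketch of the batched keyword extraction described above, assuming an `llm` callable and a naive whitespace tokenizer (the real tokenization and persistence differ):

```python
# Sketch of parallel keyword extraction in ~1000-token batches
# (naive whitespace "tokens"; the real tokenizer and storage differ).
from concurrent.futures import ThreadPoolExecutor

def extract_keywords(text: str, llm, batch_tokens: int = 1000) -> set[str]:
    tokens = text.split()
    batches = [
        " ".join(tokens[i : i + batch_tokens])
        for i in range(0, len(tokens), batch_tokens)
    ]

    def keywords_for(batch: str) -> set[str]:
        reply = llm(f"List the keywords in this text, comma separated:\n{batch}")
        return {k.strip().lower() for k in reply.split(",") if k.strip()}

    # Process all batches in parallel, then merge the keyword sets.
    with ThreadPoolExecutor() as pool:
        return set().union(*pool.map(keywords_for, batches))
```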
This database is shared between the Python backend and the Golang web application.
- chat_states
  - contains the state of a chat (last message, last answer, etc.)
- chats
  - contains the chat sessions of a user
- chunks
  - contains the content chunks of a document
- columns
  - contains the columns of a table to be used in the database search (with a description of the content)
- databases
  - contains the database connection details to be used in a project
- files
  - contains the files imported in a project
- files_projects
  - contains the link between a file and a project (many-to-many)
- messages
  - contains the messages of a chat
- projects
  - contains the list of projects
- rules
  - contains the rules for a database to be used in the database search for a project
- rules_databases
  - contains the link between a rule and a database (many-to-many)
- rules_schemas
  - contains the rules for a schema to be used in the database search for a project
- rules_tables
  - contains the rules for a table to be used in the database search for a project
- schemas
  - contains the schemas to be used in the database search for a project
- tables
  - contains the tables to be used in the database search for a project
- users
  - contains the list of users present in the application
- users_projects
  - contains the link between a user and a project (many-to-many)
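As an illustration of the many-to-many link tables, here is how users_projects could be modelled with SQLAlchemy; the column names are guesses based on the descriptions above, not the actual schema:

```python
# Hypothetical SQLAlchemy model of the users <-> projects link table;
# column names are assumed from the descriptions above, not the real schema.
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

users_projects = Table(
    "users_projects", Base.metadata,
    Column("user_id", ForeignKey("users.id"), primary_key=True),
    Column("project_id", ForeignKey("projects.id"), primary_key=True),
)

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    projects = relationship("Project", secondary=users_projects, back_populates="users")

class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    users = relationship("User", secondary=users_projects, back_populates="projects")
```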
- Python > 3.11
- Docker
- Docker compose
- Ollama (if you use it as the LLM provider)
- Follow the tutorial here
- ODBC driver for mssql
- See this link for more information.
- sqlcmd for mssql
- See this link for more information.
Look into the .env.example file for more details.
When you are ready, rename the file to .env
If you are going to use an OpenAI model, you need to set the OPENAI_KEY variable with your OpenAI key.
If you are going to use a local model, you need to set the MODEL_PROVIDER variable to ollama.
Note: if you are going to use Ollama, you need to have Ollama installed and running.
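For reference, this is roughly how the backend might read those settings; a minimal sketch with an assumed default, not the actual configuration code:

```python
# Illustrative only: how the backend might read these settings.
# The default value for MODEL_PROVIDER is an assumption.
import os

MODEL_PROVIDER = os.getenv("MODEL_PROVIDER", "openai")  # "ollama" for local models
OPENAI_KEY = os.getenv("OPENAI_KEY")  # needed only for OpenAI models

if MODEL_PROVIDER == "openai" and not OPENAI_KEY:
    raise RuntimeError("OPENAI_KEY must be set when using an OpenAI model")
```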
docker compose up -d
To get the most out of this project, use the uv package/project manager, or just pip to install the requirements.
Create a virtual environment:

uv venv --python 3.11

or

python -m venv .venv
source .venv/bin/activate

Then install the dependencies:

uv sync

or

pip install -r requirements.txt
If you are running the project for the first time, you need to run the following command to initialize the database:

python -m app.services.metadata_repository.metadata_setup

Then start the server:

uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload --log-level debug

After this you can access the project at http://127.0.0.1:8000
To run the whole project you also need to run this Golang web app for the frontend:
- Open Microsoft SQL Server Management Studio or Azure Data Studio to connect to the source database
- Create and fill the source server
Note: to do this you will need to have sqlcmd installed on your machine.
A log file (fill_sql_db.log) will be created in the root of the project so you can follow the script's progress.
chmod +x fill_sql_db.sh
./fill_sql_db.sh
Set-ExecutionPolicy Unrestricted -Scope Process
.\fill_sql_db.ps1
python -m app.services.graph_repository.parse_schema_to_graph

# For neo4j
# stop the db
docker exec -it neo4j neo4j-admin server stop
# dump database
docker exec -it neo4j neo4j-admin database dump neo4j --to-path=/backups --overwrite-destination=true
# restart database
docker exec -it neo4j neo4j-admin server start
# For postgres
docker exec -it postgres pg_dump -U devuser -d devdb -F c -f /backups/postgres_backup.dump
# For microsoft sql server
sqlcmd -S localhost,1433 -U SA -P 'YourStrong!Passw0rd' -Q "BACKUP DATABASE [GeoData] TO DISK = N'/backups/mssql_backup.bak' WITH FORMAT, INIT"
# For neo4j
# stop the db
docker exec -it neo4j neo4j-admin server stop
# load the dump
docker exec -it neo4j neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true
# restart database
docker exec -it neo4j neo4j-admin server start
# For postgres
# drop old db
docker exec -it postgres psql -U devuser -d postgres -c "DROP DATABASE IF EXISTS devdb;"
docker exec -it postgres psql -U devuser -d postgres -c "CREATE DATABASE devdb;"
docker exec -it postgres pg_restore -U devuser -d devdb -F c /backups/postgres_backup.dump
# For microsoft sql server
# Step 1: Set the Database to Single-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;"
# Step 2: drop existing database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "DROP DATABASE IF EXISTS [GeoData];"
# Step 3: Create a new database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "CREATE DATABASE [GeoData];"
# Step 4: Restore the Database
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "RESTORE DATABASE [GeoData] FROM DISK = N'/backups/mssql_backup.bak' WITH REPLACE;"
# Step 5: Set the Database to Multi-User Mode
sqlcmd -S localhost -U SA -P 'YourStrong!Passw0rd' -Q "ALTER DATABASE [GeoData] SET MULTI_USER;"
This project is licensed under the MIT License - see the LICENSE file for details.
