Production-Ready Conversational AI for Smart Buildings
A production-ready, end-to-end platform for human–building conversation: Rasa (NLU + Actions), robust analytics microservices, SQL/SPARQL knowledge stores, and a web UI—all orchestrated with Docker Compose.
Quick Start • Documentation • Buildings • Services • API Reference
Last updated: 2025-10-08
Status: Production Ready | Multi-Building Support | 1,606 Total Sensors
- What is OntoBot?
- 🏢 Buildings
- 🚀 Quick Start
- 🔧 Services & Architecture
- 📚 Documentation
- ⚙️ Configuration
- 🔬 Development
- 🧪 Testing
- 🐛 Troubleshooting
- 🗺️ Roadmap
- 📖 References
- 📄 License
- ✅ Typo-Tolerant Sensor Resolution - Automatic correction of sensor name typos, spaces, and formatting errors using fuzzy matching
- ✅ Multi-building dataset generation & merge tooling (bldg1, bldg2, bldg3) with raw + deduplicated outputs and provenance
- ✅ Hardened T5 fine-tuning (checkpoint-3) powering the nl2sparql microservice; legacy checkpoint-2 removed
- ✅ Automatic Ollama (Mistral) model pull & warm-up on container start
- ✅ Health endpoints added to critical services (nl2sparql, analytics, decider, Rasa, file server) for compose reliability
- ✅ Professional documentation with GitHub Pages support and comprehensive service READMEs
- ✅ T5 Training GUI for incremental model fine-tuning with web interface
OntoBot is an intelligent conversational AI platform designed for smart buildings. It combines:
- Natural Language Understanding (Rasa) - Understand user questions about building systems
- Knowledge Graphs (Brick Schema + SPARQL) - Semantic representation of building equipment and sensors
- Time-Series Analytics - Statistical analysis, anomaly detection, forecasting
- Multi-Database Support - MySQL, PostgreSQL/TimescaleDB, Cassandra
- T5 NL2SPARQL Translation - Convert natural language to SPARQL queries using BrickSchema
- LLM Integration (Mistral/Ollama) - Natural language response generation
- Web UI (React) - User-friendly chat interface with rich visualizations
🎯 Multi-Building Support: Three example buildings (2 synthetic) with 1,606 total sensors
📊 Advanced Analytics: 30+ analytics types including forecasting and anomaly detection
🔍 Semantic Queries: SPARQL-powered knowledge graph exploration
📈 Real-Time Monitoring: Live sensor data with visualization
🤖 Conversational Interface: Natural language queries in plain English
🧠 Typo-Tolerant: Automatic correction of sensor name errors using fuzzy matching
🏗️ Modular Architecture: Microservices-based, easily extensible
🐳 Docker-First: Complete orchestration with Docker Compose
📚 Comprehensive Documentation: Professional README files and GitHub Pages
The OntoBot web interface provides an intuitive chat-based interaction for querying building systems and viewing analytics results.
OntoBot supports three example smart buildings with different characteristics and use cases.
| Building | Type | Sensors | Database | Focus Area | Documentation |
|---|---|---|---|---|---|
| Building 1 (ABACWS) | Real University Testbed | 680 | MySQL | Indoor Environmental Quality (IEQ) | 📖 README |
| Building 2 (Office) | Synthetic Commercial | 329 | TimescaleDB | HVAC & Thermal Comfort | 📖 README |
| Building 3 (Data Center) | Synthetic Critical Infrastructure | 597 | Cassandra | Cooling & Alarms | 📖 README |
Total System Coverage: 1,606 sensors across all buildings
- Location: Cardiff University, Wales, UK
- Sensors: 680 sensors across 34 zones (5.01–5.34)
- Sensor Types: CO2, TVOC, Formaldehyde, PM1/2.5/10, Multi-Gas (MQ2/3/5/9), Temperature, Humidity, Light, Sound
- Database: MySQL (port 3307)
- Compose File: docker-compose.bldg1.yml
- Details: Building 1 README | Building 1 Data
- Purpose: HVAC Optimization & Thermal Comfort Research
- Sensors: 329 sensors (15 AHUs, 50 zones, 3 chillers, 2 boilers)
- Sensor Types: AHU supply/return temps, zone temperatures, setpoints, occupancy, flow rates, pressures
- Database: TimescaleDB (PostgreSQL + time-series extension, port 5433)
- Compose File: docker-compose.bldg2.yml
- Details: Building 2 README | Building 2 Data
- Purpose: Critical Infrastructure & Cooling Optimization
- Sensors: 597 sensors (CRAC units, UPS, PDUs, rack temps, alarms)
- Sensor Types: Critical cooling, power distribution, environmental monitoring, alarm sensors
- Database: Cassandra (port 9042) + PostgreSQL for metadata (port 5434)
- Compose File: docker-compose.bldg3.yml
- Details: Building 3 README | Building 3 Data
OntoBot runs one building at a time to avoid port conflicts:
# Stop current building
docker-compose -f docker-compose.bldg1.yml down
# Start different building
docker-compose -f docker-compose.bldg2.yml up -d --build
# Frontend auto-detects the active building

Note: The frontend and services automatically detect which building is active. No code changes needed!
For detailed multi-building workflows, see Multi-Building Support.
- Docker Desktop 20.10+
- Docker Compose 2.0+
- 8GB RAM minimum (16GB recommended)
- 20GB free disk space
Detailed Prerequisites: See Prerequisites section below.
# Clone repository
git clone --recursive https://github.com/suhasdevmane/OntoBot.git
cd OntoBot
# Start Building 1 stack
docker-compose -f docker-compose.bldg1.yml up -d --build
# Wait for services (~2-3 minutes)
Start-Sleep -Seconds 180
# Access the UI
# Frontend: http://localhost:3000
# Rasa: http://localhost:5005/version

Open http://localhost:3000 and ask:
What is the temperature in zone 5.04?
Show me CO2 levels for the last hour
Detect anomalies in humidity
# Building 1 + AI services
docker-compose -f docker-compose.bldg1.yml -f docker-compose.extras.yml up -d --build

For a detailed getting started guide: 📖 Quick Start Guide
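Instead of a fixed Start-Sleep, you can poll the health endpoints until the stack is ready. A minimal sketch (Python, assuming the default host ports listed in the services table below):

```python
import time
import requests

# Health endpoints exposed on the host (default port mappings)
ENDPOINTS = {
    "Rasa": "http://localhost:5005/version",
    "Analytics": "http://localhost:6001/health",
    "File server": "http://localhost:8080/health",
}

def wait_until_healthy(timeout: int = 300, interval: int = 10) -> bool:
    """Poll each endpoint until it responds with HTTP 200 or the timeout expires."""
    deadline = time.time() + timeout
    pending = dict(ENDPOINTS)
    while pending and time.time() < deadline:
        for name, url in list(pending.items()):
            try:
                if requests.get(url, timeout=5).ok:
                    print(f"{name} is up")
                    del pending[name]
            except requests.RequestException:
                pass  # service not ready yet
        if pending:
            time.sleep(interval)
    return not pending

if __name__ == "__main__":
    print("All services healthy" if wait_until_healthy() else "Timed out waiting for services")
```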
| Service | Port | Purpose | Health Check | Documentation |
|---|---|---|---|---|
| Rasa Core | 5005 | NLU/Dialogue Engine | GET /version | Rasa Docs |
| Action Server | 5055 | Custom Business Logic | GET /health | See building READMEs |
| Analytics Microservices | 6001 | Time-Series Analytics | GET /health | 📖 README |
| Decider Service | 6009 | Analytics Selection | GET /health | 📖 README |
| NL2SPARQL (T5) | 6005 | NL→SPARQL Translation | GET /health | 📖 README |
| Ollama/Mistral | 11434 | LLM Summarization | GET /api/tags | 📖 README |
| File Server | 8080 | Artifact Hosting | GET /health | See building READMEs |
| Frontend (React) | 3000 | Chat UI | N/A | 📖 README |
| Duckling | 8000 | Entity Extraction | GET / | Duckling Docs |
| Jena Fuseki | 3030 | SPARQL Endpoint | GET /$/ping | Fuseki Docs |
| Service | Port | Used By | Purpose |
|---|---|---|---|
| MySQL | 3307 | Building 1 | Telemetry storage |
| TimescaleDB | 5433 | Building 2 | Time-series optimized PostgreSQL |
| Cassandra | 9042 | Building 3 | Distributed NoSQL for critical data |
| PostgreSQL | 5434 | Building 3 | ThingsBoard metadata |
| Service | Port | Purpose | Enabled By |
|---|---|---|---|
| Jupyter Lab | 8888 | Notebooks | docker-compose.extras.yml |
| GraphDB | 7200 | Alternate RDF Store | docker-compose.extras.yml |
| Adminer | 8282 | Database UI | docker-compose.extras.yml |
| ThingsBoard | 8082 | IoT Platform | Building 2/3 compose files |
| pgAdmin | 5050/5051 | PostgreSQL Management | All building compose files |
flowchart TB
User[👤 User Browser<br/>:3000] --> FE[React Frontend<br/>:3000]
FE -->|/webhook| Rasa[Rasa NLU Server<br/>:5005]
Rasa --> Actions[Action Server<br/>:5055]
Actions -->|NL2SPARQL_URL| NL2S[NL2SPARQL Service<br/>T5 Model :6005]
Actions -->|SPARQL Queries| Fuseki[Jena Fuseki<br/>Triple Store :3030]
Actions -->|Analytics Decision| Decider[Decider Service<br/>:6009]
Actions -->|SQL Queries| DB[(Database<br/>Building-Specific)]
Actions -->|/analytics/run| Analytics[Analytics Microservices<br/>30+ Analysis Types :6001]
Actions -->|Read/Write Artifacts| FS[(File Server<br/>:8080)]
Analytics -->|Save Results| FS
Decider -->|Training Data| TrainFS[(Training Files)]
NL2S -->|SPARQL Translation| Actions
Ollama[Ollama/Mistral<br/>LLM :11434] -->|Summarization| Actions
FS -->|Serve Artifacts| FE
FileSrv[HTTP File Server<br/>:8080] --> FE
DB -.->|MySQL :3307| BLD1[Building 1<br/>ABACWS]
DB -.->|TimescaleDB :5433| BLD2[Building 2<br/>Office]
DB -.->|Cassandra :9042| BLD3[Building 3<br/>Data Center]
subgraph Docker_Network[Docker Network: ontobot-network]
Rasa
Actions
Analytics
Decider
NL2S
Ollama
Fuseki
DB
FileSrv
FS
end
subgraph Optional_Services[Optional Services - docker-compose.extras.yml]
NL2S
Ollama
GraphDB[GraphDB<br/>:7200]
Jupyter[Jupyter Lab<br/>:8888]
Adminer[Adminer<br/>:8282]
end
style User fill:#e1f5ff
style FE fill:#4a90e2
style Rasa fill:#5a17ee
style Actions fill:#ff6b6b
style Analytics fill:#51cf66
style Decider fill:#ffd43b
style NL2S fill:#845ef7
style Ollama fill:#ff8787
style Fuseki fill:#20c997
style DB fill:#868e96
style Docker_Network fill:#f1f3f5,stroke:#868e96,stroke-width:2px
style Optional_Services fill:#fff3bf,stroke:#ffd43b,stroke-width:2px,stroke-dasharray: 5 5
For detailed architecture: See Architecture at a glance section below.
Comprehensive documentation is available on GitHub Pages:
| Document | Description | Link |
|---|---|---|
| Frontend UI Guide | Complete UI reference, all pages, keyboard shortcuts | 📖 Read |
| T5 Training Guide | Model training workflow, GUI usage, API endpoints | 📖 Read |
| Backend Services | All 8 services, API references, integration patterns | 📖 Read |
| Multi-Building Guide | Building differences, switching, portability | 📖 Read |
| Quick Start Guide | 30-minute setup, first conversation, common issues | 📖 Read |
| API Reference | Complete REST API specs with examples | 📖 Read |
| Service | Description | Documentation |
|---|---|---|
| Building 1 (ABACWS) | Real testbed with 680 sensors | 📖 README |
| Building 2 (Office) | Synthetic office with 329 sensors | 📖 README |
| Building 3 (Data Center) | Synthetic data center with 597 sensors | 📖 README |
| Analytics Microservices | Time-series analytics API (30+ analyses) | 📖 README |
| Decider Service | Analytics decision logic & training | 📖 README |
| NL2SPARQL & Ollama | Language translation & summarization | 📖 README |
| Frontend (React) | Chat UI and configuration | 📖 README |
| Document | Purpose | Link |
|---|---|---|
| Multi-Building Support | Switching buildings, portability | MULTI_BUILDING_SUPPORT.md |
| Typo-Tolerant Sensors | Automatic sensor name correction | rasa-bldg1/TYPO_TOLERANT_SENSORS.md |
| Buildings Taxonomy | Detailed building characteristics | BUILDINGS.md |
| Port Reference | Complete port mapping | PORTS.md |
| Analytics Deep Dive | Analytics API details | analytics.md |
| Setup Checklist | Deployment checklist | SETUP_CHECKLIST.md |
| Models Documentation | T5 and Decider models | MODELS.md |
| T5 GUI Architecture | Training GUI design | T5_GUI_ARCHITECTURE.md |
| Port Fix Documentation | Chrome ERR_UNSAFE_PORT fix | T5_TRAINING_PORT_FIX.md |
This repo has dedicated compose files for three distinct smart buildings plus an overlay for additional tooling. Language translation (nl2sparql) and local summarization (ollama) are pluggable feature services that can be hosted separately for faster responses.
- docker-compose.bldg1.yml - Building 1 (ABACWS) - Real university testbed with MySQL
  - Services: Rasa, Actions, Duckling, Editor, File Server, Microservices, Decider, MySQL, Fuseki, pgAdmin
  - Documentation: Building 1 README
- docker-compose.bldg2.yml - Building 2 (Office) - Synthetic office with TimescaleDB
  - Services: Same as bldg1 + TimescaleDB, ThingsBoard
  - Pre-configured pgAdmin via bldg2/servers.json
  - Documentation: Building 2 README
- docker-compose.bldg3.yml - Building 3 (Data Center) - Synthetic data center with Cassandra
  - Services: Same as bldg1 + Cassandra, PostgreSQL, ThingsBoard
  - Documentation: Building 3 README
- docker-compose.extras.yml - Additional Services
  - nl2sparql (6005) - T5-based NL→SPARQL translator
  - ollama (11434) - Mistral LLM for summarization
  - graphdb (7200) - Alternate RDF store
  - jupyter (8888) - Notebooks for exploration
  - adminer (8282) - Database UI
  - Documentation: Transformers README
# Building 1 only
docker-compose -f docker-compose.bldg1.yml up -d --build
# Building 1 + extras (NL2SPARQL + Ollama)
docker-compose -f docker-compose.bldg1.yml -f docker-compose.extras.yml up -d --build
# Building 2 only (TimescaleDB)
docker-compose -f docker-compose.bldg2.yml up -d --build
# Building 2 + extras
docker-compose -f docker-compose.bldg2.yml -f docker-compose.extras.yml up -d --build
# Building 3 only (Cassandra)
docker-compose -f docker-compose.bldg3.yml up -d --build
# Building 3 + extras
docker-compose -f docker-compose.bldg3.yml -f docker-compose.extras.yml up -d --build

Important Notes:
- ⚠️ Run only one building at a time to avoid port conflicts
- ✅ Fuseki included in all compose files (http://localhost:3030)
- ✅ pgAdmin auto-configured for each building
- ✅ Frontend auto-detects active building (no code changes needed)
- 🎮 Ollama uses GPU if available (comment out GPU section if CPU-only)
OntoBot connects building/IoT telemetry and semantic data to a conversational interface and analytics. The platform enables:
- Natural language questions with unit-aware answers
- Time-series analytics via standardized payloads
- Semantic queries (SPARQL on Jena Fuseki) and SQL telemetry
- Artifact generation (plots/CSV) served through HTTP and displayed in the frontend
All services are containerized and connected on the ontobot-network Docker network.
1. User asks a question in the React frontend (e.g., "What is the CO2 level in zone 5.04?")
2. Rasa NLU extracts intents and entities from the natural language
3. Action Server executes custom business logic:
   - Translates NL → SPARQL via the nl2sparql service (optional)
   - Queries Jena Fuseki for semantic data (equipment, sensors, relationships)
   - Retrieves sensor UUIDs and metadata from the knowledge graph
   - Fetches time-series data from the building-specific database (MySQL/TimescaleDB/Cassandra)
4. Decider Service classifies the query and determines the appropriate analytics type
5. Analytics Microservices perform the analysis:
   - 30+ analysis types (statistics, anomaly detection, forecasting, correlation, etc.)
   - Generates plots, CSVs, and structured results
   - Saves artifacts to shared file server storage
6. Ollama/Mistral (optional) generates natural language summaries of results
7. Frontend displays the conversational response with:
   - Natural language answer
   - Interactive charts and visualizations
   - Download links for artifacts
   - Metadata and statistics
All services communicate over the internal Docker network (ontobot-network) using service names as hostnames:
Action Server Internal URLs:
- FUSEKI_ENDPOINT=http://fuseki-db:3030/trial/sparql
- ANALYTICS_URL=http://microservices:6000/analytics/run
- DECIDER_URL=http://decider-service:6009/decide
- NL2SPARQL_URL=http://nl2sparql:6005/nl2sparql
- SUMMARIZATION_URL=http://ollama:11434
- DB_HOST=mysqlserver (or timescaledb, cassandra)
Host Access (for testing/debugging):
- Replace internal DNS names with localhost and mapped ports
- Example: http://localhost:6001/analytics/run instead of http://microservices:6000/analytics/run
Query: "Show me temperature anomalies in zone 5.04 for the last week"
1. User → Frontend (React :3000)
POST message to Rasa webhook
2. Frontend → Rasa (:5005)
Extract intent: query_timeseries
Extract entities: sensor_type=temperature, zone=5.04, time=last week
3. Rasa → Action Server (:5055)
Trigger custom action: action_question_to_brickbot
4. Action Server → NL2SPARQL (:6005) [optional]
Convert "temperature in zone 5.04" → SPARQL query
5. Action Server → Fuseki (:3030)
Execute SPARQL query
Result: temperature_sensor_UUID_for_zone_5.04
6. Action Server → Decider (:6009)
Input: "temperature anomalies last week"
Output: analysis_type="detect_anomalies"
7. Action Server → Database
SQL query for sensor UUID, date range
Result: [{datetime, value}, ...]
8. Action Server → Analytics (:6001)
POST /analytics/run
Payload: {analysis_type, sensor_data, method: "zscore"}
9. Analytics → File Server (:8080)
Save plot: anomalies_temperature_zone504.png
Save CSV: anomalies_temperature_zone504.csv
10. Analytics → Action Server
Return: {anomalies_detected: 3, artifact_urls, statistics}
11. Action Server → Ollama (:11434) [optional]
Generate summary: "I detected 3 temperature anomalies..."
12. Action Server → Rasa → Frontend
Display conversational message + plot + download links
OntoBot is designed to be modular and extensible:
- Add new buildings: Create new compose file, Rasa project, and TTL dataset
- Add analytics types: Extend microservices/blueprints/ with new analysis functions
- Integrate new databases: Add a database client in the Action Server
- Custom NLU: Modify Rasa training data in domain.yml and data/
- UI customization: Edit the React frontend in rasa-frontend/src/
- Deploy remotely: Point NL2SPARQL_URL / SUMMARIZATION_URL to external services
Use these patterns from the repo root. All commands are PowerShell-friendly on Windows.
# Start services (build if needed) in detached mode
docker-compose -f <compose-file>.yml up -d --build
# Show running services from that file
docker-compose -f <compose-file>.yml ps
# Tail logs for a specific service
docker-compose -f <compose-file>.yml logs -f --tail 200 <service-name>
# Restart a service (e.g., after config changes)
docker-compose -f <compose-file>.yml restart <service-name>
# Stop (keeps containers)
docker-compose -f <compose-file>.yml stop
# Down (removes containers but keeps named volumes)
docker-compose -f <compose-file>.yml down
# Danger: down and remove volumes too (wipes data in named volumes)
docker-compose -f <compose-file>.yml down -v

Multiple files (overlays):
# Building stack + extras overlay
docker-compose -f docker-compose.bldg2.yml -f docker-compose.extras.yml up -d --build
# Bring down the combined stack
docker-compose -f docker-compose.bldg2.yml -f docker-compose.extras.yml down

Scenario: Building 1 full stack with Rasa at ./rasa-bldg1 and shared services (no ThingsBoard path by default).
Use when: You want to run Q&A for bldg1 with Fuseki, MySQL, Editor, Frontend, Analytics.
# Up
docker-compose -f docker-compose.bldg1.yml up -d --build
# Down
docker-compose -f docker-compose.bldg1.yml down

Scenario: Building 2 full stack with ThingsBoard + TimescaleDB and Rasa at ./rasa-bldg2.
Use when: You want Q&A for bldg2 and to validate telemetry in Timescale.
# Up
docker-compose -f docker-compose.bldg2.yml up -d --build
# Train (one-off)
docker-compose -f docker-compose.bldg2.yml --profile manual up --build --abort-on-container-exit rasa-train
# Down
docker-compose -f docker-compose.bldg2.yml down

Scenario: Building 3 full stack with ThingsBoard (entities in Postgres) + Cassandra for telemetry and Rasa at ./rasa-bldg3.
Use when: You want Q&A for bldg3 and to validate telemetry in Cassandra.
# Up
docker-compose -f docker-compose.bldg3.yml up -d --build
# Train (one-off)
docker-compose -f docker-compose.bldg3.yml --profile manual up --build --abort-on-container-exit rasa-train
# Down
docker-compose -f docker-compose.bldg3.yml down

Scenario: Additional services like graphdb, jupyter, adminer (core language services may also be overlaid if not embedded elsewhere).
Use when: You need NL→SPARQL, local LLM summarization, alternate RDF store, notebooks, or a DB UI.
# Run with a building stack (order matters)
# Up
docker-compose -f docker-compose.bldgX.yml -f docker-compose.extras.yml up -d --build
# Down
docker-compose -f docker-compose.bldgX.yml -f docker-compose.extras.yml down

Scenario: ThingsBoard with plain Postgres for telemetry (no Timescale or Cassandra).
Use when: You want to experiment with TB on Postgres only.
# Up
docker-compose -f docker-compose.pg.yml up -d --build
# Down
docker-compose -f docker-compose.pg.yml down

Scenario: ThingsBoard backed by Timescale (Postgres + Timescale extension) for telemetry.
Use when: You want a minimal TB + Timescale setup (independent of bldg2 stack).
# Up
docker-compose -f docker-compose.ts.yml up -d --build
# Down
docker-compose -f docker-compose.ts.yml down

Scenario: ThingsBoard with Cassandra for telemetry and Postgres for entities.
Use when: You want a minimal TB + Cassandra setup (independent of bldg3 stack).
# Up
docker-compose -f docker-compose.cassandra.yml up -d --build
# Down
docker-compose -f docker-compose.cassandra.yml down

Scenario: One-off job to train a Rasa project (legacy/default project under rasa-ui).
Use when: You want to train outside the per-building stacks.
# Train
docker-compose -f docker-compose.rasatrain.yml up --build --abort-on-container-exit

Scenario: Legacy/dev monolithic compose with toggles for different TB options.
Use when: You prefer a single file and manually comment/uncomment TB variants (not recommended—prefer per-building files).
# Up
docker-compose up -d --build
# Down
docker-compose down

- ✅ Run one building stack at a time for Q&A to avoid port/data conflicts
- ✅ Named volumes persist data across up/down; use down -v only when you intentionally want to reset state
- ✅ For overlays, always list the building compose first, then
docker-compose.extras.yml - ✅ Healthchecks are defined for most services—if a service is unhealthy, review logs with
logs -fand check its health endpoint
Copy .env.example to .env and adjust as needed. Key variables used by the Action Server and other services:
This project uses environment variables in the docker-compose.*.yml files. Follow these steps to safely provide secrets and configuration when running Compose locally:
- Create a local .env from .env.example
- Copy the template and edit it locally (do NOT commit .env):

Copy-Item -Path .env.example -Destination .env
# Open .env and replace placeholder values with your real credentials

- Keep .env in .gitignore. Never commit production secrets.
- Validate before starting
- Show the normalized config and check for missing variables/warnings:
docker compose --env-file .env -f docker-compose.bldg1.yml config

- If you see warnings like "The "PG_THINGSBOARD_DB" variable is not set.", add that variable to .env and re-run the command.
- Start the chosen compose file
- Building 1 (example):
docker compose --env-file .env -f docker-compose.bldg1.yml up -d --build

- With extras overlay:

docker compose --env-file .env -f docker-compose.bldg1.yml -f docker-compose.extras.yml up -d --build

- Alternative approaches
- Per-shell variables (temporary for current PowerShell session):
$env:PG_THINGSBOARD_PASSWORD = 'verysecret'
docker compose -f docker-compose.bldg1.yml up -d --build

- Explicit env file (if you maintain multiple files):

docker compose --env-file .env.bldg1 -f docker-compose.bldg1.yml up -d --build

Notes
- Compose files commonly reference: API_* (Postgres/MySQL), PG_THINGSBOARD_*, MYSQL_*, ACTION_DB_*, JWT_SECRET, FUSEKI_ADMIN_PASSWORD, PGADMIN_DEFAULT_*. If you enable the extras overlay, also set GRAPHDB_PASSWORD and JUPYTER_TOKEN.
- For production, prefer secret managers (GitHub Actions secrets, cloud secret management, Docker/K8s secrets) instead of plaintext .env files.
# File Server (serves artifacts)
BASE_URL=http://http_server:8080 # Internal Docker network
# Or for host access: http://localhost:8080
# Analytics Microservices
ANALYTICS_URL=http://microservices:6000/analytics/run # Internal DNS
# Decider Service (analytics classification)
DECIDER_URL=http://decider-service:6009/decide

# MySQL (Building 1)
DB_HOST=mysqlserver
DB_PORT=3306 # Internal port (mapped to 3307 on host)
DB_NAME=ontobot
DB_USER=root
DB_PASSWORD=your_password
# TimescaleDB (Building 2)
# Uses same variables, but DB_HOST=timescaledb
# Cassandra (Building 3)
# Uses Cassandra-specific configuration in Action Server

# NL2SPARQL Translation (T5 Model)
NL2SPARQL_URL=http://nl2sparql:6005/nl2sparql
# If not set, Action Server skips translation stage
# LLM Summarization (Ollama/Mistral)
SUMMARIZATION_URL=http://ollama:11434
# If not set, Action Server returns raw analytical metrics
# Media Bundling
BUNDLE_MEDIA=true # Bundle multiple media in bot messages (default: true)

You do NOT have to run nl2sparql or ollama locally. You can point these URLs at services running on any reachable host:
# External NL2SPARQL service
NL2SPARQL_URL=http://nl2sparql.mycompany.net:6005/nl2sparql
# External LLM API
SUMMARIZATION_URL=https://llm.example.org/api
# Cloud-hosted services
ANALYTICS_URL=https://analytics-api.mycompany.net/analytics/run

As long as the Action Server can reach those HTTP endpoints and they respect the expected contract, the local compose services may be omitted.
The Action Server, File Server, and Editor share the rasa-ui/shared_data directory:
rasa-ui/
├── shared_data/
│ ├── artifacts/ # Analysis results (plots, CSVs)
│ │ └── <user>/ # Per-user artifact folders
│ ├── models/ # Trained Rasa models (shared with Rasa server)
│ └── training/ # Training data and logs
Artifacts:
- Plots and CSVs are placed under shared_data/artifacts/<safe_user>
- File server exposes them at http://localhost:8080/artifacts/<user>/<filename>
- Add the ?download=1 query parameter to force download instead of inline display
Best Practices:
- Replace default DB credentials with least-privilege accounts
- Avoid exposing database ports (3306, 5432, 9042) directly to the internet in production
- Never write secrets to shared_data (it's served over HTTP)
- Use internal service names for service-to-service communication in Docker
- Use localhost and mapped ports only for host-level testing/debugging
The Analytics Microservices API accepts both flat and nested payload formats for flexibility.
Simple dictionary with sensor names as keys:
{
"Readable_Sensor_Name": [
{ "datetime": "2025-02-10 05:31:59", "reading_value": 22.5 },
{ "datetime": "2025-02-10 05:33:00", "reading_value": 23.0 }
]
}

Organized by building zones or equipment:
{
"1": {
"Air_Temperature_Sensor": {
"timeseries_data": [
{ "datetime": "2025-02-10 05:31:59", "reading_value": 22.5 },
{ "datetime": "2025-02-10 05:33:00", "reading_value": 23.0 }
]
}
},
"2": {
"Humidity_Sensor": {
"timeseries_data": [
{ "datetime": "2025-02-10 05:31:59", "reading_value": 45.2 },
{ "datetime": "2025-02-10 05:33:00", "reading_value": 46.1 }
]
}
}
}

- Human-readable keys: Use descriptive sensor names (e.g., Zone_Air_Temperature_Sensor_5.04)
- UUID replacement: Action Server replaces UUIDs with readable names before calling Analytics
- Timestamp normalization: Analytics accepts both datetime and timestamp keys
- Flexible key matching: Temperature variations (temp, temperature, Temperature) are handled robustly
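As an illustration of these conventions, the following Python sketch converts a pandas DataFrame of readings into the nested payload shape and posts it to the analytics endpoint; the sample data and the analyze_temperatures choice are illustrative assumptions:

```python
import pandas as pd
import requests

# Illustrative readings for one sensor (datetime + reading_value, as expected by the API)
df = pd.DataFrame(
    {
        "datetime": ["2025-02-10 05:00:00", "2025-02-10 06:00:00", "2025-02-10 07:00:00"],
        "reading_value": [21.5, 22.0, 22.5],
    }
)

payload = {
    "analysis_type": "analyze_temperatures",
    "1": {
        "Zone_Air_Temperature_Sensor_5.04": {
            "timeseries_data": df.to_dict(orient="records")
        }
    },
}

# Host-mapped analytics endpoint (use http://microservices:6000 inside the Docker network)
resp = requests.post("http://localhost:6001/analytics/run", json=payload, timeout=60)
print(resp.json())
```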
{
"analysis_type": "analyze_temperatures",
"1": {
"Zone_Air_Temperature_Sensor_5.04": {
"timeseries_data": [
{ "datetime": "2025-02-10 05:00:00", "reading_value": 21.5 },
{ "datetime": "2025-02-10 06:00:00", "reading_value": 22.0 },
{ "datetime": "2025-02-10 07:00:00", "reading_value": 22.5 }
]
}
}
}

Analytics includes UK indoor environmental quality guidelines:
| Parameter | Unit | Recommended Range | Authority |
|---|---|---|---|
| Temperature | °C | 19–23°C | CIBSE Guide A |
| Relative Humidity | %RH | 40–60% | CIBSE/WHO |
| CO₂ | ppm | <1000 ppm | CIBSE TM40 |
| PM2.5 | µg/m³ | <25 µg/m³ | WHO |
| Formaldehyde (HCHO) | µg/m³ | <100 µg/m³ | WHO |
| Noise | dB(A) | <40 dB(A) | WHO |
For anomaly detection analyses, include the method parameter:
{
"analysis_type": "detect_anomalies",
"method": "zscore", // or "iqr"
"1": {
"CO2_Sensor": {
"timeseries_data": [ /* data */ ]
}
}
}

Methods:
- zscore: Z-score based (default, standard deviations from mean)
- iqr: Interquartile range based (more robust to outliers)
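For intuition, a z-score flags a reading whose distance from the mean exceeds a threshold measured in standard deviations. A minimal sketch of the idea (not the service's actual implementation):

```python
import numpy as np

def zscore_anomalies(values, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations from the mean."""
    arr = np.asarray(values, dtype=float)
    mean, std = arr.mean(), arr.std()
    if std == 0:
        return []
    z = (arr - mean) / std
    return [i for i, score in enumerate(z) if abs(score) > threshold]

readings = [450, 480, 1500, 470]  # the 1500 ppm spike stands out
print(zscore_anomalies(readings, threshold=1.5))  # -> [2]
```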
Base URL: http://localhost:6001 (host) or http://microservices:6000 (internal Docker network)
Invoke-RestMethod -Method Post -Uri http://localhost:6001/analytics/run `
-ContentType 'application/json' `
-Body (@{
analysis_type = 'analyze_temperatures'
'1' = @{
Air_Temperature_Sensor = @{
timeseries_data = @(
@{ datetime = '2025-02-10 05:31:59'; reading_value = 22.5 },
@{ datetime = '2025-02-10 05:33:00'; reading_value = 23.0 },
@{ datetime = '2025-02-10 05:35:00'; reading_value = 22.8 }
)
}
}
} | ConvertTo-Json -Depth 6)

Invoke-RestMethod -Method Post -Uri http://localhost:6001/analytics/run `
-ContentType 'application/json' `
-Body (@{
analysis_type = 'detect_anomalies'
method = 'zscore'
'1' = @{
CO2_Sensor = @{
timeseries_data = @(
@{ datetime = '2025-02-10 05:00:00'; reading_value = 450 },
@{ datetime = '2025-02-10 06:00:00'; reading_value = 480 },
@{ datetime = '2025-02-10 07:00:00'; reading_value = 1500 }, # Anomaly
@{ datetime = '2025-02-10 08:00:00'; reading_value = 470 }
)
}
}
} | ConvertTo-Json -Depth 6)

Invoke-RestMethod -Method Post -Uri http://localhost:6001/analytics/run `
-ContentType 'application/json' `
-Body (@{
analysis_type = 'forecast_timeseries'
periods = 24 # Forecast 24 hours ahead
'1' = @{
Temperature_Sensor = @{
timeseries_data = @(
# ... 7 days of historical data ...
)
}
}
} | ConvertTo-Json -Depth 6)

OntoBot supports 30+ analytics types organized by category:
- analyze_temperatures - Statistical analysis with comfort range validation
- analyze_humidity - RH analysis with mold/dryness risk detection
- analyze_co2 - IAQ analysis with ventilation recommendations
- analyze_pm - Particulate matter with health risk assessment
- analyze_hcho - Formaldehyde analysis with safety thresholds
- analyze_noise - Acoustic comfort analysis
- calculate_aqi - Air Quality Index calculation
- analyze_delta_t - Supply/return temperature differential
- analyze_airflow - Flow rate analysis and efficiency
- analyze_pressure_trend - Pressure monitoring
- detect_hvac_anomalies - Equipment fault detection
- calculate_cop - Coefficient of Performance
- correlation_analysis - Multi-sensor correlation matrix
- aggregation_analysis - Temporal aggregation (hourly, daily, weekly)
- trend_analysis - Linear regression and trend detection
- detect_anomalies - Statistical outlier detection (z-score/IQR)
- forecast_timeseries - ARIMA-based forecasting
- detect_potential_failures - Predictive maintenance
- predict_downtime - Failure prediction with confidence intervals
- device_deviation_analysis - Cross-device comparison
- predict_failure_trends - Long-term failure probability
- calculate_recalibration_frequency - Sensor drift analysis
- analyze_setpoint_deviation - Control loop performance
- energy_optimization - Load profiling and savings opportunities
- occupancy_correlation - Occupancy vs environmental parameters
- seasonal_decomposition - Trend, seasonality, residual components
- change_point_detection - Regime shifts in time-series
- sensor_health_monitoring - Data quality and sensor status
- comfort_compliance - Regulatory compliance checking
- And more...
Quick validation of all analytics endpoints:
# From repo root
python microservices/test_analytics_smoke.py

Expected output:
Testing endpoint: /health
✅ Health check passed
Testing endpoint: /analytics/run (analyze_temperatures)
✅ Temperature analysis passed
Testing endpoint: /analytics/run (detect_anomalies)
✅ Anomaly detection passed
All tests passed! ✅
# PowerShell
Invoke-RestMethod http://localhost:6001/health
# Output
@{status="healthy"; service="analytics-microservices"; version="2.0"}

OntoBot uses custom Rasa actions to orchestrate the entire analytics pipeline. For an in-depth description of the Action Server implementation, see the dedicated actions README at rasa-bldg1/actions/README.md.
Purpose: Extract and validate sensor types from user queries
Features:
- ✅ Comma-separated inputs: "temperature, humidity, CO2"
- ✅ Fuzzy matching: "temp" → "Zone_Air_Temperature_Sensor"
- ✅ Canonical name mapping: Standardizes sensor names from sensor_list.txt
Slot: sensor_type (List[str])
Example:
User: "Show me temp and humidity"
→ sensor_type: ["Zone_Air_Temperature_Sensor", "Zone_Air_Humidity_Sensor"]
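A minimal sketch of how this kind of fuzzy resolution can work with the standard library (difflib); the normalisation rules and cutoff below are illustrative, not the exact logic in actions.py:

```python
from difflib import get_close_matches

# Canonical names as they might appear in sensor_list.txt
CANONICAL = [
    "Zone_Air_Temperature_Sensor",
    "Zone_Air_Humidity_Sensor",
    "Zone_Air_CO2_Sensor",
]

def resolve_sensor(user_text: str) -> list:
    """Map comma-separated, possibly misspelled sensor names to canonical names."""
    resolved = []
    lowered = {name.lower(): name for name in CANONICAL}
    for token in user_text.split(","):
        needle = token.strip().lower().replace(" ", "_")
        if not needle:
            continue
        # 1) substring match handles abbreviations like "temp" or "co2"
        subs = [full for low, full in lowered.items() if needle in low]
        if subs:
            resolved.append(subs[0])
            continue
        # 2) otherwise fall back to fuzzy matching to absorb typos ("humidty")
        hit = get_close_matches(needle, list(lowered), n=1, cutoff=0.4)
        if hit:
            resolved.append(lowered[hit[0]])
    return resolved

print(resolve_sensor("temp, humidty"))
# -> ['Zone_Air_Temperature_Sensor', 'Zone_Air_Humidity_Sensor']
```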
Purpose: Extract start and end dates for time-series queries
Supported Formats:
- DD/MM/YYYY: "10/02/2025" → February 10, 2025
- YYYY-MM-DD: "2025-02-10" → February 10, 2025
- Natural language: "last week", "today", "now", "yesterday", "last month"
Slots: start_date, end_date
Example:
User: "Show trends from 01/02/2025 to 07/02/2025"
→ start_date: "2025-02-01", end_date: "2025-02-07"
User: "Analyze last week"
→ start_date: "2025-02-03", end_date: "2025-02-10"
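A simplified sketch of this kind of date resolution (the real action may rely on additional components such as Duckling; the relative-phrase rules here are illustrative):

```python
from datetime import date, datetime, timedelta

def resolve_dates(text: str, today: date = None):
    """Return (start_date, end_date) as ISO strings for a few supported phrases/formats."""
    today = today or date.today()
    text = text.strip().lower()
    relative = {
        "today": (today, today),
        "now": (today, today),
        "yesterday": (today - timedelta(days=1), today - timedelta(days=1)),
        "last week": (today - timedelta(days=7), today),
        "last month": (today - timedelta(days=30), today),
    }
    if text in relative:
        start, end = relative[text]
        return start.isoformat(), end.isoformat()
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):  # DD/MM/YYYY and ISO dates
        try:
            d = datetime.strptime(text, fmt).date()
            return d.isoformat(), d.isoformat()
        except ValueError:
            continue
    return None, None

print(resolve_dates("last week", today=date(2025, 2, 10)))  # ('2025-02-03', '2025-02-10')
print(resolve_dates("10/02/2025"))                          # ('2025-02-10', '2025-02-10')
```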
| Slot | Type | Description | Example |
|---|---|---|---|
| sensor_type | List[str] | Sensor names (fuzzy-matched) | ["Temperature_Sensor"] |
| start_date | str | Query start date (ISO format) | "2025-02-01" |
| end_date | str | Query end date (ISO format) | "2025-02-10" |
| timeseries_ids | List[str] | Sensor UUIDs from SPARQL | ["uuid1", "uuid2"] |
| analytics_type | str | Analysis type from Decider | "detect_anomalies" |
| zone | str | Building zone identifier | "5.04" |
| building | str | Building identifier | "Building_1" |
Purpose: Primary action for natural language queries
Flow:
1. NL → SPARQL Translation (optional via NL2SPARQL_URL)
   - Converts natural language to a SPARQL query
   - Example: "temperature in zone 5.04" → SPARQL query
2. SPARQL Query Execution
   - Queries Jena Fuseki for sensor UUIDs and metadata
   - Returns: Sensor names, UUIDs, zones, equipment relationships
3. Save JSON Artifact
   - Saves SPARQL results to shared_data/artifacts/<user>/sparql_results.json
4. Analytics Decision (via Decider Service)
   - Determines the appropriate analysis type based on the query
   - Example: "Show anomalies" → analytics_type="detect_anomalies"
5. Set Slots and Trigger Followup
   - Sets timeseries_ids, analytics_type, sensor_type slots
   - Triggers action_process_timeseries as a followup action
Example:
User: "Detect temperature anomalies in zone 5.04 last week"
→ NL2SPARQL → SPARQL query → Sensor UUIDs
→ Decider → analytics_type="detect_anomalies"
→ FollowupAction("action_process_timeseries")
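A condensed sketch of what such a custom action can look like with the Rasa SDK; the request/response field names assumed for the NL2SPARQL and Decider services are illustrative, not their documented contracts:

```python
import os
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.events import FollowupAction, SlotSet
from rasa_sdk.executor import CollectingDispatcher


class ActionQuestionToBrickbot(Action):
    def name(self) -> Text:
        return "action_question_to_brickbot"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        question = tracker.latest_message.get("text", "")

        # 1. Optional NL -> SPARQL translation (field names are assumptions)
        sparql = None
        nl2sparql_url = os.getenv("NL2SPARQL_URL")
        if nl2sparql_url:
            sparql = requests.post(nl2sparql_url, json={"question": question},
                                   timeout=30).json().get("sparql")

        # 2. Query Fuseki for sensor UUIDs (prefixes are prepended by the real action)
        fuseki = os.getenv("FUSEKI_ENDPOINT", "http://fuseki-db:3030/trial/sparql")
        uuids: List[str] = []
        if sparql:
            rows = requests.post(fuseki, data={"query": sparql},
                                 headers={"Accept": "application/sparql-results+json"},
                                 timeout=30).json()["results"]["bindings"]
            uuids = [row["hasUUID"]["value"] for row in rows if "hasUUID" in row]

        # 3. Ask the Decider which analysis to run (payload shape is an assumption)
        decider = os.getenv("DECIDER_URL", "http://decider-service:6009/decide")
        analytics_type = requests.post(decider, json={"query": question},
                                       timeout=30).json().get("analysis_type", "")

        # 4. Hand over to the time-series processing action
        return [
            SlotSet("timeseries_ids", uuids),
            SlotSet("analytics_type", analytics_type),
            FollowupAction("action_process_timeseries"),
        ]
```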
Purpose: Fetch data, run analytics, generate artifacts
Flow:
1. SQL Data Fetch
   - Query the database (MySQL/TimescaleDB/Cassandra) using sensor UUIDs
   - Date range from start_date / end_date slots
   - Returns: [{datetime, reading_value}, ...]
2. Build Analytics Payload
   - Format: Flat or nested payload (see Data and Payloads)
   - Replace UUIDs with human-readable names
   - Normalize timestamps
3. Call Analytics Microservices
   - POST to http://microservices:6000/analytics/run
   - Payload includes analysis_type and sensor data
4. Process Results
   - Extract statistics, anomalies, forecasts
   - Save plots (PNG) and data (CSV) to shared_data/artifacts/<user>/
   - Generate artifact URLs: http://localhost:8080/artifacts/<user>/<filename>
5. LLM Summarization (optional via SUMMARIZATION_URL)
   - Generate a natural language summary of results
   - Example: "I detected 3 temperature anomalies in zone 5.04..."
6. Return Conversational Response
   - Natural language answer
   - Artifact links (plots, CSVs)
   - Statistics and metadata
Example:
Input Slots:
sensor_type: ["Zone_Air_Temperature_Sensor_5.04"]
timeseries_ids: ["uuid-abc-123"]
analytics_type: "detect_anomalies"
start_date: "2025-02-03"
end_date: "2025-02-10"
Output:
Response: "I detected 3 temperature anomalies in zone 5.04 last week..."
Artifacts:
- temperature_anomalies_zone504.png
- temperature_anomalies_zone504.csv
Statistics: {anomalies_count: 3, mean: 22.3, std: 1.2}
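A minimal sketch of the analytics call and artifact-link handling inside such an action; the response field names (artifact_urls, statistics) mirror the example above but should be treated as assumptions:

```python
import os
import requests

ANALYTICS_URL = os.getenv("ANALYTICS_URL", "http://microservices:6000/analytics/run")
FILE_SERVER = os.getenv("BASE_URL", "http://localhost:8080")

def run_analysis(analysis_type: str, sensor_name: str, rows: list) -> dict:
    """POST a nested payload to the analytics service and return its JSON result."""
    payload = {
        "analysis_type": analysis_type,
        "1": {sensor_name: {"timeseries_data": rows}},
    }
    resp = requests.post(ANALYTICS_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

result = run_analysis(
    "detect_anomalies",
    "Zone_Air_Temperature_Sensor_5.04",
    [{"datetime": "2025-02-10 05:00:00", "reading_value": 21.5},
     {"datetime": "2025-02-10 06:00:00", "reading_value": 35.0}],
)

# Assumed response fields, based on the example output above
for url in result.get("artifact_urls", []):
    print("Artifact:", url if url.startswith("http") else f"{FILE_SERVER}/{url.lstrip('/')}")
print("Statistics:", result.get("statistics"))
```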
Action Server uses Docker internal DNS for service-to-service communication:
# Action Server Environment Variables
FUSEKI_ENDPOINT = "http://fuseki-db:3030/trial/sparql"
ANALYTICS_URL = "http://microservices:6000/analytics/run"
DECIDER_URL = "http://decider-service:6009/decide"
NL2SPARQL_URL = "http://nl2sparql:6005/nl2sparql" # Optional
SUMMARIZATION_URL = "http://ollama:11434" # Optional
DB_HOST = "mysqlserver" # or "timescaledb" or "cassandra"
FILE_SERVER_URL = "http://http_server:8080"

Test the bot directly via REST webhook:
Invoke-RestMethod -Method Post -Uri http://localhost:5005/webhooks/rest/webhook `
-ContentType 'application/json' `
-Body (@{
sender = 'tester1'
message = 'Give me humidity trends for Zone_Air_Humidity_Sensor_5.01 last week'
} | ConvertTo-Json)

Response:
[
{
"recipient_id": "tester1",
"text": "I analyzed humidity trends for Zone 5.01 from Feb 3-10. The average was 45.2% RH (within comfort range). Here are the results:",
"image": "http://localhost:8080/artifacts/tester1/humidity_trends_zone501.png"
},
{
"recipient_id": "tester1",
"custom": {
"type": "download",
"url": "http://localhost:8080/artifacts/tester1/humidity_trends_zone501.csv",
"filename": "humidity_trends_zone501.csv"
}
}
]

The Action Server includes configurable verbosity levels:
- High Verbosity: Full statistics, metadata, and explanations
- Low Verbosity: Concise summaries with artifact links
- Artifact Bundling: Controlled via BUNDLE_MEDIA=true|false in .env
See building-specific actions/README.md for detailed implementation:
OntoBot/
├── rasa-bldg1/ # Building 1 Rasa project
│ ├── actions/
│ │ ├── actions.py # Custom action logic
│ │ ├── sensor_mappings.txt # name,uuid mappings
│ │ └── sensor_list.txt # Curated sensor names for fuzzy matching
│ ├── data/ # NLU training data (intents, stories, rules)
│ ├── domain.yml # Domain configuration
│ ├── config.yml # Pipeline and policies
│ ├── endpoints.yml # Action server and tracker store endpoints
│ ├── credentials.yml # Channel credentials
│ ├── models/ # Trained Rasa models
│ └── shared_data/ # Shared artifacts and datasets
│ └── artifacts/ # Analysis results (plots, CSVs)
│ └── <user>/ # Per-user artifact folders
│
├── rasa-bldg2/ # Building 2 Rasa project (same structure)
├── rasa-bldg3/ # Building 3 Rasa project (same structure)
│
├── microservices/ # Analytics Microservices
│ ├── app.py # Flask application entry point
│ ├── blueprints/ # Analysis modules
│ │ ├── temperature.py
│ │ ├── humidity.py
│ │ ├── anomaly.py
│ │ ├── forecasting.py
│ │ └── ... (30+ modules)
│ └── test_analytics_smoke.py # Smoke tests
│
├── decider-service/ # Analytics Decision Service
│ ├── app.py # Flask application
│ ├── training/ # Training data and models
│ │ ├── training_data.csv
│ │ └── best_model.pkl
│ └── README.md
│
├── Transformers/ # NL2SPARQL & LLM Services
│ ├── nl2sparql/ # T5-based translation
│ │ ├── app.py
│ │ └── checkpoint-3/ # Fine-tuned T5 model
│ ├── ollama/ # Mistral LLM
│ └── README.md
│
├── rasa-frontend/ # React Frontend
│ ├── src/
│ │ ├── App.js # Main application
│ │ ├── components/ # UI components
│ │ └── ChatInterface.js
│ ├── package.json
│ └── README.md
│
├── bldg1/ # Building 1 TTL datasets
│ └── trial/dataset/ # RDF/TTL files for Fuseki
│
├── bldg2/ # Building 2 TTL datasets
│ ├── servers.json # pgAdmin pre-configuration
│ └── trial/dataset/
│
├── bldg3/ # Building 3 TTL datasets
│ ├── servers.json # pgAdmin pre-configuration (optional)
│ └── trial/dataset/
│
├── scripts/ # Utility scripts
│ └── tidy_artifacts.ps1 # Move stray files to artifacts folder
│
├── docker-compose.bldg1.yml # Building 1 stack
├── docker-compose.bldg2.yml # Building 2 stack
├── docker-compose.bldg3.yml # Building 3 stack
├── docker-compose.extras.yml # Optional services overlay
│
├── .env.example # Environment template
├── .env # User configuration (git-ignored)
├── README.md # This file
├── BUILDINGS.md # Building taxonomy
├── PORTS.md # Port reference
├── MULTI_BUILDING_SUPPORT.md # Multi-building guide
└── LICENSE # MIT License
rasa-bldg1/actions/sensor_mappings.txt:
Zone_Air_Temperature_Sensor_5.01,uuid-abc-123
Zone_Air_Humidity_Sensor_5.01,uuid-def-456
Zone_Air_CO2_Sensor_5.01,uuid-ghi-789
Format: <readable_name>,<uuid> (one per line)
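A small sketch of loading this mapping into a dictionary, as the custom actions might do; the file path matches the layout above, while the parsing details (skipping blanks/malformed lines) are assumptions:

```python
def load_sensor_mappings(path: str = "rasa-bldg1/actions/sensor_mappings.txt") -> dict:
    """Return {readable_name: uuid} from a '<readable_name>,<uuid>' per-line file."""
    mappings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or "," not in line:
                continue  # skip blanks or malformed lines
            name, uuid = line.split(",", 1)
            mappings[name.strip()] = uuid.strip()
    return mappings

# Example: look up the UUID used in SQL queries for a readable sensor name
sensors = load_sensor_mappings()
print(sensors.get("Zone_Air_Temperature_Sensor_5.01"))  # e.g. "uuid-abc-123"
```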
rasa-bldg1/actions/sensor_list.txt:
Zone_Air_Temperature_Sensor_5.01
Zone_Air_Humidity_Sensor_5.01
Zone_Air_CO2_Sensor_5.01
Purpose: Curated list for improved fuzzy matching
rasa-bldg1/domain.yml: Defines intents, entities, slots, responses, actions
rasa-bldg1/data/: NLU examples, stories, rules for training
OntoBot uses Apache Jena Fuseki as the SPARQL triple store for semantic knowledge graph queries.
- Service Name: jena-fuseki-rdf-store (or jena-fuseki)
- Host Port: 3030
- Internal Port: 3030
- Data Volume: ./bldgX/trial/dataset:/fuseki-data (building-specific)
- Network: ontobot-network
- Open http://localhost:3030 in your browser
- Default credentials (if authentication is enabled):
  - Username: admin
- Username:
- Navigate to http://localhost:3030
- Click "manage datasets"
- Click "add new dataset"
- Dataset name:
trial (or your preferred name)
- Dataset type: Persistent (TDB2)
- Dataset name:
- Click "create dataset"
- Select your dataset from the list
- Click "upload data"
- Choose your TTL files from:
- Building 1:
bldg1/trial/dataset/*.ttl
  - Building 2:
bldg2/trial/dataset/*.ttl
  - Building 3:
bldg3/trial/dataset/*.ttl
- Building 1:
- Click "upload"
The compose files mount the TTL directory directly:
volumes:
- ./bldg1/trial/dataset:/fuseki-data

Fuseki automatically loads datasets from /fuseki-data on startup (if configured).
| Purpose | URL | Method |
|---|---|---|
| Query | http://localhost:3030/trial/sparql | GET/POST |
| Update | http://localhost:3030/trial/update | POST |
| Data | http://localhost:3030/trial/data | GET/POST/PUT |
| Upload | http://localhost:3030/trial/upload | POST |
Internal Docker Network (from Action Server):
http://fuseki-db:3030/trial/sparql
PREFIX brick: <https://brickschema.org/schema/Brick#>
PREFIX ref: <https://brickschema.org/schema/Brick/ref#>
SELECT ?sensor ?hasUUID WHERE {
?sensor a brick:Zone_Air_Temperature_Sensor ;
brick:isPointOf ?zone ;
ref:hasExternalReference ?uuid_node .
?zone brick:hasIdentifier "5.04" .
?uuid_node ref:hasTimeseriesId ?hasUUID .
}

PREFIX brick: <https://brickschema.org/schema/Brick#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?sensorType WHERE {
?sensor rdf:type ?sensorType .
FILTER(STRSTARTS(STR(?sensorType), "https://brickschema.org/schema/Brick#"))
}
ORDER BY ?sensorType

PREFIX brick: <https://brickschema.org/schema/Brick#>
SELECT ?equipment ?point WHERE {
?point brick:isPointOf ?equipment .
?equipment a brick:Equipment .
}
LIMIT 50

The Action Server automatically prepends common prefixes to SPARQL queries:
DEFAULT_PREFIXES = """
PREFIX brick: <https://brickschema.org/schema/Brick#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ref: <https://brickschema.org/schema/Brick/ref#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
"""Example Usage in actions.py:
import os
import requests
fuseki_endpoint = os.getenv("FUSEKI_ENDPOINT", "http://fuseki-db:3030/trial/sparql")
sparql_query = """
SELECT ?sensor ?hasUUID WHERE {
?sensor a brick:Zone_Air_Temperature_Sensor ;
ref:hasExternalReference ?uuid_node .
?uuid_node ref:hasTimeseriesId ?hasUUID .
}
"""
response = requests.post(
fuseki_endpoint,
data={"query": sparql_query},
headers={"Accept": "application/sparql-results+json"}
)
results = response.json()

# PowerShell
Invoke-RestMethod http://localhost:3030/$/ping
# Expected output:
# (Blank 200 OK response)

Each building has its own TTL dataset:
bldg1/trial/dataset/
├── abacws-building.ttl # Building metadata
├── sensors-zone-5.01.ttl # Zone 5.01 sensors
├── sensors-zone-5.02.ttl # Zone 5.02 sensors
├── ...
└── sensor-uuids.ttl # Sensor UUID mappings
Brick Schema Version: 1.3 or 1.4 (depending on building)
- Apache Jena Fuseki: https://jena.apache.org/documentation/fuseki2/
- SPARQL over HTTP (SOH): https://jena.apache.org/documentation/fuseki2/soh.html
- SPARQL 1.1 Query Language: https://www.w3.org/TR/sparql11-query/
- Brick Schema: https://brickschema.org/
OntoBot supports three database backends for building-specific telemetry storage.
Container: mysqlserver
Host Port: 3307 (internal: 3306)
Database: ontobot
Main Table: sensor_readings
CREATE TABLE sensor_readings (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
Datetime DATETIME NOT NULL,
uuid_sensor_1 DECIMAL(10,2),
uuid_sensor_2 DECIMAL(10,2),
uuid_sensor_3 DECIMAL(10,2),
-- ... (one column per sensor UUID)
INDEX idx_datetime (Datetime)
) ENGINE=InnoDB;

Key Design Decisions:
- ✅ Datetime as clustered index - Optimizes time-range queries
- ✅ UUID columns - One column per sensor (wide table design)
- ✅ DECIMAL(10,2) - Precise numeric storage for sensor readings
- ⚠️ NULL values allowed - Sensors may not report at the same timestamp
SELECT Datetime, uuid_abc_123, uuid_def_456
FROM sensor_readings
WHERE Datetime BETWEEN '2025-02-01 00:00:00' AND '2025-02-10 23:59:59'
ORDER BY Datetime ASC;

Important: Action Server does NOT add AND uuid_abc_123 IS NOT NULL, to prevent data loss. NULLs are filtered per-column in Python after retrieval.
Indexes:
-- Primary index (already exists)
CREATE INDEX idx_datetime ON sensor_readings(Datetime);
-- Per-UUID indexes for heavy use
CREATE INDEX idx_sensor_1 ON sensor_readings(uuid_sensor_1);
CREATE INDEX idx_sensor_2 ON sensor_readings(uuid_sensor_2);

Partitioning (for large datasets):
ALTER TABLE sensor_readings
PARTITION BY RANGE (YEAR(Datetime)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p_future VALUES LESS THAN MAXVALUE
);

Table Optimization:
OPTIMIZE TABLE sensor_readings;

Connection Details:
- Internal Docker: mysqlserver:3306
- Host access: localhost:3307
- Username: root (or configured user)
- Password: From .env (DB_PASSWORD)
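For host-side testing, a minimal sketch of querying the wide table with PyMySQL (one possible client; the Action Server may use a different connector, and credentials come from your .env):

```python
import pymysql

conn = pymysql.connect(
    host="localhost", port=3307,          # host-mapped port; use mysqlserver:3306 in Docker
    user="root", password="your_password",
    database="ontobot",
)
try:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT Datetime, uuid_sensor_1
            FROM sensor_readings
            WHERE Datetime BETWEEN %s AND %s
            ORDER BY Datetime ASC
            """,
            ("2025-02-01 00:00:00", "2025-02-10 23:59:59"),
        )
        rows = cur.fetchall()
        print(f"Fetched {len(rows)} rows")
finally:
    conn.close()
```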
Container: timescaledb
Host Port: 5433 (internal: 5432)
Database: thingsboard
TimescaleDB extends PostgreSQL with time-series optimizations:
-- Create hypertable
SELECT create_hypertable('ts_kv', 'ts', chunk_time_interval => 86400000);
-- Enable compression
ALTER TABLE ts_kv SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'entity_id,key'
);
-- Create compression policy (compress data older than 7 days)
SELECT add_compression_policy('ts_kv', INTERVAL '7 days');
-- Create retention policy (drop data older than 1 year)
SELECT add_retention_policy('ts_kv', INTERVAL '1 year');

-- Recent sensor data
SELECT
to_timestamp(ts/1000.0) AS timestamp,
dbl_v AS value
FROM ts_kv
WHERE entity_id = 'uuid-abc-123'
AND ts >= extract(epoch from now() - INTERVAL '7 days') * 1000
ORDER BY ts DESC;
-- Time-bucket aggregation
SELECT
time_bucket('1 hour', to_timestamp(ts/1000.0)) AS hour,
AVG(dbl_v) AS avg_value,
MAX(dbl_v) AS max_value,
MIN(dbl_v) AS min_value
FROM ts_kv
WHERE entity_id = 'uuid-abc-123'
GROUP BY hour
ORDER BY hour DESC;

Connection Details:
- Internal Docker: timescaledb:5432
- Host access: localhost:5433
- Username: thingsboard
- Password: thingsboard
pgAdmin Pre-configured: See bldg2/servers.json
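A host-side sketch of the time-bucket query with psycopg2 (one possible client); the entity_id literal is a placeholder and the uuid cast is an assumption about the ThingsBoard schema:

```python
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5433,          # host-mapped port; use timescaledb:5432 in Docker
    user="thingsboard", password="thingsboard", dbname="thingsboard",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT time_bucket('1 hour', to_timestamp(ts / 1000.0)) AS hour,
               AVG(dbl_v) AS avg_value
        FROM ts_kv
        WHERE entity_id = %s::uuid
        GROUP BY hour
        ORDER BY hour DESC
        LIMIT 24
        """,
        ("123e4567-e89b-12d3-a456-426614174000",),  # placeholder device UUID
    )
    for hour, avg_value in cur.fetchall():
        print(hour, avg_value)
conn.close()
```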
Container: cassandra
Host Port: 9042
Keyspace: thingsboard
CREATE KEYSPACE IF NOT EXISTS thingsboard
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
CREATE TABLE thingsboard.ts_kv_cf (
entity_type text,
entity_id uuid,
key text,
partition bigint,
ts bigint,
bool_v boolean,
str_v text,
long_v bigint,
dbl_v double,
PRIMARY KEY ((entity_type, entity_id, key, partition), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Recent sensor data
SELECT ts, dbl_v
FROM thingsboard.ts_kv_cf
WHERE entity_type = 'DEVICE'
AND entity_id = uuid-abc-123
AND key = 'temperature'
AND partition = 0
AND ts >= 1704067200000
ORDER BY ts DESC
LIMIT 1000;

Connection Details:
- Internal Docker: cassandra:9042
- Host access: localhost:9042
- Username: (default, no auth)
- CQL Shell: docker exec -it cassandra cqlsh
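A host-side sketch using the DataStax Python driver (cassandra-driver); the device UUID, key, partition, and timestamp values are placeholders:

```python
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["localhost"], port=9042)
session = cluster.connect("thingsboard")

rows = session.execute(
    """
    SELECT ts, dbl_v FROM ts_kv_cf
    WHERE entity_type = %s AND entity_id = %s AND key = %s AND partition = %s
      AND ts >= %s
    LIMIT 100
    """,
    ("DEVICE", uuid.UUID("123e4567-e89b-12d3-a456-426614174000"),  # placeholder device id
     "temperature", 0, 1704067200000),
)
for row in rows:
    print(row.ts, row.dbl_v)

cluster.shutdown()
```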
Challenge: Sensors may not report readings at the exact same timestamp.
Solution:
- Retrieve all requested UUID columns without an IS NOT NULL filter
- Per-column filtering in Python/pandas after retrieval
- Align timestamps using forward-fill or interpolation if needed
Example (Python):
import pandas as pd
# Query returns: Datetime, uuid1, uuid2 (with NULLs)
df = pd.read_sql(query, connection)
# Filter per-column
df_uuid1 = df[['Datetime', 'uuid1']].dropna()
df_uuid2 = df[['Datetime', 'uuid2']].dropna()
# Send separate payloads to analytics if needed

- ✅ Use the Datetime index for time-range queries
- ✅ Add per-UUID indexes for frequently queried sensors
- ✅ Use partitioning for datasets > 100M rows
- ✅ Run OPTIMIZE TABLE periodically
- ⚠️ Avoid SELECT * - specify UUID columns explicitly
- ✅ Enable compression for historical data
- ✅ Use time_bucket() for aggregations
- ✅ Set retention policies to auto-drop old data
- ✅ Monitor chunk size and adjust chunk interval if needed
- ✅ Design partition keys to avoid hot spots
- ✅ Use appropriate compaction strategy (SizeTieredCompactionStrategy)
- ✅ Limit query results (avoid full table scans)
- ✅ Monitor repair operations for multi-node clusters
- ⚠️ Cassandra is optimized for writes, not complex queries
For large-scale deployments (>1B rows), consider:
- InfluxDB: Purpose-built time-series database
- TimescaleDB: Already used in Building 2, can replace MySQL for Building 1
- ClickHouse: Column-oriented for analytical queries
- Prometheus + Victoria Metrics: Metrics-focused with long-term storage
OntoBot generates analysis results (plots, CSVs, JSON) as artifacts served via HTTP.
Location: rasa-ui/shared_data/artifacts/<safe_user>/
Safe User: Derived from sender_id with special characters sanitized:
- Input: user@example.com
- Safe User: user_example_com
Directory Structure:
rasa-ui/shared_data/artifacts/
├── tester1/
│ ├── temperature_trends_2025-02-10.png
│ ├── temperature_trends_2025-02-10.csv
│ ├── anomalies_detected.json
│ └── sparql_results.json
├── user2/
│ └── ...
└── admin/
└── ...
Container: http_server
Host Port: 8080
Internal Port: 8080
Mount: ./rasa-ui/shared_data:/app/shared_data
Health Check: http://localhost:8080/health
http://localhost:8080/artifacts/<user>/<filename>
Example:
http://localhost:8080/artifacts/tester1/temperature_trends.png
Add ?download=1 query parameter:
http://localhost:8080/artifacts/tester1/temperature_trends.csv?download=1
# PowerShell
Invoke-RestMethod http://localhost:8080/artifacts/tester1/

| Type | Extension | Generated By | Purpose |
|---|---|---|---|
| Plots | .png | Analytics Microservices | Visualizations (line, scatter, heatmap) |
| Data | .csv | Analytics Microservices | Raw/processed data for download |
| Metadata | .json | Action Server, Analytics | Structured results and statistics |
| SPARQL Results | .json | Action Server | Knowledge graph query results |
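A minimal sketch of how a plot artifact can be written into the shared directory and turned into a download URL; the paths and URL base follow the conventions in this section, while the plotting details are illustrative:

```python
import os
from datetime import datetime
import matplotlib
matplotlib.use("Agg")  # headless backend inside containers
import matplotlib.pyplot as plt

ARTIFACT_ROOT = "rasa-ui/shared_data/artifacts"
BASE_URL = os.getenv("BASE_URL", "http://localhost:8080")

def save_plot(user: str, values, filename: str) -> str:
    """Save a simple line plot under the user's artifact folder and return its URL."""
    folder = os.path.join(ARTIFACT_ROOT, user)
    os.makedirs(folder, exist_ok=True)
    stamped = f"{datetime.now().strftime('%Y-%m-%d_%H%M%S')}_{filename}"
    path = os.path.join(folder, stamped)

    plt.figure(figsize=(8, 3))
    plt.plot(values)
    plt.title(filename)
    plt.savefig(path, bbox_inches="tight")
    plt.close()

    return f"{BASE_URL}/artifacts/{user}/{stamped}"

print(save_plot("tester1", [21.5, 22.0, 22.5, 23.1], "temperature_trends.png"))
# -> e.g. http://localhost:8080/artifacts/tester1/2025-02-10_143022_temperature_trends.png
```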
1. Generation: Analytics creates plot and CSV
   - Saved to shared_data/artifacts/<user>/
   - Filenames include a timestamp for uniqueness
2. URL Construction: Action Server builds URLs
   - Format: http://http_server:8080/artifacts/<user>/<filename>
   - Internal: http://http_server:8080 (Docker network)
   - External: http://localhost:8080 (host access)
3. Delivery: Rasa returns artifact URLs in the bot response
   - Image URLs: Displayed inline in the frontend
   - CSV URLs: Shown as download links
   - JSON URLs: Used programmatically
4. Cleanup: Manual or scripted
   - Artifacts persist indefinitely
   - Use tidy_artifacts.ps1 to organize files
Script: scripts/tidy_artifacts.ps1
Purpose: Move stray files from shared_data root into artifacts/ folder
# Run from repo root
.\scripts\tidy_artifacts.ps1

What it does:
- Scans rasa-ui/shared_data/ for files not in artifacts/
- Moves them to artifacts/misc/ or artifacts/<detected_user>/
- Logs all operations
When to use:
- After manual testing
- Before production deployment
- Periodically for cleanup
The React frontend automatically:
- ✅ Displays PNG/JPG artifacts as inline images
- ✅ Shows CSV/JSON artifacts as download buttons
- ✅ Handles multiple artifacts per message
- ✅ Supports artifact bundling (controlled by BUNDLE_MEDIA)
Example Bot Response:
{
"text": "Here are the temperature trends for last week:",
"image": "http://localhost:8080/artifacts/tester1/temp_trends.png",
"custom": {
"type": "download",
"url": "http://localhost:8080/artifacts/tester1/temp_trends.csv",
"filename": "temperature_trends.csv"
}
}

- Artifacts are served over HTTP without authentication
- Do NOT store sensitive data in shared_data/
- In production, add authentication to the file server
- Consider time-limited signed URLs for sensitive artifacts
- Implement artifact expiration/cleanup policies
Best Practice:
<analysis_type>_<sensor>_<zone>_<timestamp>.<ext>
Examples:
- temperature_trends_zone504_2025-02-10_143022.png
- anomalies_detected_co2_zone501_2025-02-10.csv
- humidity_forecast_building1_2025-02-10.json
Benefits:
- Easy identification
- Prevents filename collisions
- Sortable by timestamp
- Self-documenting
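A tiny helper following this convention (illustrative):

```python
from datetime import datetime

def artifact_name(analysis_type: str, sensor: str, zone: str, ext: str) -> str:
    """Build '<analysis_type>_<sensor>_<zone>_<timestamp>.<ext>' file names."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    return f"{analysis_type}_{sensor}_{zone}_{stamp}.{ext}"

print(artifact_name("anomalies", "co2", "zone501", "csv"))
# e.g. anomalies_co2_zone501_2025-02-10_143022.csv
```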
