
Conversation

@houqiii commented Dec 11, 2025

Introduction

This PR integrates TopoSense-Bench, a rigorous benchmark designed to evaluate Large Language Models (LLMs) on the Semantic-Spatial Sensor Scheduling (S³) problem.

It originates from the ACM MobiCom '26 paper: "IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling".

Unlike standard QA tasks, this benchmark requires the LLM to act as an agent that translates high-level user intents (e.g., "Find my backpack lost between the library and the gym") into precise physical sensor node IDs within a large-scale digital twin.
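
For illustration, a query and its expected grounding might look like the following (a hypothetical pair; the node IDs are invented to match the ID format shown under Key Features below):

    Query:  "Find my backpack lost between the library and the gym"
    Nodes:  library_1_camera_02, gym_1_camera_01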

Key Features

  • Hugging Face Integration:
    • Unlike benchmarks that store data locally, this implementation uses the datasets library to load data directly from Hugging Face.
    • Benefit: Keeps the repository lightweight and ensures users always access the latest version of the dataset.
  • RAG-based Evaluation Logic:
    • Implements a TopologyManager that simulates a retrieval system. It dynamically fetches relevant building/floor topological data based on the user query, testing the model's ability to reason over long contexts and spatial constraints.
  • Specialized Evaluator:
    • Includes a custom TopoSenseEvaluator to robustly parse sensor node IDs (e.g., teaching_building_1_camera_03) from model output and match them against ground truth; a minimal sketch of this matching step follows this list.
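
A minimal sketch of such a matching step, assuming node IDs follow the underscore-separated shape of the example above (illustrative only, not the actual TopoSenseEvaluator):

    import re

    # Assumed node-ID shape, inferred from "teaching_building_1_camera_03":
    # lowercase alphanumeric tokens joined by underscores.
    NODE_ID_RE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")

    def extract_node_ids(model_output: str) -> set[str]:
        """Pull candidate sensor node IDs out of free-form model output."""
        return set(NODE_ID_RE.findall(model_output.lower()))

    def is_correct(model_output: str, ground_truth: set[str]) -> bool:
        """Score a prediction: it must name at least one ground-truth node."""
        return bool(extract_node_ids(model_output) & {g.lower() for g in ground_truth})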

📊 Dataset Statistics

  • Scale: 5,250 natural language queries.
  • Environment: A university digital twin with 33 buildings, 161 floor plans, and 2,510 sensors.
  • Tasks:
    1. Intra-Zone Perception
    2. Intra-Building Coordination
    3. Inter-Building Coordination

Implementation Details

  • Directory Structure: Follows the repository standard (benchmarks/toposense_bench/).
  • SDK Usage: Reuses sdk.executor.SimpleExecutor for LLM calls and sdk.utils for configuration management.
  • Configuration: Sensitive keys are managed via env.toml (template provided).

How to Run

  1. Navigate to the benchmark directory:

    cd benchmarks/toposense_bench
  2. Install dependencies:

    pip install -r requirements.txt
  3. Configure env.toml with your API Key (e.g., OPENAI_API_KEY).
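
     A minimal sketch of what env.toml might contain (any key name other than OPENAI_API_KEY is an assumption; the provided template is authoritative):

     # Hypothetical env.toml sketch; consult the included template for the real schema.
     OPENAI_API_KEY = "sk-..."
     # BASE_URL = "https://api.deepseek.com/v1"  # assumed optional key for OpenAI-compatible endpoints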

  4. Run the evaluation script:

    # Example using GPT-4o
    bash run.sh "gpt-4o"
    
    # Example using DeepSeek (via OpenAI-compatible endpoint)
    bash run.sh "openai/deepseek-chat"

@@ -0,0 +1,23 @@
#!/bin/bash

Collaborator commented:

Can you follow the template and add an install.sh, which is needed for our integration? Thanks.

os.makedirs(output_dir)

df = pd.DataFrame(results)
df.to_json(
@xuafeng commented Dec 11, 2025:

Can you refer to https://github.com/sys-intelligence/system-intelligence-benchmark/tree/main/benchmarks/course_exam_bench#output-files and produce result files at different levels of detail? summary.json is needed.

@xuafeng left a comment:

Thanks for the contributions. I left some comments and also looped in our team members for feedback.

@xuafeng requested a review from tareknaser on December 11, 2025 at 22:54

## 📊 Overview

- **Source**: Hosted on [Hugging Face](https://huggingface.co/datasets/IoT-Brain-Project/TopoSense-Bench) (Seamlessly integrated via the `datasets` library).
Collaborator commented:

The link returns a "404". Is it because the dataset is not public yet?

@xuafeng left a comment:

Add more comments.

{
"answer": "sensor_name_here",
"explanation": "Brief reasoning based on map tags"
}
Collaborator commented:

The closing ``` fence is missing.

try:
    # Load the 'topology' configuration.
    # Hugging Face assigns uploaded JSONL files to the 'train' split by default.
    ds = load_dataset("IoT-Brain/TopoSense-Bench", "topology", split="train")
Collaborator commented:

"IoT-Brain/TopoSense-Bench"

Not aligned to the README's Hugging Face link

@tareknaser left a comment:

Thank you for the contribution! I left some comments.

Collaborator commented:

Could you add an entry for the benchmark to the root project README?

Collaborator commented:

This test isn’t running in CI right now. Please add it to .github/workflows/test.yml

Collaborator commented:

There’s also ongoing work in another PR to add a Why.md file to each benchmark directory. See the discussion: #21 (comment)

@houqiii (author) commented Dec 14, 2025

Thank you @xuafeng, @Qian-Cheng-nju, and @tareknaser for the constructive feedback!

I have pushed the latest changes which address all the points raised in the review. Here is a summary of the updates:

1. Output Format & Engineering (@xuafeng)

  • Standardized Output: Updated src/main.py to align with the repository standard. It now generates three files (a sketch of this logic appears below):
    • summary.json (Aggregated statistics with overall and category-level accuracy).
    • results.jsonl (Minimal results).
    • results_detailed.jsonl (Full debugging info including prompts and retrieval status).
  • Installation: Added install.sh to automate environment setup.
  • Link Fix: Corrected the Hugging Face URL in README.md to point to the correct organization (IoT-Brain).

2. Documentation & CI (@tareknaser)

  • Why TopoSense: Added benchmarks/toposense_bench/Why.md to explain the benchmark's significance.
  • Root README: Added TopoSense-Bench entry to the main project README.
  • CI Integration: Added toposense_bench to the test matrix in .github/workflows/test.yml.

3. Code Fixes (@Qian-Cheng-nju)

  • Syntax: Fixed the missing closing ``` fence in the JSON prompt template.
  • Topology Loader: Updated src/topology_loader.py to use the correct HF dataset path (IoT-Brain/TopoSense-Bench), ensuring consistency with the README.
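
For reviewers, a minimal sketch of the three-level output logic described in point 1 (the field names in each result record are assumptions; the actual src/main.py may differ):

    import json
    import os
    from collections import defaultdict

    def write_outputs(results, output_dir):
        """Write summary.json, results.jsonl and results_detailed.jsonl.

        Each item in `results` is assumed to carry at least
        'id', 'category', 'prediction' and 'correct' fields.
        """
        os.makedirs(output_dir, exist_ok=True)

        # summary.json: overall plus category-level accuracy.
        by_cat = defaultdict(lambda: {"total": 0, "correct": 0})
        for r in results:
            by_cat[r["category"]]["total"] += 1
            by_cat[r["category"]]["correct"] += int(r["correct"])
        summary = {
            "total": len(results),
            "accuracy": sum(int(r["correct"]) for r in results) / max(len(results), 1),
            "by_category": {
                cat: {**v, "accuracy": v["correct"] / v["total"]}
                for cat, v in by_cat.items()
            },
        }
        with open(os.path.join(output_dir, "summary.json"), "w") as f:
            json.dump(summary, f, indent=2)

        # results.jsonl: minimal per-query records.
        with open(os.path.join(output_dir, "results.jsonl"), "w") as f:
            for r in results:
                f.write(json.dumps({k: r[k] for k in ("id", "prediction", "correct")}) + "\n")

        # results_detailed.jsonl: full records for debugging.
        with open(os.path.join(output_dir, "results_detailed.jsonl"), "w") as f:
            for r in results:
                f.write(json.dumps(r) + "\n")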

I verified the changes locally with a smoke test script, confirming that the data loading, RAG retrieval, and file generation logic work as expected.

Ready for the next round of review!

@Qian-Cheng-nju commented:

The new version looks great to me — thank you very much!

@xuafeng merged commit 4b3c323 into sys-intelligence:main on Dec 22, 2025. 3 checks passed.