Add TopoSense-Bench: A Semantic-Spatial Sensor Scheduling Benchmark #38
Conversation
```diff
@@ -0,0 +1,23 @@
#!/bin/bash
```
Can you follow the template and add the install.sh that is needed for our integration? Thanks.
```python
os.makedirs(output_dir)

df = pd.DataFrame(results)
df.to_json(
```
Can you refer to https://github.com/sys-intelligence/system-intelligence-benchmark/tree/main/benchmarks/course_exam_bench#output-files to produce the different levels of result files? `summary.json` is needed.
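For reference, a minimal sketch of the two-level output the reviewer asks for: per-item records plus an aggregate `summary.json`. The field names here are assumptions for illustration, not the template's exact schema:

```python
import json
import os

def write_results(results, output_dir):
    """Write per-item results plus an aggregate summary.json (illustrative schema)."""
    os.makedirs(output_dir, exist_ok=True)  # exist_ok avoids a crash on reruns

    # Detailed, per-question records.
    with open(os.path.join(output_dir, "results.json"), "w") as f:
        json.dump(results, f, indent=2)

    # Aggregate metrics expected by the integration.
    correct = sum(1 for r in results if r["correct"])
    summary = {
        "total": len(results),
        "correct": correct,
        "accuracy": correct / len(results) if results else 0.0,
    }
    with open(os.path.join(output_dir, "summary.json"), "w") as f:
        json.dump(summary, f, indent=2)
    return summary
```

The linked course_exam_bench README describes the authoritative file layout; this only shows the detailed-vs-summary split.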
xuafeng left a comment:
Thanks for the contributions. I left some comments and also looped in our team members for feedback.
benchmarks/toposense_bench/README.md
Outdated
```markdown
## 📊 Overview

- **Source**: Hosted on [Hugging Face](https://huggingface.co/datasets/IoT-Brain-Project/TopoSense-Bench) (Seamlessly integrated via the `datasets` library).
```
It says "404". Is it because the datasets are not public yet?
xuafeng left a comment:
Add more comments.
```json
{
  "answer": "sensor_name_here",
  "explanation": "Brief reasoning based on map tags"
}
```
Missing the closing ``` fence.
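Since models often wrap the expected JSON object in a fenced block or surround it with prose, the parser may want a tolerant extraction step. A sketch (the function name is illustrative, not the benchmark's actual API):

```python
import json
import re

def extract_answer(text):
    """Pull the first JSON object out of a model response.

    Handles raw JSON, ```json fences, and surrounding prose.
    Returns None if no parseable object is found.
    """
    candidates = []
    # Prefer the contents of a fenced block if present.
    fence = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if fence:
        candidates.append(fence.group(1))
    # Fall back to the first {...} span in the raw text.
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))
    for cand in candidates:
        try:
            return json.loads(cand)
        except json.JSONDecodeError:
            continue
    return None
```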
```python
try:
    # Load the 'topology' configuration.
    # Hugging Face defaults uploaded JSONL files to the 'train' split.
    ds = load_dataset("IoT-Brain/TopoSense-Bench", "topology", split="train")
```
`"IoT-Brain/TopoSense-Bench"` is not aligned with the README's Hugging Face link (`IoT-Brain-Project/TopoSense-Bench`).
tareknaser left a comment:
Thank you for the contribution! I left some comments.
Could you add an entry for the benchmark to the root project README?
This test isn’t running in CI right now. Please add it to `.github/workflows/test.yml`.
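A sketch of what the new CI job might look like. The exact structure depends on the existing `test.yml` (the job name, Python version, and requirements path below are all assumptions):

```yaml
  toposense-bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run TopoSense-Bench tests
        run: |
          # Hypothetical requirements path; adjust to the repo's layout.
          pip install -r benchmarks/toposense_bench/requirements.txt
          pytest benchmarks/toposense_bench/ -q
```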
There’s also ongoing work in another PR to add a Why.md file to each benchmark directory. See the discussion: #21 (comment)
Thank you @xuafeng, @Qian-Cheng-nju, and @tareknaser for the constructive feedback! I have pushed the latest changes, which address all the points raised in the review. Here is a summary of the updates:

1. Output Format & Engineering (@xuafeng)
2. Documentation & CI (@tareknaser)
3. Code Fixes (@Qian-Cheng-nju)

I verified the changes locally with a smoke test script, confirming that the data loading, RAG retrieval, and file generation logic work as expected. Ready for the next round of review!
The new version looks great to me — thank you very much!
Introduction
This PR integrates TopoSense-Bench, a rigorous benchmark designed to evaluate Large Language Models (LLMs) on the Semantic-Spatial Sensor Scheduling (S³) problem.
It originates from the ACM MobiCom '26 paper: "IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling".
Unlike standard QA tasks, this benchmark requires the LLM to act as an agent that translates high-level user intents (e.g., "Find my backpack lost between the library and the gym") into precise physical sensor node IDs within a large-scale digital twin.
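To make the intent-to-node-ID translation concrete, here is a hypothetical task instance; the field names, candidate IDs, and matching logic are illustrative only, not the dataset's actual schema:

```python
# Hypothetical TopoSense-Bench-style task instance (illustrative schema).
task = {
    "query": "Find my backpack lost between the library and the gym",
    "candidates": [
        "library_camera_02",
        "outdoor_camera_07",
        "teaching_building_1_camera_03",
    ],
    "ground_truth": "outdoor_camera_07",
}

def is_correct(prediction: str, truth: str) -> bool:
    # The model must emit a concrete node ID from the topology, so
    # evaluation reduces to matching IDs after trivial normalization.
    return prediction.strip().lower() == truth.strip().lower()
```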
Key Features
- Uses the `datasets` library to load data directly from Hugging Face.
- A `TopologyManager` that simulates a retrieval system: it dynamically fetches relevant building/floor topological data based on the user query, testing the model's ability to reason over long contexts and spatial constraints.
- A `TopoSenseEvaluator` to robustly parse and match sensor Node IDs (e.g., `teaching_building_1_camera_03`) against ground truth.

📊 Dataset Statistics
Implementation Details
- Lives at `benchmarks/toposense_bench/`.
- Uses `sdk.executor.SimpleExecutor` for LLM calls and `sdk.utils` for configuration management.
- Configured via `env.toml` (template provided).

How to Run
1. Navigate to the benchmark directory: `cd benchmarks/toposense_bench`
2. Install dependencies.
3. Configure `env.toml` with your API key (e.g., `OPENAI_API_KEY`).
4. Run the evaluation script.