-
Notifications
You must be signed in to change notification settings - Fork 12
feat(data): add ESIM building data pipeline #264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@Jc-965 is attempting to deploy a commit to the ScottyLabs Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughIntroduces a new ESIM building data pipeline (ArcGIS query, OSM parsing, sign-abbrev mapping, FMS enrichment), adds multiple new scraper scripts, and updates apps/dataflow Python requirement to 3.12 with three new dependencies. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Pipeline as building_pipeline.py
participant ArcGIS as arc_gis_query.py
participant Overpass as osm_building_to_json.py
participant SignMap as sign_abbrev_mapping.py
participant FMS as add_fms_id.py
User->>Pipeline: run CLI
Pipeline->>ArcGIS: execute (unless --skip-arcgis)
ArcGIS->>ArcGIS: query layer, normalize, write query.json
ArcGIS-->>Pipeline: done
Pipeline->>Overpass: run OSM parsing subprocess (env vars)
Overpass->>Overpass: fetch OSM (Overpass), parse XML, compute geometry
Overpass->>Overpass: write parsed_buildings.json and mapping
Overpass-->>Pipeline: done
Pipeline->>SignMap: run sign_abbrev_mapping (query.json)
SignMap->>SignMap: build sign→ID map, write sign_abbrev_mapping.json
SignMap-->>Pipeline: done
Pipeline->>FMS: run add_fms_id (buildings + mapping)
FMS->>FMS: lookup/insert fmsId, write updated buildings.json
FMS-->>Pipeline: done
Pipeline-->>User: complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@apps/dataflow/src/scraper/esim/building_pipeline.py`:
- Around line 51-112: The current --dry-run (declared in parse_args) only
toggles add_fms_id.py but main still sets CMUMAPS_PARSED_BUILDINGS_OUTPUT to the
real output file and runs osm_building_to_json.py which will overwrite output;
change main so that when args.dry_run is true it routes parsed output to a
temporary file instead of output_file (e.g., create a temp path and set
env["CMUMAPS_PARSED_BUILDINGS_OUTPUT"]=temp_path) and ensure add_fms_id.py is
still invoked with --dry-run; alternatively, update the
parser.add_argument("--dry-run", ...) help text to clearly state it does not
prevent writing the parsed buildings file — pick one fix and apply it
consistently in main (referencing args.dry_run, CMUMAPS_PARSED_BUILDINGS_OUTPUT,
osm_building_to_json.py and add_fms_id.py).
In `@apps/dataflow/src/scraper/esim/osm_building_to_json.py`:
- Around line 92-100: The osmId value retrieved from data (osm_id =
data.get("osmId")) must be normalized to a string before using it as a key in
osm_id_to_info so later string-based lookups succeed; change the insertion to
coerce osm_id to str (e.g., use str(osm_id)) and only add the entry when osm_id
is not None/empty, keeping the same value mapping ("code", "name",
"defaultFloor", "floors") and references to osm_id_to_info, osm_id, and
data.get("osmId") in the osm_building_to_json logic.
- Around line 519-578: The _collect_buildings function exceeds Ruff
complexity/branch limits; split logic into smaller helpers or explicitly
suppress the rule. Refactor by moving the relation-processing block into a
helper like _process_relation(rel, osm_id_to_info, assigned_codes,
outer_ways_used, buildings) that extracts outer refs, builds shapes via
_shape_from_way, and calls _assemble_entry, and move the single-way loop into
_process_way(wid, way, osm_id_to_info, assigned_codes, outer_ways_used,
buildings) that handles matching info and assembling entries; ensure both
helpers update buildings and assigned_codes and reference _shape_from_way and
_assemble_entry. If refactoring isn't feasible now, add "# noqa: C901, PLR0912"
to the _collect_buildings definition line.
🧹 Nitpick comments (1)
apps/dataflow/src/scraper/esim/building_pipeline.py (1)
66-93: Fail fast if--downloaded-buildingsis missing.Line 68 uses the file as input, but there’s no preflight check. A clearer error here avoids a less actionable failure inside the OSM subprocess.
✅ Suggested preflight check
osm_file = Path(args.osm_file).resolve() downloaded_buildings = Path(args.downloaded_buildings).resolve() output_file = Path(args.output).resolve() + if not downloaded_buildings.is_file(): + logger.error("--downloaded-buildings %s not found.", downloaded_buildings) + sys.exit(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/dataflow/src/scraper/esim/osm_building_to_json.py`:
- Around line 29-30: Add inline mypy suppression comments to the untyped
third-party imports to silence type checking errors: append " # type: ignore"
to the import of defusedxml.ElementTree (imported as ET) and to the import of
overpass so mypy will ignore missing type stubs for those libraries.
🧹 Nitpick comments (1)
apps/dataflow/src/scraper/esim/osm_building_to_json.py (1)
478-486: Consider guarding against malformed coordinates.The
float()conversion on line 481 could raiseValueErrorif lat/lon attributes contain non-numeric values. While rare in OSM data, defensive handling would improve robustness.🛡️ Optional defensive fix
for n in root.findall("node"): nid = n.attrib.get("id") if nid and "lat" in n.attrib and "lon" in n.attrib: - nodes[nid] = (float(n.attrib["lat"]), float(n.attrib["lon"])) + try: + nodes[nid] = (float(n.attrib["lat"]), float(n.attrib["lon"])) + except ValueError: + logger.warning("Skipping node %s with invalid coordinates", nid) + continue
| import defusedxml.ElementTree as ET # noqa: N817 | ||
| import overpass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type: ignore comments to silence mypy errors for untyped libraries.
The pipeline is failing due to missing type stubs for defusedxml and overpass. Since these libraries don't provide type stubs, suppress with inline comments.
🔧 Proposed fix
-import defusedxml.ElementTree as ET # noqa: N817
-import overpass
+import defusedxml.ElementTree as ET # noqa: N817 # type: ignore[import-untyped]
+import overpass # type: ignore[import-untyped]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import defusedxml.ElementTree as ET # noqa: N817 | |
| import overpass | |
| import defusedxml.ElementTree as ET # noqa: N817 # type: ignore[import-untyped] | |
| import overpass # type: ignore[import-untyped] |
🧰 Tools
🪛 GitHub Actions: Check
[error] 29-29: mypy: Library stubs not installed for "defusedxml.ElementTree". (import-untyped)
[error] 29-29: mypy: Library stubs not installed for "defusedxml". (import-untyped)
[error] 30-30: mypy: Skipping analyzing "overpass": module is installed, but missing library stubs or py.typed marker (import-untyped)
🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/esim/osm_building_to_json.py` around lines 29 - 30,
Add inline mypy suppression comments to the untyped third-party imports to
silence type checking errors: append " # type: ignore" to the import of
defusedxml.ElementTree (imported as ET) and to the import of overpass so mypy
will ignore missing type stubs for those libraries.
Summary
Add a new ESIM building data pipeline for fetching, processing, and adding CMU building data from ArcGIS and OpenStreetMap sources.
Pipeline Overview
The pipeline consists of 4 scripts that run in sequence:
arc_gis_query.py - Fetches building data from CMU's ArcGIS Building layer
query.jsonsign_abbrev_mapping.py - Creates lookup table from sign abbreviations to FMS IDs
query.jsonsign_abbrev_mapping.jsonosm_building_to_json.py - Uses Python wrapper for the OpenStreetMap Overpass API to get OSM export, which is used to extract building geometries, osmIds, shapes, hitboxes, and entrances
buildings.json(used for name mappings and structure)buildings.json(Added osmId, shapes, hitboxes),building_info_map.json,export.osmadd_fms_id.py - Adds FMS identifiers to building records by matching codes/names
buildings.json,sign_abbrev_mapping.jsonbuildings.json(updated in-place with fmsId field)Required File
buildings.json- Building name mappings and base structureUsage
cd apps/dataflow/src/scraper/esim python building_pipeline.pyOr run individual scripts:
Summary by CodeRabbit
Chores
New Features
✏️ Tip: You can customize this high-level summary in your review settings.