
Conversation

@tu2463
Contributor

@tu2463 tu2463 commented Jan 31, 2026

Description

This PR adds a scraping pipeline to collect floor plan SVGs from FMSystems.

It adds a Selenium-based web driver that opens FMSystems and programmatically clicks through each floor of the specified buildings. This step ensures that the required cookies are initialized for every floor, which is necessary for successfully retrieving the corresponding SVG data.

Once the cookies are set, a separate script uses the request headers and cookies from your browser to fetch and store the SVG files.

It also adds scripts that upload to and download from S3.

Changes

driver.py: A Selenium web driver that opens FMSystems and clicks through all floors for the selected buildings to initialize cookies.
get_all_svg.py: Iterates over buildings defined in buildings.json, retrieves their floor plan SVGs, and saves them locally.
upload_svg_to_s3.py: Uploads the floorplan_svg folder with all of its contents to S3.
download_svg_from_s3.py: Downloads the floorplan_svg folder from S3.
s3_utils.py: Copied from the CMU Maps data acquisitor repo.

Notes

The following floors cannot be successfully scraped because they do not have a "draw file":

  • Hall of Arts: 4th floor
  • Posner Hall: 1M

The following buildings are not found in FM Systems:

  • forbes-ave-4612
  • sh (possibly Scaife Hall, but the floor ids of the "Alan Magee Scaife Hall of Engineering" on FMS are different from the ones we scraped)
  • sterling-plaza
  • wf (unknown building)

How to Test

  1. Download buildings.json and place it in the fms folder
  2. In driver.py, ensure the building_for_process array contains the labels of all buildings you want to scrape.
  3. Run driver.py.
  4. A Chrome window will open with the FMSystems login page. Log in using your Andrew ID.
  5. Once the FMSystems main page loads, press Enter in the terminal.
  6. Wait for the script to click through all floors (~10 seconds per floor).

After all clicks finish:
  7. Open Chrome DevTools → Network panel.
  8. Manually click any floor.
  9. Locate the getDefaultLayersData request.
  10. Copy its cURL command.
  11. Convert the cURL command using a cURL-to-code converter.
  12. Copy the resulting cookies and headers into get_all_svg.py (see the sketch after this list).
  13. Run get_all_svg.py.
  14. Confirm that the SVG files are saved in the floorplan_svg/ directory.
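For reference, a minimal sketch of what the converter output pasted into get_all_svg.py might look like; every value below is a placeholder, not a real session token:

# Hypothetical shape of cURL-converter output; paste your own browser's values.
cookies = {
    "ASP.NET_SessionId": "<session-id-from-your-browser>",
    ".ASPXAUTH": "<auth-cookie-from-your-browser>",
    "AuthenToken": "<token-from-your-browser>",
}

headers = {
    "accept": "*/*",
    "referer": "https://fmsystems.cmu.edu/FMInteract/DrawingView.aspx",
    "user-agent": "<your-browser-user-agent>",
    "x-requested-with": "XMLHttpRequest",
}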

Summary by CodeRabbit

  • New Features
    • Added end-to-end SVG workflows: bulk download from storage, bulk upload to storage, and a CLI to fetch campus floor-plan SVGs with retries and configurable options.
    • Added browser automation to navigate the FMSystems UI for building/floor extraction.
    • Added storage utilities for uploading/downloading files and JSON, with console progress and summary reporting.

✏️ Tip: You can customize this high-level summary in your review settings.

@vercel

vercel bot commented Jan 31, 2026

@tu2463 is attempting to deploy a commit to the ScottyLabs Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai bot commented Jan 31, 2026

📝 Walkthrough


Adds a new FMS scraping toolset: Selenium UI driver, HTTP SVG fetcher with retry/backoff, MinIO/S3 utility, and scripts to upload/download floor-plan SVGs between local storage and the cmumaps S3 bucket.

Changes

  • S3 utilities & upload/download scripts (apps/dataflow/src/scraper/fms/s3_utils.py, apps/dataflow/src/scraper/fms/upload_svg_to_s3.py, apps/dataflow/src/scraper/fms/download_svg_from_s3.py): New MinIO/S3 client wrapper (env-configured for bucket cmumaps) with JSON and generic file upload/download, listing, and helpers; scripts to upload local SVGs from floorplan_svg and to download .svg objects from S3 prefix floorplan_svg/ to a local folder.
  • SVG fetching (HTTP) and CLI (apps/dataflow/src/scraper/fms/get_all_svg.py): New CLI that reads buildings.json, slugifies building names, builds CMU API requests per floor, fetches SVGs with retry/exponential backoff and configurable SSL/timeouts, and saves SVGs into slugified directories.
  • Selenium driver / UI automation (apps/dataflow/src/scraper/fms/driver.py): New Selenium-based Chrome WebDriver flow that navigates to LANDING_PAGE_URL, waits for manual login, expands Sites/Main Campus and buildings, opens floors (new tabs), handles modal alerts, and exposes helpers (click_building_and_all_floors, js_click, open_sites_and_expand_main_campus, expand_building).

Sequence Diagram(s)

sequenceDiagram
  participant CLI as "get_all_svg CLI"
  participant CMU as "CMU SVG API"
  participant FS as "Local filesystem"
  participant S3 as "MinIO / S3 (cmumaps)"
  CLI->>CMU: request floorplan SVG (with retries/backoff)
  CMU-->>CLI: SVG response or error
  CLI->>FS: save SVG to slugified path
  FS-->>CLI: write confirmation
  CLI->>S3: optional upload via s3_utils.upload_generic_file
  S3-->>CLI: upload success/failure

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇
I hopped through tabs and API streams,
Collected floors and SVG dreams,
Pushed the files where buckets gleam,
Pulled them back to sight and scheme,
A little rabbit in every beam.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main changes: introducing a web driver for Selenium automation and utilities for scraping and uploading FMS floorplan SVGs to S3.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@apps/dataflow/src/scraper/fms/download_svg_from_s3.py`:
- Around line 3-34: The download_all_svg_files function currently maps S3 keys
directly to local_file_path which allows path traversal via "../"; normalize and
validate the mapped path before creating dirs or writing files by resolving the
destination root (LOCAL_DESTINATION) to an absolute/resolved path and resolving
the candidate path (join LOCAL_DESTINATION with relative_path then resolve),
then ensure the resolved candidate path starts with the resolved destination
root; if it does not, skip the object (or sanitize the name) and increment
fail_count; apply this check where obj.object_name -> relative_path ->
local_file_path is computed and before os.makedirs or any file write.

In `@apps/dataflow/src/scraper/fms/driver.py`:
- Around line 74-93: The code assumes window.open succeeded and immediately uses
driver.window_handles[-1] which can equal the main handle if popups are blocked,
causing driver.close() to kill the session; before switching or closing, capture
the current handles (e.g., home_window = driver.current_window_handle and
orig_handles = driver.window_handles), then after execute_script use a short
WebDriverWait loop (or poll) to ensure a new handle appears
(len(driver.window_handles) > len(orig_handles)); if a new handle never appears,
skip switching/alert handling/closing for this iteration and continue, otherwise
switch to the newly added handle, perform the alert check, close that new
handle, and switch back to home_window (use unique symbols like
driver.execute_script, driver.window_handles, driver.switch_to.window,
driver.close, WebDriverWait/TimeoutException to locate where to update).

In `@apps/dataflow/src/scraper/fms/get_all_svg.py`:
- Around line 28-75: The file currently exposes real auth data in the cookies
and headers dicts; remove the hard‑coded cookies and headers (the cookies and
headers variables) and instead load them at runtime from a local secrets source
(e.g., environment variables or a JSON file read by a helper like
load_secrets()) and fail fast if missing; ensure the secret file name is added
to .gitignore and update any README/dev docs to instruct how to create the local
secrets file or set env vars; after updating remove the sensitive values from
history and coordinate rotation/revocation of the exposed tokens.
- Around line 96-145: Change SSL verification to be enabled by default: in
fetch_svg() set the parameter verify_ssl: bool = True (instead of False). In
main(), replace the CLI flag --verify-ssl with an opt-out --insecure flag
(action="store_true") and update the call that invokes fetch_svg(...) to pass
verify_ssl=not args.insecure so verification is enabled unless --insecure is
provided; ensure references to verify_ssl in requests.get(...,
verify=verify_ssl) remain consistent.
- Around line 115-126: The response is written to out_file as an .svg without
validating that it is actually SVG; update the logic in the function using resp
(the requests.get block) to verify the payload before calling
out_file.write_text: check resp.headers.get("Content-Type") for an SVG MIME type
(e.g., contains "image/svg+xml") or ensure resp.text.lstrip().startswith("<svg")
and only then create the parent dir and write the file, otherwise treat it as an
error/expired-session case (return False or raise) so stale HTML/login pages are
not saved as .svg.

In `@apps/dataflow/src/scraper/fms/s3_utils.py`:
- Around line 1-5: The code that writes a downloaded S3 object to local JSON
must ensure the parent directory exists before calling open/write; update the
function that saves the S3 object (the routine that takes s3_object_name and
writes to local_path / local_filename—e.g., the save/download function around
lines 112-118) to call os.path.dirname(local_path) and, if non-empty, create it
with os.makedirs(..., exist_ok=True) prior to opening the file; keep the
existing write logic and handle the case where s3_object_name has no path
component by skipping directory creation.
- Around line 68-88: The upload_generic_file function currently returns True on
success but returns None on exception; update the except block in
upload_generic_file to explicitly return False (and keep the existing print of
the exception) so the function consistently returns a boolean as documented;
reference the upload_generic_file function and the except Exception as e block
to locate where to add the return False.
- Around line 201-221: The docstring in get_generic_file_from_s3 has misaligned
continuation lines; fix the indentation so each wrapped line aligns to a
multiple-of-2 spaces under the docstring start (e.g., align the "return_data"
continuation line with two-space increments to match the surrounding docstring
block), ensuring the triple-quoted description and Args/Returns sections use
consistent 2-space indentation to satisfy editorconfig.
- Around line 8-17: The code currently constructs a Minio client without
checking s3_endpoint; before calling Minio(...) validate that s3_endpoint (the
S3_ENDPOINT env var) is non-empty and raise or log a clear error (e.g.,
ValueError with a message like "S3_ENDPOINT must be set") so initialization of
client = Minio(...) only occurs when s3_endpoint is present; reference the
variables access_key, secret_key, s3_endpoint and the Minio client creation in
your change.

In `@apps/dataflow/src/scraper/fms/upload_svg_to_s3.py`:
- Around line 17-23: The code calls os.listdir(LOCAL_SVG_FOLDER) which will
raise if LOCAL_SVG_FOLDER doesn't exist; before the loop check that
LOCAL_SVG_FOLDER exists and is a directory (using os.path.exists or
os.path.isdir) and if not, fail fast by raising a clear exception or logging an
error and exiting; update the block surrounding the os.listdir call (the loop
that iterates over building in os.listdir(LOCAL_SVG_FOLDER)) to perform this
pre-check and surface a descriptive message including LOCAL_SVG_FOLDER so
downstream callers know why processing stopped.
🧹 Nitpick comments (1)
apps/dataflow/src/scraper/fms/driver.py (1)

97-129: Pass driver into js_click to avoid hidden global state.
This makes the helper reusable/testable and avoids accidental cross‑driver usage.

♻️ Refactor to make `driver` explicit
-def js_click(el):
-    driver.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
-    driver.execute_script("arguments[0].click();", el)
+def js_click(driver, el):
+    driver.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
+    driver.execute_script("arguments[0].click();", el)
@@
-        js_click(campus_plus[0])
+        js_click(driver, campus_plus[0])
@@
-    js_click(campus_plus)
+    js_click(driver, campus_plus)

Comment on lines 3 to 34
import os
from s3_utils import client, bucket_name

# S3 source path
S3_SOURCE = "floorplan_svg/"
# Local destination folder
LOCAL_DESTINATION = "floorplan_svg_downloaded"


def download_all_svg_files():
    """Download all SVG files from S3 floorplan_svg folder"""
    success_count = 0
    fail_count = 0

    # Create local destination folder if it doesn't exist
    os.makedirs(LOCAL_DESTINATION, exist_ok=True)

    # List all objects in the floorplan_svg folder
    objects = client.list_objects(bucket_name, prefix=S3_SOURCE, recursive=True)

    for obj in objects:
        # Skip if not an SVG file
        if not obj.object_name.endswith(".svg"):
            continue

        # Extract relative path (e.g., "floorplan_svg/building/floor.svg" -> "building/floor.svg")
        relative_path = obj.object_name[len(S3_SOURCE):]
        local_file_path = os.path.join(LOCAL_DESTINATION, relative_path)

        # Create building subfolder if needed
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)


⚠️ Potential issue | 🟠 Major

Prevent path traversal when mapping S3 keys to local files.
S3 object names can contain ../; without normalization, this can write outside LOCAL_DESTINATION.

🔐 Constrain writes to the destination directory
-import os
+import os
+from pathlib import Path
@@
-    for obj in objects:
+    base_dir = Path(LOCAL_DESTINATION).resolve()
+    for obj in objects:
@@
-        relative_path = obj.object_name[len(S3_SOURCE):]
-        local_file_path = os.path.join(LOCAL_DESTINATION, relative_path)
+        relative_path = obj.object_name[len(S3_SOURCE):].lstrip("/")
+        local_file_path = (base_dir / relative_path).resolve()
+        try:
+            local_file_path.relative_to(base_dir)
+        except ValueError:
+            print(f"[warn] Skipping suspicious key: {obj.object_name}")
+            continue
@@
-        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
+        local_file_path.parent.mkdir(parents=True, exist_ok=True)
@@
-            client.fget_object(bucket_name, obj.object_name, local_file_path)
+            client.fget_object(bucket_name, obj.object_name, str(local_file_path))
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 13-13: First line should end with a period

Add period

(D400)


[warning] 13-13: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


[warning] 18-18: os.makedirs() should be replaced by Path.mkdir(parents=True)

(PTH103)


[warning] 30-30: os.path.join() should be replaced by Path with / operator

(PTH118)


[warning] 33-33: os.makedirs() should be replaced by Path.mkdir(parents=True)

(PTH103)


[warning] 33-33: os.path.dirname() should be replaced by Path.parent

(PTH120)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/download_svg_from_s3.py` around lines 3 - 34,
The download_all_svg_files function currently maps S3 keys directly to
local_file_path which allows path traversal via "../"; normalize and validate
the mapped path before creating dirs or writing files by resolving the
destination root (LOCAL_DESTINATION) to an absolute/resolved path and resolving
the candidate path (join LOCAL_DESTINATION with relative_path then resolve),
then ensure the resolved candidate path starts with the resolved destination
root; if it does not, skip the object (or sanitize the name) and increment
fail_count; apply this check where obj.object_name -> relative_path ->
local_file_path is computed and before os.makedirs or any file write.

Comment on lines 74 to 93
print(f"Opening floor {label} ({idx+1}/{len(floor_links)}) -> {href}")
# driver.get(href) # open url in same tab
# open floor in new tab
driver.execute_script("window.open(arguments[0], '_blank');", href)
driver.switch_to.window(driver.window_handles[-1]) # Note: assumes new tab is last. Could be wrong if popups occur.

# Handle possible modal alert: "Drawing file is not available!"
try:
WebDriverWait(driver, 1).until(EC.alert_is_present())
alert = driver.switch_to.alert
msg = alert.text
alert.accept()
print(f"Skipped floor {label or '(no label)'} due to alert: {msg}")
except TimeoutException:
# No alert -> give the site a moment to “warm” server-side state
time.sleep(0.5)

# close tab and return to main page
driver.close()
driver.switch_to.window(home_window)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/driver.py | sed -n '70,100p'

Repository: ScottyLabs/cmumaps

Length of output: 1613


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/driver.py | sed -n '50,95p'

Repository: ScottyLabs/cmumaps

Length of output: 2234


🏁 Script executed:

python3 << 'EOF'
# Research on Selenium window handle behavior
info = """
Selenium window.open() behavior:

1. execute_script() with window.open() is a JavaScript call that:
   - Returns immediately (async operation)
   - Can fail silently if blocked by browser security policies
   - Does not raise an exception if opening fails
   - Only adds to window_handles if actually created

2. window_handles list:
   - Contains all open windows/tabs at that moment
   - Does not automatically refresh
   - Must be accessed fresh each time to get updated list
   - window_handles[-1] assumes the last element is the new tab

3. Common failure scenarios:
   - Content Security Policy (CSP) blocking window.open
   - Browser security policies on certain domains
   - Target origin security restrictions
   - WebDriver security sandboxing

4. The vulnerability:
   - If window.open() fails silently, window_handles unchanged
   - window_handles[-1] still references main window
   - driver.close() closes main window instead of new tab
   - Next iteration attempts to interact with closed window (error)

5. The proposed fix correctly:
   - Captures handles_before as set()
   - Waits for new handle to appear
   - Raises TimeoutException if no new handle appears
   - Uses identified new_handles to switch to correct window
   - Skips close when no new window was created
"""
print(info)
EOF

Repository: ScottyLabs/cmumaps

Length of output: 1352


Guard against missing new‑tab handles before closing.
If window.open() fails or is blocked by browser security policies, window_handles[-1] can still reference the main tab and driver.close() will terminate the session. Wait for a new handle to appear and skip the close operation if none materializes.

🔧 Proposed guard for new‑tab handle creation
-        driver.execute_script("window.open(arguments[0], '_blank');", href)
-        driver.switch_to.window(driver.window_handles[-1]) # Note: assumes new tab is last. Could be wrong if popups occur.
+        handles_before = set(driver.window_handles)
+        driver.execute_script("window.open(arguments[0], '_blank');", href)
+        try:
+            WebDriverWait(driver, 5).until(lambda d: len(d.window_handles) > len(handles_before))
+        except TimeoutException:
+            print(f"[warn] No new tab opened for {label or '(no label)'}; skipping.")
+            continue
+        new_handles = [h for h in driver.window_handles if h not in handles_before]
+        driver.switch_to.window(new_handles[-1])
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 74-74: print found

Remove print

(T201)


[error] 75-75: Found commented-out code

Remove commented-out code

(ERA001)


[warning] 86-86: print found

Remove print

(T201)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/driver.py` around lines 74 - 93, The code
assumes window.open succeeded and immediately uses driver.window_handles[-1]
which can equal the main handle if popups are blocked, causing driver.close() to
kill the session; before switching or closing, capture the current handles
(e.g., home_window = driver.current_window_handle and orig_handles =
driver.window_handles), then after execute_script use a short WebDriverWait loop
(or poll) to ensure a new handle appears (len(driver.window_handles) >
len(orig_handles)); if a new handle never appears, skip switching/alert
handling/closing for this iteration and continue, otherwise switch to the newly
added handle, perform the alert check, close that new handle, and switch back to
home_window (use unique symbols like driver.execute_script,
driver.window_handles, driver.switch_to.window, driver.close,
WebDriverWait/TimeoutException to locate where to update).

Comment on lines 28 to 46
import argparse
import json
from pathlib import Path
import re
import time
from typing import Dict, Any, List, Optional
import requests

ENDPOINT = "https://fmsystems.cmu.edu/FMInteract/tools/getDefaultLayersData.ashx"
DEFAULT_PARAMS = {
    "isRevit": "false",
    "RoomBoundaryLayer": "A-AREA",
    "RoomTagLayer": "A-AREA-IDEN",
}

# Paste your session cookies and headers here (copied from your browser)
cookies = {
    'ASP.NET_SessionId': 'sadei2rcwbhr1m1bdzhmyejm',
    '.ASPXAUTH': '6A464769C85B50B714FB55EED49E05219CB95E65E6BB946700F5FC627B8BC61F8238B4DB27B649F72C89B11E2A861D4EA9DBF4435EBD01C9E376DB64A831D29A5958CFEFCC6F24555591677D837DE81B93A42DEA',
    'AuthenToken': 'e097af9e-1d96-41da-8461-dd6d68fde4e2',
    '.ASPXROLES': '51oAsjfC8TogP4H6dpTFEHZ95_fxHWVjx3PpE2LLMpwm3TMl5xLf40ovYnXyH3WXPzlseJC7PFMJ_zghHwLxaOekFUKlMK7lIPsZ8Ke-g0bAPWE6I8cuwI2LmbaGDEtrZgCcT988NnrSZ-bGVM226apgdQ4VRfQspcZtyZFfRJ3gLTdwT3tkT9li_f1tP5JztF5xbMAYoiBpBpJ9e_OXDwBxRYD58ScEKavkfqbjA03ivo6WonCfIlMYE2QsOBKEqgPBPciUiyoY4ywp_f4DlgP-TW4WIISKQ9626T7EAqA3eKfWw-hYjkH6zDvnMeFw4vLv_fwkmOScoHqZliHdKmbpiKFQ1BMnoMKk92bR72QjRwAZUhqqlLkqAhkMMJL03Z4rIMH3GZ4l_eTNdZfEpvsZwBM2o9Uczhn55JNxe6AVL7e60vcojVEO9IyWeqjSx6hrfeMFCj1tK9psYyOFaki9FTW4RQvo4MxCP8Y9WBekJ2ZP0',
    'NavCookies_ANONYMOUS': '1',
    'NavCookies_Theme_ANONYMOUS': 'Default',
    'IsShowNavigation': 'true',
    'previousUrl': 'https://fmsystems.cmu.edu/FMInteract/ShowDrawingView.aspx?file_code=L0&bldgcode%20floorcode%20optn=116%20%20%20%20%20%20%201%20%20%20&bldgcode=116%20%20%20%20%20%20%20&floorcode=1%20%20%20&act_code=Q',
    'State': 'unpinNav',
    'scrollPosition': '1826.5',
    'MenuState': 'Action=SITE&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&sitecode=MAIN*Action=BUILDING&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&BLDGCODE=016*Action=BUILDING&BLDGCODE=009*Action=BUILDING&BLDGCODE=002*Action=BUILDING&BLDGCODE=080*Action=BUILDING&BLDGCODE=190*Action=BUILDING&BLDGCODE=195*Action=BUILDING&BLDGCODE=280*Action=BUILDING&BLDGCODE=025*Action=BUILDING&BLDGCODE=141*Action=BUILDING&BLDGCODE=012*',
    'R_SplitterHeight': '494',
    '__AntiXsrfToken': 'f185e23462ed4be7bf12656cf127454c',
}

headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://fmsystems.cmu.edu/FMInteract/DrawingView.aspx?file_code=L0&bldgcode%20floorcode%20optn=116%20%20%20%20%20%20%201%20%20%20&bldgcode=116%20%20%20%20%20%20%20&floorcode=1%20%20%20&act_code=Q&maxColunms=4',
    'sec-ch-ua': '"Not(A:Brand";v="8", "Chromium";v="144", "Google Chrome";v="144"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    # 'cookie': 'ASP.NET_SessionId=sadei2rcwbhr1m1bdzhmyejm; .ASPXAUTH=6A464769C85B50B714FB55EED49E05219CB95E65E6BB946700F5FC627B8BC61F8238B4DB27B649F72C89B11E2A861D4EA9DBF4435EBD01C9E376DB64A831D29A5958CFEFCC6F24555591677D837DE81B93A42DEA; AuthenToken=e097af9e-1d96-41da-8461-dd6d68fde4e2; .ASPXROLES=51oAsjfC8TogP4H6dpTFEHZ95_fxHWVjx3PpE2LLMpwm3TMl5xLf40ovYnXyH3WXPzlseJC7PFMJ_zghHwLxaOekFUKlMK7lIPsZ8Ke-g0bAPWE6I8cuwI2LmbaGDEtrZgCcT988NnrSZ-bGVM226apgdQ4VRfQspcZtyZFfRJ3gLTdwT3tkT9li_f1tP5JztF5xbMAYoiBpBpJ9e_OXDwBxRYD58ScEKavkfqbjA03ivo6WonCfIlMYE2QsOBKEqgPBPciUiyoY4ywp_f4DlgP-TW4WIISKQ9626T7EAqA3eKfWw-hYjkH6zDvnMeFw4vLv_fwkmOScoHqZliHdKmbpiKFQ1BMnoMKk92bR72QjRwAZUhqqlLkqAhkMMJL03Z4rIMH3GZ4l_eTNdZfEpvsZwBM2o9Uczhn55JNxe6AVL7e60vcojVEO9IyWeqjSx6hrfeMFCj1tK9psYyOFaki9FTW4RQvo4MxCP8Y9WBekJ2ZP0; NavCookies_ANONYMOUS=1; NavCookies_Theme_ANONYMOUS=Default; IsShowNavigation=true; previousUrl=https://fmsystems.cmu.edu/FMInteract/ShowDrawingView.aspx?file_code=L0&bldgcode%20floorcode%20optn=116%20%20%20%20%20%20%201%20%20%20&bldgcode=116%20%20%20%20%20%20%20&floorcode=1%20%20%20&act_code=Q; State=unpinNav; scrollPosition=1826.5; MenuState=Action=SITE&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&act_code=Q&sitecode=MAIN*Action=BUILDING&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&sitecode=MAIN&BLDGCODE=016*Action=BUILDING&BLDGCODE=009*Action=BUILDING&BLDGCODE=002*Action=BUILDING&BLDGCODE=080*Action=BUILDING&BLDGCODE=190*Action=BUILDING&BLDGCODE=195*Action=BUILDING&BLDGCODE=280*Action=BUILDING&BLDGCODE=025*Action=BUILDING&BLDGCODE=141*Action=BUILDING&BLDGCODE=012*; R_SplitterHeight=494; __AntiXsrfToken=f185e23462ed4be7bf12656cf127454c',
}

⚠️ Potential issue | 🔴 Critical

Remove hard‑coded session cookies/headers from source control.
These are live auth tokens and must not be committed. Load them from a local file/env at runtime, add the files to .gitignore, and rotate/revoke the exposed tokens.

🔒 Example: load secrets from local JSON files
-import argparse
-import json
-from pathlib import Path
-import re
-import time
-from typing import Dict, Any, List, Optional
-import requests
+import argparse
+import json
+import os
+from pathlib import Path
+import re
+import time
+from typing import Dict, Any, List, Optional
+import requests
@@
-# Paste your session cookies and headers here (copied from your browser)
-cookies = {
-    'ASP.NET_SessionId': '...',
-    # ...
-}
-
-headers = {
-    'accept': '*/*',
-    # ...
-}
+def _load_json(path: str) -> Dict[str, str]:
+    with open(path, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+COOKIES_PATH = os.getenv("FMS_COOKIES_PATH", "cookies.json")
+HEADERS_PATH = os.getenv("FMS_HEADERS_PATH", "headers.json")
+cookies = _load_json(COOKIES_PATH)
+headers = _load_json(HEADERS_PATH)
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 33-33: typing.Dict is deprecated, use dict instead

(UP035)


[warning] 33-33: typing.List is deprecated, use list instead

(UP035)


[error] 74-74: Found commented-out code

Remove commented-out code

(ERA001)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/get_all_svg.py` around lines 28 - 75, The file
currently exposes real auth data in the cookies and headers dicts; remove the
hard‑coded cookies and headers (the cookies and headers variables) and instead
load them at runtime from a local secrets source (e.g., environment variables or
a JSON file read by a helper like load_secrets()) and fail fast if missing;
ensure the secret file name is added to .gitignore and update any README/dev
docs to instruct how to create the local secrets file or set env vars; after
updating remove the sensitive values from history and coordinate
rotation/revocation of the exposed tokens.

Comment on lines 96 to 145
def fetch_svg(
    floor_id: str,
    svg_filename_param: str,
    out_file: Path,
    cookies: Dict[str, str],
    headers: Dict[str, str],
    timeout: float = 30.0,
    retries: int = 3,
    backoff: float = 1.5,
    verify_ssl: bool = False,
) -> bool:
    params = dict(DEFAULT_PARAMS)
    params["floorId"] = floor_id
    params["svgFile"] = svg_filename_param
    print(params)

    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(
                ENDPOINT,
                params=params,
                cookies=cookies,
                headers=headers,
                timeout=timeout,
                verify=verify_ssl,
            )
            if resp.status_code == 200 and resp.text.strip():
                out_file.parent.mkdir(parents=True, exist_ok=True)
                out_file.write_text(resp.text, encoding="utf-8")
                return True
            else:
                print(f"[warn] HTTP {resp.status_code} for floorId={floor_id}; attempt {attempt}/{retries}")
        except Exception as e:
            last_exc = e
            print(f"[warn] Exception on attempt {attempt}/{retries} for floorId={floor_id}: {e}")
        time.sleep(backoff ** (attempt - 1))
    if last_exc:
        print(f"[error] Failed for floorId={floor_id}: {last_exc}")
    return False


def main():
    ap = argparse.ArgumentParser(description="Fetch floor-plan SVGs for buildings/floorid.")
    ap.add_argument("--buildings", default="buildings.json", help="Path to buildings.json (list of {building, floorid})")
    ap.add_argument("--out", default="floorplan_svg", help="Output root directory")
    ap.add_argument("--delay", type=float, default=0.0, help="Seconds to sleep between requests (politeness)")
    ap.add_argument("--timeout", type=float, default=30.0, help="HTTP timeout per request in seconds")
    ap.add_argument("--retries", type=int, default=3, help="HTTP retries per request")
    ap.add_argument("--backoff", type=float, default=1.5, help="Exponential backoff base between retries")
    ap.add_argument("--verify-ssl", action="store_true", help="Verify SSL certificates (default: off)")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and its size
wc -l apps/dataflow/src/scraper/fms/get_all_svg.py

Repository: ScottyLabs/cmumaps

Length of output: 110


🏁 Script executed:

# Read the relevant sections of the file to verify current state
head -150 apps/dataflow/src/scraper/fms/get_all_svg.py | tail -60

Repository: ScottyLabs/cmumaps

Length of output: 2592


🏁 Script executed:

# Check lines 218-228 mentioned in the review
sed -n '210,235p' apps/dataflow/src/scraper/fms/get_all_svg.py

Repository: ScottyLabs/cmumaps

Length of output: 1265


🏁 Script executed:

# Check the full file for any comments about SSL verification
head -100 apps/dataflow/src/scraper/fms/get_all_svg.py

Repository: ScottyLabs/cmumaps

Length of output: 6519


🏁 Script executed:

# Search for any environment or configuration that might explain the SSL verification choice
grep -n "verify" apps/dataflow/src/scraper/fms/get_all_svg.py

Repository: ScottyLabs/cmumaps

Length of output: 290


🏁 Script executed:

# Check if there are any README or comments in related files
fd -t f "(README|README.md)" apps/dataflow/src/scraper/fms/ 2>/dev/null || echo "No README found"

Repository: ScottyLabs/cmumaps

Length of output: 44


Enable SSL verification by default (opt-out).

SSL verification is currently disabled by default for requests to the CMU FMS Systems endpoint. Change the default to enable verification and provide an explicit opt-out flag instead:

  • Change verify_ssl: bool = False to verify_ssl: bool = True in the fetch_svg() function signature
  • Replace --verify-ssl flag with --insecure flag (using action="store_true")
  • Update the fetch_svg() call in main() to pass verify_ssl=not args.insecure

This applies to lines 105, 145, and 227.
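A condensed, self-contained sketch of that change; the real fetch_svg keeps its retry loop and extra parameters, only the flag wiring differs:

import argparse

import requests

ENDPOINT = "https://fmsystems.cmu.edu/FMInteract/tools/getDefaultLayersData.ashx"


def fetch_svg(url: str, verify_ssl: bool = True) -> bool:
    """Single-request stand-in for the real fetch_svg; verification defaults to on."""
    resp = requests.get(url, timeout=30.0, verify=verify_ssl)
    return resp.status_code == 200


def main() -> None:
    ap = argparse.ArgumentParser(description="Fetch floor-plan SVGs.")
    # Opt-out flag replaces --verify-ssl: verification stays on unless disabled.
    ap.add_argument("--insecure", action="store_true",
                    help="Disable SSL certificate verification")
    args = ap.parse_args()
    fetch_svg(ENDPOINT, verify_ssl=not args.insecure)


if __name__ == "__main__":
    main()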

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 96-96: Too many arguments in function definition (9 > 5)

(PLR0913)


[warning] 96-96: Missing docstring in public function

(D103)


[error] 105-105: Boolean-typed positional argument in function definition

(FBT001)


[error] 105-105: Boolean default positional argument in function definition

(FBT002)


[warning] 110-110: print found

Remove print

(T201)


[warning] 123-123: Magic value used in comparison, consider replacing 200 with a constant variable

(PLR2004)


[warning] 127-127: Unnecessary else after return statement

Remove unnecessary else

(RET505)


[warning] 128-128: print found

Remove print

(T201)


[warning] 129-129: Do not catch blind exception: Exception

(BLE001)


[warning] 131-131: print found

Remove print

(T201)


[warning] 134-134: print found

Remove print

(T201)


[warning] 137-137: Missing docstring in public function

(D103)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/get_all_svg.py` around lines 96 - 145, Change
SSL verification to be enabled by default: in fetch_svg() set the parameter
verify_ssl: bool = True (instead of False). In main(), replace the CLI flag
--verify-ssl with an opt-out --insecure flag (action="store_true") and update
the call that invokes fetch_svg(...) to pass verify_ssl=not args.insecure so
verification is enabled unless --insecure is provided; ensure references to
verify_ssl in requests.get(..., verify=verify_ssl) remain consistent.

Comment on lines 115 to 126
            resp = requests.get(
                ENDPOINT,
                params=params,
                cookies=cookies,
                headers=headers,
                timeout=timeout,
                verify=verify_ssl,
            )
            if resp.status_code == 200 and resp.text.strip():
                out_file.parent.mkdir(parents=True, exist_ok=True)
                out_file.write_text(resp.text, encoding="utf-8")
                return True

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd apps/dataflow/src/scraper/fms && wc -l get_all_svg.py

Repository: ScottyLabs/cmumaps

Length of output: 80


🏁 Script executed:

cd apps/dataflow/src/scraper/fms && head -150 get_all_svg.py | tail -50

Repository: ScottyLabs/cmumaps

Length of output: 2228


🏁 Script executed:

cd apps/dataflow/src/scraper/fms && cat -n get_all_svg.py | head -140

Repository: ScottyLabs/cmumaps

Length of output: 9048


🏁 Script executed:

find apps/dataflow -type f -name "*test*.py" | head -20

Repository: ScottyLabs/cmumaps

Length of output: 44


🏁 Script executed:

grep -r "get_all_svg\|fetch_svg" apps/dataflow --include="*.py" | head -20

Repository: ScottyLabs/cmumaps

Length of output: 199


🏁 Script executed:

grep -r "write_text\|\.svg" apps/dataflow --include="*.py" -B 2 -A 2 | head -40

Repository: ScottyLabs/cmumaps

Length of output: 2750


🏁 Script executed:

grep -r "Content-Type\|<svg\|svg\+" apps/dataflow --include="*.py" | head -20

Repository: ScottyLabs/cmumaps

Length of output: 1991


🏁 Script executed:

find apps/dataflow -type f -name "*.py" -exec grep -l "parse\|xml\|svg\|lxml\|ElementTree" {} \;

Repository: ScottyLabs/cmumaps

Length of output: 334


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/get_all_svg.py | sed -n '155,241p'

Repository: ScottyLabs/cmumaps

Length of output: 4089


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/get_all_svg.py | sed -n '36,42p'

Repository: ScottyLabs/cmumaps

Length of output: 308


🏁 Script executed:

grep -r "getDefaultLayersData\|fmsystems.cmu.edu" apps/dataflow --include="*.py" -B 2 -A 2

Repository: ScottyLabs/cmumaps

Length of output: 4375


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/get_all_svg.py | sed -n '1,42p'

Repository: ScottyLabs/cmumaps

Length of output: 1340


🏁 Script executed:

grep -r "svg\|SVG\|Content-Type\|image" apps/dataflow/src/scraper/fms/ --include="*.py" -i | head -30

Repository: ScottyLabs/cmumaps

Length of output: 3055


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/get_all_svg.py | sed -n '96,135p'

Repository: ScottyLabs/cmumaps

Length of output: 1690


Add SVG validation before writing response to disk.

The endpoint could return a 200 status with an HTML/login payload if the session expires (cookies are manually extracted and can become stale). Currently, this would be written as a .svg file, causing downstream parsing failures. Check the Content-Type header or validate that the response starts with <svg.

Suggested validation approach
-            if resp.status_code == 200 and resp.text.strip():
-                out_file.parent.mkdir(parents=True, exist_ok=True)
-                out_file.write_text(resp.text, encoding="utf-8")
-                return True
+            if resp.status_code == 200 and resp.text.strip():
+                ct = resp.headers.get("Content-Type", "")
+                if "image/svg+xml" not in ct and not resp.text.lstrip().startswith("<svg"):
+                    print(f"[warn] Non‑SVG response for floorId={floor_id}; attempt {attempt}/{retries}")
+                else:
+                    out_file.parent.mkdir(parents=True, exist_ok=True)
+                    out_file.write_text(resp.text, encoding="utf-8")
+                    return True
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 123-123: Magic value used in comparison, consider replacing 200 with a constant variable

(PLR2004)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/get_all_svg.py` around lines 115 - 126, The
response is written to out_file as an .svg without validating that it is
actually SVG; update the logic in the function using resp (the requests.get
block) to verify the payload before calling out_file.write_text: check
resp.headers.get("Content-Type") for an SVG MIME type (e.g., contains
"image/svg+xml") or ensure resp.text.lstrip().startswith("<svg") and only then
create the parent dir and write the file, otherwise treat it as an
error/expired-session case (return False or raise) so stale HTML/login pages are
not saved as .svg.

Comment on lines 1 to 5
from minio import Minio
from dotenv import load_dotenv
import os
import json


⚠️ Potential issue | 🟠 Major

Create parent directories before writing local JSON.
If s3_object_name includes a path (e.g., foo/bar.json), the write will fail unless the parent directory exists.

📁 Ensure the local directory exists
 from dotenv import load_dotenv
 import os
 import json
+from pathlib import Path
@@
-    local_path = local_file_path if local_file_path else s3_object_name
+    local_path = local_file_path if local_file_path else s3_object_name
+    Path(local_path).parent.mkdir(parents=True, exist_ok=True)

Also applies to: 112-118

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 1-1: Missing docstring in public module

(D100)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/s3_utils.py` around lines 1 - 5, The code that
writes a downloaded S3 object to local JSON must ensure the parent directory
exists before calling open/write; update the function that saves the S3 object
(the routine that takes s3_object_name and writes to local_path /
local_filename—e.g., the save/download function around lines 112-118) to call
os.path.dirname(local_path) and, if non-empty, create it with os.makedirs(...,
exist_ok=True) prior to opening the file; keep the existing write logic and
handle the case where s3_object_name has no path component by skipping directory
creation.

Comment on lines +8 to +21
access_key = os.getenv("S3_ACCESS_KEY")
secret_key = os.getenv("S3_SECRET_KEY")
s3_endpoint = os.getenv("S3_ENDPOINT")

# Create client with access and secret key.
client = Minio(
s3_endpoint,
access_key=access_key,
secret_key=secret_key,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "s3_utils.py" | head -5

Repository: ScottyLabs/cmumaps

Length of output: 105


🏁 Script executed:

cat -n apps/dataflow/src/scraper/fms/s3_utils.py | head -50

Repository: ScottyLabs/cmumaps

Length of output: 1692


🏁 Script executed:

# Check if there are other places where Minio client is instantiated
rg "Minio\(" --type py -B 2 -A 5

Repository: ScottyLabs/cmumaps

Length of output: 1041


🌐 Web query:

MinIO Python client None values initialization requirements

💡 Result:

For the official MinIO Python SDK (minio / minio-py), the constructor explicitly allows several arguments to be None:

Minio(endpoint, access_key=None, secret_key=None, session_token=None,
      secure=True, region=None, http_client=None, credentials=None)

[1]

What can be None (and when)

  • endpoint: cannot be None (it’s the only truly required argument) [1]
  • access_key, secret_key, session_token: may be None (they’re optional in the API) [1]
    • If you want authenticated access, docs/examples treat access_key + secret_key as required inputs [2]
    • If you omit them (leave as None), you’re effectively doing anonymous / unsigned requests (only works where the server/bucket/policy allows it)
  • region, http_client, credentials: may be None [1]
    • If you use a credentials= provider, you typically don’t pass access_key/secret_key (leave them None) [1]

What should not be None

  • secure: should be a boolean (True/False), not None [1]

Sources: MinIO Python Client API constructor signature [1] and the official quickstart/README stating endpoint, access_key, and secret_key are required to connect in the standard authenticated setup [2].

[1] (min.io)
[2] (github.com)



🏁 Script executed:

cat -n apps/dataflow/src/clients/_s3_client.py | head -80

Repository: ScottyLabs/cmumaps

Length of output: 2818


🏁 Script executed:

# Get more context around the validation in _s3_client.py
sed -n '1,50p' apps/dataflow/src/clients/_s3_client.py

Repository: ScottyLabs/cmumaps

Length of output: 1397


Validate S3_ENDPOINT before initializing the Minio client.

The client initialization will fail with an opaque error if S3_ENDPOINT is None. The _s3_client.py module already demonstrates this pattern with explicit endpoint validation. While MinIO allows access_key and secret_key to be None (for anonymous access), the endpoint parameter is required.

🧯 Add endpoint validation
 access_key = os.getenv("S3_ACCESS_KEY")
 secret_key = os.getenv("S3_SECRET_KEY")
 s3_endpoint = os.getenv("S3_ENDPOINT")
+
+if not s3_endpoint:
+    raise ValueError("S3_ENDPOINT environment variable must be set")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
access_key = os.getenv("S3_ACCESS_KEY")
secret_key = os.getenv("S3_SECRET_KEY")
s3_endpoint = os.getenv("S3_ENDPOINT")
# Create client with access and secret key.
client = Minio(
s3_endpoint,
access_key=access_key,
secret_key=secret_key,
)
access_key = os.getenv("S3_ACCESS_KEY")
secret_key = os.getenv("S3_SECRET_KEY")
s3_endpoint = os.getenv("S3_ENDPOINT")
if not s3_endpoint:
raise ValueError("S3_ENDPOINT environment variable must be set")
# Create client with access and secret key.
client = Minio(
s3_endpoint,
access_key=access_key,
secret_key=secret_key,
)
🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/s3_utils.py` around lines 8 - 17, The code
currently constructs a Minio client without checking s3_endpoint; before calling
Minio(...) validate that s3_endpoint (the S3_ENDPOINT env var) is non-empty and
raise or log a clear error (e.g., ValueError with a message like "S3_ENDPOINT
must be set") so initialization of client = Minio(...) only occurs when
s3_endpoint is present; reference the variables access_key, secret_key,
s3_endpoint and the Minio client creation in your change.

Comment on lines 17 to 22
    # Iterate through building folders
    for building in os.listdir(LOCAL_SVG_FOLDER):
        building_path = os.path.join(LOCAL_SVG_FOLDER, building)

        # Skip if not a directory
        if not os.path.isdir(building_path):
            continue

⚠️ Potential issue | 🟡 Minor

Fail fast when the local SVG folder is missing.
os.listdir(LOCAL_SVG_FOLDER) will raise if the folder isn’t present; a clear error helps.

🧯 Add an explicit existence check
 def upload_all_svg_files():
     """Upload all SVG files from floorplan_svg folder to S3, preserving folder structure"""
     success_count = 0
     fail_count = 0
+
+    if not os.path.isdir(LOCAL_SVG_FOLDER):
+        raise FileNotFoundError(f"Local SVG folder not found: {LOCAL_SVG_FOLDER}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # Iterate through building folders
    for building in os.listdir(LOCAL_SVG_FOLDER):
        building_path = os.path.join(LOCAL_SVG_FOLDER, building)

        # Skip if not a directory
        if not os.path.isdir(building_path):
            continue

    if not os.path.isdir(LOCAL_SVG_FOLDER):
        raise FileNotFoundError(f"Local SVG folder not found: {LOCAL_SVG_FOLDER}")

    # Iterate through building folders
    for building in os.listdir(LOCAL_SVG_FOLDER):
        building_path = os.path.join(LOCAL_SVG_FOLDER, building)

        # Skip if not a directory
        if not os.path.isdir(building_path):
            continue
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 18-18: Use pathlib.Path.iterdir() instead.

(PTH208)


[warning] 19-19: os.path.join() should be replaced by Path with / operator

(PTH118)


[warning] 22-22: os.path.isdir() should be replaced by Path.is_dir()

(PTH112)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/upload_svg_to_s3.py` around lines 17 - 23, The
code calls os.listdir(LOCAL_SVG_FOLDER) which will raise if LOCAL_SVG_FOLDER
doesn't exist; before the loop check that LOCAL_SVG_FOLDER exists and is a
directory (using os.path.exists or os.path.isdir) and if not, fail fast by
raising a clear exception or logging an error and exiting; update the block
surrounding the os.listdir call (the loop that iterates over building in
os.listdir(LOCAL_SVG_FOLDER)) to perform this pre-check and surface a
descriptive message including LOCAL_SVG_FOLDER so downstream callers know why
processing stopped.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/dataflow/src/scraper/fms/driver.py`:
- Around line 58-60: Remove the unused "# noqa: E501" comment fragments from the
long XPath string literals in driver.py (the expressions that include
f"//a[contains(normalize-space(.), '{building_label}')]" and the similar
multi-line XPath later around lines 77-79); simply delete the " # noqa: E501"
text so the string concatenation remains identical but without the unused noqa
comments that trigger Ruff CI failures.

In `@apps/dataflow/src/scraper/fms/get_all_svg.py`:
- Around line 1-20: Several lines in get_all_svg.py exceed the E501 length
limit; wrap long string literals and f-strings (especially in the cookies and
headers dicts and any request/logging calls) using implicit string literal
concatenation or parentheses and split long f-strings across concatenated pieces
so each line stays under the limit. Locate the cookies and headers dicts and any
long f-strings used in request URLs or log messages (e.g., where you build
request headers, URLs, or logging statements around functions that fetch SVGs)
and break them into multiple shorter literals or use parentheses to format the
string across lines while preserving the same runtime value. Ensure indentation
and f-expression braces remain correct when splitting and run flake8 locally to
confirm E501 is resolved.

Comment on lines 58 to 60
f"//a[contains(normalize-space(.), '{building_label}')]" # noqa: E501
"/ancestor::div[1]/following-sibling::ul"
"//div[contains(@class,'fmi-nav-floorplan')]/a[contains(@class,'rtIn')]",

⚠️ Potential issue | 🟡 Minor

Remove unused # noqa: E501 to fix CI failures.

Ruff reports these noqa are unused.

🧹 Cleanup
-            f"//a[contains(normalize-space(.), '{building_label}')]"  # noqa: E501
+            f"//a[contains(normalize-space(.), '{building_label}')]"
@@
-            f"//a[contains(normalize-space(.), '{building_label}')]"  # noqa: E501
+            f"//a[contains(normalize-space(.), '{building_label}')]"

Also applies to: 77-79

🧰 Tools
🪛 GitHub Actions: Check

[error] 58-58: RUF100: Unused noqa directive (unused: E501)

🪛 Ruff (0.14.14)

[warning] 58-58: Unused noqa directive (unused: E501)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/driver.py` around lines 58 - 60, Remove the
unused "# noqa: E501" comment fragments from the long XPath string literals in
driver.py (the expressions that include f"//a[contains(normalize-space(.),
'{building_label}')]" and the similar multi-line XPath later around lines
77-79); simply delete the " # noqa: E501" text so the string concatenation
remains identical but without the unused noqa comments that trigger Ruff CI
failures.

Comment on lines 1 to 21
"""Pipeline: Fetch CMU floor-plan SVGs for a set of buildings.

1. Run driver.py to activate the session for each floor and get cookies/headers.

2. Put a buildings.json file in the current directory with the following format:
Input JSON format (buildings.json):
[
{
"building": "CFA",
"floorid": {
"1": "...",
"2": "...",
...
}
}
]

3. Update the `cookies` and `headers` dicts in this script with the values copied from your browser session.

4. Run this script to fetch all SVGs

⚠️ Potential issue | 🟡 Minor

Wrap long lines to fix E501 CI failures.

CI flags these lines; please wrap strings/f‑strings to keep within the line length limit.

🧰 Example fix (apply similarly to the other E501 lines)
-3. Update the `cookies` and `headers` dicts in this script with the values copied from your browser session.
+3. Update the `cookies` and `headers` dicts in this script with the values
+   copied from your browser session.
@@
-            print(  # noqa: T201
-                f"[warn] Exception on attempt {attempt}/{retries} for floorId={floor_id}: {e}",
-            )
+            print(  # noqa: T201
+                f"[warn] Exception on attempt {attempt}/{retries} "
+                f"for floorId={floor_id}: {e}",
+            )
@@
-            print(  # noqa: T201
-                f"[fetch] {building_name} | floor {floor_num} -> {out_file.name} (id={floor_id})",
-            )
+            print(  # noqa: T201
+                f"[fetch] {building_name} | floor {floor_num} -> "
+                f"{out_file.name} (id={floor_id})",
+            )
@@
-                print(  # noqa: T201
-                    f"[fail] Could not fetch floor {floor_num} (id={floor_id}) for {building_name}",
-                )
+                print(  # noqa: T201
+                    f"[fail] Could not fetch floor {floor_num} (id={floor_id}) "
+                    f"for {building_name}",
+                )

Also applies to: 106-114, 240-244, 262-263

🧰 Tools
🪛 GitHub Actions: Check

[error] 18-18: E501 Line too long (96 > 88)

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/get_all_svg.py` around lines 1 - 20, Several
lines in get_all_svg.py exceed the E501 length limit; wrap long string literals
and f-strings (especially in the cookies and headers dicts and any
request/logging calls) using implicit string literal concatenation or
parentheses and split long f-strings across concatenated pieces so each line
stays under the limit. Locate the cookies and headers dicts and any long
f-strings used in request URLs or log messages (e.g., where you build request
headers, URLs, or logging statements around functions that fetch SVGs) and break
them into multiple shorter literals or use parentheses to format the string
across lines while preserving the same runtime value. Ensure indentation and
f-expression braces remain correct when splitting and run flake8 locally to
confirm E501 is resolved.

Comment on lines +5 to +9
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

Bug: The script driver.py imports selenium but it is not listed as a project dependency, which will cause an ImportError when the script is run.
Severity: MEDIUM

Suggested Fix

Add selenium to the list of dependencies in the pyproject.toml file to ensure it is installed along with other project requirements. This will allow the driver.py script to be executed successfully.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/dataflow/src/scraper/fms/driver.py#L5-L9

Potential issue: The script `driver.py` imports the `selenium` package at lines 5-9, but
`selenium` is not declared as a dependency in the `pyproject.toml` file. The script is
documented as a standalone utility to be run manually by users. When a user follows the
instructions and attempts to execute the script, it will immediately fail with an
`ImportError` because the required package is not installed. This prevents the script
from being used as intended.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@apps/dataflow/src/scraper/fms/download_svg_from_s3.py`:
- Line 5: The import of s3_utils is failing because it’s treated as an absolute
module; update the import in download_svg_from_s3.py to use a relative import
from the same directory (import the symbols bucket_name and client from the
local s3_utils module) — i.e., change the current absolute import of s3_utils to
a relative import for bucket_name and client used in this file (or alternatively
make this directory a proper package with __init__.py or modify sys.path when
running standalone so s3_utils can be resolved).

In `@apps/dataflow/src/scraper/fms/driver.py`:
- Around line 112-115: The function js_click currently lacks a parameter type
which triggers no-untyped-def; add a type annotation for the el parameter (e.g.
use typing.Any or selenium.webdriver.remote.webelement.WebElement) and import
the chosen symbol (Any from typing or WebElement from
selenium.webdriver.remote.webelement) at the top of the module; leave the
function body as-is using driver.execute_script and ensure the signature becomes
def js_click(el: Any) -> None: or def js_click(el: WebElement) -> None: so CI
stops flagging the untyped definition.
- Around line 5-9: The project is missing the selenium dependency required by
the imports (selenium, webdriver, TimeoutException, WebDriverWait,
expected_conditions/ec) used in apps/dataflow/src/scraper/fms/driver.py; add a
compatible selenium entry to the project's dependency list in pyproject.toml
(e.g., under [tool.poetry.dependencies] or the project's dependency section) and
pin a suitable version so the scrape-fms command can import selenium
successfully.

In `@apps/dataflow/src/scraper/fms/s3_utils.py`:
- Around line 217-227: The docstring for get_generic_file_from_s3 incorrectly
mentions a non-existent return_data parameter; update the docstring for the
get_generic_file_from_s3 function to only list the actual parameter
s3_object_name and remove any references to return_data, and correct the Returns
section to accurately describe what the function returns (e.g., the raw S3
response or parsed object as Any | None) so the documentation matches the
function signature and behavior.

In `@apps/dataflow/src/scraper/fms/upload_svg_to_s3.py`:
- Line 5: The import in upload_svg_to_s3.py is causing CI failures; change the
bare module import to the same relative import style used in
download_svg_from_s3.py so the package resolver can find s3_utils. Replace "from
s3_utils import upload_generic_file" with the appropriate relative import (e.g.,
"from .s3_utils import upload_generic_file" or "from ..s3_utils import
upload_generic_file" to match the package layout) so the upload_generic_file
symbol is resolved at runtime.
🧹 Nitpick comments (2)
apps/dataflow/src/scraper/fms/driver.py (1)

17-23: Consider moving driver instantiation inside a main block.

The WebDriver is instantiated at module import time, which means importing this module will launch Chrome. This could cause issues if the module is imported for testing or by other scripts. However, this is acceptable for a standalone utility script.
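A minimal sketch of that refactor, assuming the click-through logic moves into a main() function (the names below are illustrative, not the PR's actual code):

from selenium import webdriver


def build_driver() -> webdriver.Chrome:
    """Create the Chrome driver on demand instead of at import time."""
    return webdriver.Chrome(options=webdriver.ChromeOptions())


def main() -> None:
    driver = build_driver()  # Chrome launches here, not on import
    try:
        ...  # click through the selected buildings and floors
    finally:
        driver.quit()


if __name__ == "__main__":
    main()  # importing the module no longer opens a browser

With that guard, a test suite can import helpers from the module without triggering a browser session.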

apps/dataflow/src/scraper/fms/s3_utils.py (1)

56-61: Note: upload_folder only uploads top-level files, not subdirectories.

The function skips directories (lines 60-61), so nested folder structures won't be uploaded. If recursive upload is needed, consider walking the tree with os.walk() or Path.rglob() (sketched below), or document the limitation.
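If recursion were wanted, a helper along these lines could replace the flat listing (iter_files_recursive is a hypothetical name, not part of the PR's s3_utils):

from pathlib import Path


def iter_files_recursive(folder: str) -> list[tuple[str, str]]:
    """Return (local_path, s3_key) pairs for every file under folder, at any depth."""
    root = Path(folder)
    return [
        (str(path), path.relative_to(root).as_posix())  # keep subdirs in the S3 key
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]

Using the path relative to the root as the object key preserves the folder structure in the bucket.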


from pathlib import Path

from s3_utils import bucket_name, client

⚠️ Potential issue | 🟠 Major

Fix the import to resolve the CI failure.

The pipeline reports import-not-found for s3_utils. Since s3_utils.py is in the same directory, use a relative import or ensure the package structure supports absolute imports.

🔧 Proposed fix using relative import
-from s3_utils import bucket_name, client
+from .s3_utils import bucket_name, client

Alternatively, if this script is intended to run standalone, add the directory to sys.path or restructure as a proper package with __init__.py.
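A minimal sketch of the standalone route, assuming download_svg_from_s3.py keeps living next to s3_utils.py (the path shim is illustrative, not part of the PR):

import sys
from pathlib import Path

# Make sibling modules importable when the file is executed directly,
# e.g. `python download_svg_from_s3.py` from anywhere.
sys.path.insert(0, str(Path(__file__).resolve().parent))

from s3_utils import bucket_name, client  # noqa: E402

The relative-import fix above is simpler if the directory becomes a proper package.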

🧰 Tools
🪛 GitHub Actions: Check

[error] 5-5: Cannot find implementation or library stub for module named "s3_utils" [import-not-found]

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/download_svg_from_s3.py` at line 5, The import
of s3_utils is failing because it’s treated as an absolute module; update the
import in download_svg_from_s3.py to use a relative import from the same
directory (import the symbols bucket_name and client from the local s3_utils
module) — i.e., change the current absolute import of s3_utils to a relative
import for bucket_name and client used in this file (or alternatively make this
directory a proper package with __init__.py or modify sys.path when running
standalone so s3_utils can be resolved).

Comment on lines +5 to +9
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if selenium is listed in any dependency file
rg -l "selenium" --glob "pyproject.toml" --glob "requirements*.txt" --glob "setup.py"

Repository: ScottyLabs/cmumaps

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Also check for common dependency file patterns
fd -e "toml|txt|py" -x grep -l "selenium" {} \; 2>/dev/null | head -20

Repository: ScottyLabs/cmumaps

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Check the driver.py file to see the imports and other flagged issues
head -120 apps/dataflow/src/scraper/fms/driver.py | cat -n

Repository: ScottyLabs/cmumaps

Length of output: 5455


🏁 Script executed:

#!/bin/bash
# Search more comprehensively for dependency files and selenium mentions
find . -type f \( -name "setup.cfg" -o -name "tox.ini" -o -name "environment.yml" -o -name "Pipfile" -o -name "poetry.lock" -o -name "requirements-*.txt" \) | head -20

Repository: ScottyLabs/cmumaps

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Check pyproject.toml content if it exists
fd "pyproject.toml" -x cat {} \;

Repository: ScottyLabs/cmumaps

Length of output: 1613


🏁 Script executed:

#!/bin/bash
# Search for any CI configuration files that might indicate the import-not-found error
fd -e "yml" -e "yaml" . -path "*/.github/workflows/*" | head -5

Repository: ScottyLabs/cmumaps

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Check if there's a requirements file or similar in the dataflow directory
find apps/dataflow -type f \( -name "requirements*" -o -name "pyproject.toml" -o -name "setup.py" \) 2>/dev/null

Repository: ScottyLabs/cmumaps

Length of output: 90


🏁 Script executed:

#!/bin/bash
# Look for GitHub Actions workflows
find .github/workflows -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -10

Repository: ScottyLabs/cmumaps

Length of output: 89


🏁 Script executed:

#!/bin/bash
# Search for any reference to selenium in the codebase to understand usage
rg "selenium" --type python

Repository: ScottyLabs/cmumaps

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Check if there are any other missing imports or issues by looking at the full driver.py
wc -l apps/dataflow/src/scraper/fms/driver.py

Repository: ScottyLabs/cmumaps

Length of output: 105


🏁 Script executed:

#!/bin/bash
# Check the CI workflow
cat .github/workflows/check.yml

Repository: ScottyLabs/cmumaps

Length of output: 2094


🏁 Script executed:

#!/bin/bash
# Search for selenium without specifying type
rg "selenium" apps/dataflow/

Repository: ScottyLabs/cmumaps

Length of output: 513


🏁 Script executed:

#!/bin/bash
# Read the rest of driver.py
tail -110 apps/dataflow/src/scraper/fms/driver.py | cat -n

Repository: ScottyLabs/cmumaps

Length of output: 4375


Add selenium to project dependencies.

The selenium package is imported in apps/dataflow/src/scraper/fms/driver.py (lines 5-9) but is not declared in apps/dataflow/pyproject.toml. This causes import failures when running the scrape-fms command. Add selenium to the project dependencies in pyproject.toml.

🧰 Tools
🪛 GitHub Actions: Check

[error] 5-5: Cannot find implementation or library stub for module named "selenium" [import-not-found]


[error] 5-5: Cannot find implementation or library stub for module named "selenium.common.exceptions" [import-not-found]


[error] 5-5: Cannot find implementation or library stub for module named "selenium.webdriver.common.by" [import-not-found]


[error] 5-5: Cannot find implementation or library stub for module named "selenium.webdriver.support" [import-not-found]


[error] 5-5: Cannot find implementation or library stub for module named "selenium.webdriver.support.ui" [import-not-found]

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/driver.py` around lines 5 - 9, The project is
missing the selenium dependency required by the imports (selenium, webdriver,
TimeoutException, WebDriverWait, expected_conditions/ec) used in
apps/dataflow/src/scraper/fms/driver.py; add a compatible selenium entry to the
project's dependency list in pyproject.toml (e.g., under
[tool.poetry.dependencies] or the project's dependency section) and pin a
suitable version so the scrape-fms command can import selenium successfully.

Comment on lines +112 to +115
def js_click(el) -> None:  # noqa: ANN001
    """Click an element using JavaScript."""
    driver.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
    driver.execute_script("arguments[0].click();", el)

⚠️ Potential issue | 🟡 Minor

Add a type annotation for the el parameter to fix the CI failure.

The pipeline reports no-untyped-def for this function. Add a type hint for the el parameter.

🔧 Proposed fix
-def js_click(el) -> None:  # noqa: ANN001
+def js_click(el: Any) -> None:
     """Click an element using JavaScript."""
     driver.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
     driver.execute_script("arguments[0].click();", el)

You'll need to import Any from typing, or use WebElement from selenium.webdriver.remote.webelement for a more precise type.
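With the more precise type, the function would read as follows (a sketch; driver is the module-level WebDriver already defined in the script):

from selenium.webdriver.remote.webelement import WebElement


def js_click(el: WebElement) -> None:
    """Scroll an element into view, then click it via JavaScript."""
    driver.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
    driver.execute_script("arguments[0].click();", el)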

🧰 Tools
🪛 GitHub Actions: Check

[error] 112-112: Function is missing a type annotation for one or more arguments [no-untyped-def]

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/driver.py` around lines 112 - 115, The function
js_click currently lacks a parameter type which triggers no-untyped-def; add a
type annotation for the el parameter (e.g. use typing.Any or
selenium.webdriver.remote.webelement.WebElement) and import the chosen symbol
(Any from typing or WebElement from selenium.webdriver.remote.webelement) at the
top of the module; leave the function body as-is using driver.execute_script and
ensure the signature becomes def js_click(el: Any) -> None: or def js_click(el:
WebElement) -> None: so CI stops flagging the untyped definition.

Comment on lines +217 to +227
def get_generic_file_from_s3(s3_object_name: str) -> Any | None:  # noqa: ANN401
    """Get generic data from S3 bucket.

    Args:
        s3_object_name (str): The S3 object name/path
        return_data (bool): If True, return the data as Python object
                            If False, return the raw response object

    Returns:
        response object

⚠️ Potential issue | 🟡 Minor

Fix docstring: remove reference to non-existent return_data parameter.

The docstring documents a return_data parameter that doesn't exist in this function's signature; it appears to have been copied from get_json_from_s3.

📝 Proposed fix
 def get_generic_file_from_s3(s3_object_name: str) -> Any | None:  # noqa: ANN401
     """Get generic data from S3 bucket.

     Args:
         s3_object_name (str): The S3 object name/path
-        return_data (bool): If True, return the data as Python object
-                            If False, return the raw response object

     Returns:
-        response object
+        Response object, or None if retrieval fails.

     """
🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/s3_utils.py` around lines 217 - 227, The
docstring for get_generic_file_from_s3 incorrectly mentions a non-existent
return_data parameter; update the docstring for the get_generic_file_from_s3
function to only list the actual parameter s3_object_name and remove any
references to return_data, and correct the Returns section to accurately
describe what the function returns (e.g., the raw S3 response or parsed object
as Any | None) so the documentation matches the function signature and behavior.


from pathlib import Path

from s3_utils import upload_generic_file

⚠️ Potential issue | 🟠 Major

Fix the import to resolve the CI failure.

Same issue as download_svg_from_s3.py: use a relative import or ensure a proper package structure.

🔧 Proposed fix
-from s3_utils import upload_generic_file
+from .s3_utils import upload_generic_file
🧰 Tools
🪛 GitHub Actions: Check

[error] 5-5: Cannot find implementation or library stub for module named "s3_utils" [import-not-found]

🤖 Prompt for AI Agents
In `@apps/dataflow/src/scraper/fms/upload_svg_to_s3.py` at line 5, The import in
upload_svg_to_s3.py is causing CI failures; change the bare module import to the
same relative import style used in download_svg_from_s3.py so the package
resolver can find s3_utils. Replace "from s3_utils import upload_generic_file"
with the appropriate relative import (e.g., "from .s3_utils import
upload_generic_file" or "from ..s3_utils import upload_generic_file" to match
the package layout) so the upload_generic_file symbol is resolved at runtime.
