
Conversation

@Richard-31415 Richard-31415 commented Jan 25, 2026

Summary by CodeRabbit

  • New Features

    • Added schema validation for uploaded JSON files: data is validated against Pydantic models before upload, and invalid uploads are blocked and logged to ensure consistent, reliable datasets.
  • Chores

    • Added Pydantic dependency to enable validation.
    • Included the new data models in the packaged distribution so validation models are available at runtime.

✏️ Tip: You can customize this high-level summary in your review settings.

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| cmumaps | Ready | Preview | Jan 31, 2026 10:28pm |
| cmumaps-visualizer | Ready | Preview | Jan 31, 2026 10:28pm |


coderabbitai bot commented Jan 25, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Adds a Pydantic models package and integrates pre-upload schema validation into the S3 client; also updates packaging to include the models and adds a runtime dependency on Pydantic.

Changes

  • Project Metadata (`apps/dataflow/pyproject.toml`): Added the `pydantic>=2.12.5` dependency and included `src/models` in the wheel package targets.
  • Models Package, new modules (`apps/dataflow/src/models/_common.py`, `apps/dataflow/src/models/buildings.py`, `apps/dataflow/src/models/floorplans.py`, `apps/dataflow/src/models/graph.py`, `apps/dataflow/src/models/placements.py`): Introduced strict Pydantic schemas (GeoCoordinate, LocalPosition, Floor, Building/Buildings, Room/Floorplans, GraphNode/Graph, Placement/Placements) using ConfigDict(extra="forbid") and field aliases where appropriate.
  • Models Package, public API (`apps/dataflow/src/models/__init__.py`): Created the package initializer that re-exports the model classes and defines __all__.
  • S3 Client validation & singleton (`apps/dataflow/src/clients/_s3_client.py`): Added a SCHEMA_VALIDATORS registry plus _get_validator() and _validate_json_data() helpers; uploads now load the local JSON and validate it against the selected Pydantic model before sending. Also added a cached get_s3_client_singleton() factory.
  • Minor client & logger adjustments (`apps/dataflow/src/clients/_api_client.py`, `apps/dataflow/src/logger/_utils.py`): Moved get_api_client_singleton (no behavior change) and moved the Generator import to the top level for runtime availability (no logic change).
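
To make the schema style concrete, here is a minimal sketch of the strict-model pattern described above, assuming Pydantic v2. The class names come from the summary; the specific fields and the labelPosition alias are illustrative assumptions, not the PR's actual definitions.

```python
from pydantic import BaseModel, ConfigDict, Field


class GeoCoordinate(BaseModel):
    # extra="forbid" rejects unknown keys instead of silently dropping them.
    model_config = ConfigDict(extra="forbid")

    latitude: float
    longitude: float


class Building(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: str
    # An alias keeps the JSON key camelCase while the attribute stays snake_case.
    label_position: GeoCoordinate = Field(alias="labelPosition")
```

With extra="forbid", a stray or misspelled key in an uploaded JSON file fails validation instead of passing through silently, which is what makes the pre-upload check meaningful.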

Sequence Diagram

sequenceDiagram
    participant User
    participant S3Client
    participant FileSystem as File System
    participant Validator as Pydantic Validator
    participant S3 as AWS S3

    User->>S3Client: upload_json_file(local_path, s3_name)
    S3Client->>S3Client: _get_validator(s3_name)
    S3Client-->>S3Client: returns validator model or None

    alt validator found
        S3Client->>FileSystem: read JSON from local_path
        FileSystem-->>S3Client: json_data
        S3Client->>Validator: model_validate(json_data)
        alt validation passes
            Validator-->>S3Client: validated object
            S3Client->>S3: upload file
            S3-->>S3Client: upload success
        else validation fails
            Validator-->>S3Client: ValidationError
            S3Client-->>User: log error & abort upload
        end
    else no validator
        S3Client->>S3: upload file (no validation)
        S3-->>S3Client: upload success
    end
    S3Client-->>User: return result
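
As a rough illustration of that flow, the sketch below shows one way the pieces named in the walkthrough (SCHEMA_VALIDATORS, _get_validator, _validate_json_data, the upload method) could fit together. The registry keys, the import path, the logger, the boto3 call, the return values, and the class shape are assumptions for illustration, not the PR's actual code.

```python
import json
import logging
from pathlib import Path

import boto3
from pydantic import BaseModel, ValidationError

from models import Buildings, Floorplans  # assumed re-exports from models/__init__.py

logger = logging.getLogger(__name__)

# Registry of S3 object names that have a schema; anything else uploads unvalidated.
SCHEMA_VALIDATORS: dict[str, type[BaseModel]] = {
    "buildings.json": Buildings,
    "floorplans.json": Floorplans,
}


def _get_validator(s3_object_name: str) -> type[BaseModel] | None:
    """Return the Pydantic model registered for this object name, if any."""
    for pattern, validator in SCHEMA_VALIDATORS.items():
        if s3_object_name.endswith(pattern):  # see the review note below about exact matching
            return validator
    return None


def _validate_json_data(local_path: str, validator: type[BaseModel]) -> None:
    """Raise pydantic.ValidationError if the file does not match its schema."""
    data = json.loads(Path(local_path).read_text())
    validator.model_validate(data)


class S3Client:
    def __init__(self, bucket: str) -> None:
        self._bucket = bucket
        self._client = boto3.client("s3")

    def upload_json_file(self, local_path: str, s3_object_name: str) -> bool:
        validator = _get_validator(s3_object_name)
        if validator is not None:
            try:
                _validate_json_data(local_path, validator)
            except ValidationError as exc:
                # Invalid uploads are blocked and logged, never sent to S3.
                logger.error("Validation failed for %s: %s", s3_object_name, exc)
                return False
        self._client.upload_file(local_path, self._bucket, s3_object_name)
        return True
```
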

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 With whiskers twitching, I compile each spec,
I guard the JSON path and check each fleck.
Models lined up, neat and prim,
Validation hops, never dim.
Off to S3 the proper files trek.

🚥 Pre-merge checks | ✅ 2 passed | ⚠️ 1 warning
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main change: introducing Pydantic models to enable type checking for S3 uploads. The PR adds comprehensive data models (Buildings, Floorplans, Graph, Placements, and supporting types) with validation logic integrated into the S3 client.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch s3_typing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@apps/dataflow/src/clients/_s3_client.py`:
- Around lines 50-55: _get_validator currently uses suffix matching (s3_object_name.endswith(pattern)), which can pick the wrong validator for names like "old_buildings.json". Compare the filename's basename (via os.path.basename or Path(...).name) for exact equality against each pattern when iterating SCHEMA_VALIDATORS, keeping the same return of the matching validator or None.
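
A minimal sketch of the suggested fix, reusing the SCHEMA_VALIDATORS shape assumed in the earlier sketch:

```python
from pathlib import Path

from pydantic import BaseModel


def _get_validator(s3_object_name: str) -> type[BaseModel] | None:
    # Exact basename match: "old_buildings.json" no longer matches "buildings.json".
    return SCHEMA_VALIDATORS.get(Path(s3_object_name).name)
```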

In `@apps/dataflow/src/models/buildings.py`:
- Line 27: The field declaration entrances: list | None = None is missing a generic type parameter. Parameterize the list with its element type (for example entrances: list[str] | None = None or entrances: list[Entrance] | None = None) so it matches the actual contents used in the module and satisfies the type checker.
- Around lines 9-10: The GeoCoordinate import is guarded by TYPE_CHECKING, which removes it at runtime and breaks Pydantic's nested-model validation for fields like label_position, shapes, and hitbox. Import GeoCoordinate from models._common at module level so the class is available when the Building model is validated.
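
Taken together, the two fixes might look roughly like this sketch; field types beyond those named in the comments are assumptions:

```python
from pydantic import BaseModel, ConfigDict

# Runtime import (not under TYPE_CHECKING) so Pydantic can resolve the nested model.
from models._common import GeoCoordinate


class Building(BaseModel):
    model_config = ConfigDict(extra="forbid")

    label_position: GeoCoordinate
    shapes: list[list[GeoCoordinate]]
    hitbox: list[GeoCoordinate] | None = None
    # Parameterized list so the element type is checked (str is an assumed element type).
    entrances: list[str] | None = None
```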

In `@apps/dataflow/src/models/floorplans.py`:
- Around lines 9-10: The Floor and GeoCoordinate imports sit only under the TYPE_CHECKING guard, so the real classes are not available at runtime and nested-model validation breaks for fields like label_position, floor, and coordinates. Import Floor and GeoCoordinate from models._common at module level (or keep the annotations as forward-reference strings and call model_rebuild(), the Pydantic v2 replacement for update_forward_refs) so both symbols are resolvable during validation.

In `@apps/dataflow/src/models/graph.py`:
- Around lines 9-10: The TYPE_CHECKING guard keeps Floor, GeoCoordinate, and LocalPosition out of the module at runtime, so Pydantic cannot resolve them when validating the Graph models. Import all three classes at module top level rather than only under TYPE_CHECKING.

In `@apps/dataflow/src/models/placements.py`:
- Around lines 9-10: The TYPE_CHECKING-only imports of GeoCoordinate and LocalPosition likewise hide those nested model classes at runtime. Import them from models._common at module level so the fields that reference them resolve correctly during validation.
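
The same import fix applies to all of these model modules. Schematically, with the module contents assumed:

```python
# Before: only static type checkers see these names, so Pydantic cannot resolve the
# annotations at runtime (PydanticUndefinedAnnotation or NameError, depending on how
# the annotations are written).
# from typing import TYPE_CHECKING
# if TYPE_CHECKING:
#     from models._common import GeoCoordinate, LocalPosition

# After: plain runtime import; nested-model validation works as intended.
from models._common import GeoCoordinate, LocalPosition
```
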
🧹 Nitpick comments (1)
apps/dataflow/src/models/graph.py (1)

19-19: Consider defining a typed model for to_floor_info if its structure is known.

Using dict[str, Any] bypasses Pydantic's strict validation. If the structure of floor transition info is known and consistent, consider creating a dedicated model for it to maintain validation consistency with the rest of the schema.
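
If the structure does turn out to be stable, a dedicated model could replace the dict[str, Any] along these lines; the field names here are invented placeholders, not the actual keys of to_floor_info:

```python
from pydantic import BaseModel, ConfigDict


class FloorTransitionInfo(BaseModel):
    model_config = ConfigDict(extra="forbid")

    # Placeholder fields; substitute whatever keys to_floor_info actually carries.
    to_floor: str
    transition_type: str


class GraphNode(BaseModel):
    model_config = ConfigDict(extra="forbid")

    to_floor_info: FloorTransitionInfo | None = None
```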

coderabbitai bot commented Jan 25, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #254

coderabbitai bot added a commit that referenced this pull request Jan 25, 2026
Docstrings generation was requested by @Richard-31415.

* #253 (comment)

The following files were modified:

* `apps/dataflow/src/clients/_s3_client.py`