Merged
Pull request overview
This PR pins the pandas and pyspark dependencies to specific versions (pandas 2.1.4, pyspark 3.5.8) and refactors the creation of the "day" column into a separate helper function to improve code maintainability and consistency. It also includes compatibility updates for the older pandas version and lowers the Python requirement from 3.13 to 3.10-3.11.
Changes:
- Pin pandas to 2.1.4 and pyspark to 3.5.8, downgrade Python requirement to 3.10-3.11, and update Java requirement to version 11
- Refactor day column creation into a new `add_day_column()` helper function and introduce `CallDataRecordDataWithDay` schema
- Update pandas groupby operations to use `group_keys=False` instead of `include_groups=False` for compatibility with pandas 2.1.4
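For context on the last item: `include_groups` was added to `DataFrameGroupBy.apply` in pandas 2.2, so it is not available under the 2.1.4 pin, while `group_keys` is an argument to `groupby()` itself. The two control different things (`group_keys` affects the result index, `include_groups` affects which columns the applied function sees), so the swap is not a strict equivalent. The snippet below is a minimal sketch of the `group_keys` behaviour only; the frame, the column names, and `longest_call` are hypothetical and not taken from this repository.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "caller_id": ["a", "a", "b", "b", "b"],
        "duration": [10, 30, 5, 15, 40],
    }
)


def longest_call(group: pd.DataFrame) -> pd.DataFrame:
    """Keep the row(s) with the largest duration within one group."""
    return group[group["duration"] == group["duration"].max()]


# group_keys=False: the result keeps the original row index.
result = df.groupby("caller_id", group_keys=False).apply(longest_call)

# group_keys=True: the group key is added as an extra index level.
keyed = df.groupby("caller_id", group_keys=True).apply(longest_call)

print(result.index.nlevels, keyed.index.nlevels)
```

If the applied function must not see the grouping column, selecting the needed columns explicitly after `groupby()` is one option that works without `include_groups`.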
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Pin pandas and pyspark versions, downgrade Python requirement to 3.10-3.11, remove some type stubs |
| Makefile | Update Java version requirement from 17 to 11 |
| notebooks/featurizer.ipynb | Update Java path and Python version, add calls to add_day_column() for data preprocessing |
| notebooks/demo_pipeline.ipynb | Update Python version to 3.10.19 |
| src/cider/featurizer/schemas.py | Add CallDataRecordDataWithDay schema and update CallDataRecordTagged to inherit from it |
| src/cider/featurizer/dependencies.py | Add add_day_column() function, update functions to expect data with day column, fix condition initialization to use lit(True), update pandas groupby operations |
| src/cider/featurizer/core.py | Update preprocess_data to call add_day_column() |
| src/cider/validation_metrics/dependencies.py | Update pandas groupby to use group_keys=False for compatibility |
| src/cider/validation_metrics/core.py | Update pandas groupby to use group_keys=False for compatibility |
| tests/test_featurizer.py | Add test for add_day_column(), update tests to call add_day_column() before processing, update error messages and validation |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Changes
- Pin pandas and pyspark dependencies; debug the package after these changes
- Move the creation of a "day" column to a separate dependency function in featurizer; debug after this change
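As an illustration of the second change, here is a rough sketch of what a standalone "day" column helper can look like. It is not the PR's implementation: the real `add_day_column()` is not shown on this page and may operate on Spark DataFrames. The pandas version below, the `timestamp_col` parameter, and the `timestamp` column name are all assumptions.

```python
import pandas as pd


def add_day_column(df: pd.DataFrame, timestamp_col: str = "timestamp") -> pd.DataFrame:
    """Return a copy of df with a 'day' column (timestamp truncated to midnight).

    Hypothetical signature: the column name and default are assumptions.
    """
    out = df.copy()  # leave the caller's frame untouched
    out["day"] = pd.to_datetime(out[timestamp_col]).dt.normalize()
    return out


# Hypothetical call records for illustration.
cdr = pd.DataFrame(
    {
        "caller_id": ["a", "a", "b"],
        "timestamp": [
            "2023-01-01 08:30:00",
            "2023-01-01 23:59:59",
            "2023-01-02 00:00:01",
        ],
    }
)
cdr_with_day = add_day_column(cdr)
```

Keeping this step in one function means every downstream feature computation can assume the "day" column already exists instead of deriving it separately.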
How has this been tested?
- Run the tests and check that they all pass
- Run `featurizer.ipynb`
Checklist
Fill with `x` for completed.
`pre-commit` hooks locally