-
Notifications
You must be signed in to change notification settings - Fork 9
Some unchecked improvements #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| for start, end in self.skip_row_ranges: | ||
| if start <= row_idx <= end: | ||
| return True | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use return any(start <= row_idx <= end for start, end in self.skip_row_ranges) instead of for loop
| for start, end in self.skip_row_ranges: | |
| if start <= row_idx <= end: | |
| return True | |
| return False | |
| return any(start <= row_idx <= end for start, end in self.skip_row_ranges) |
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #213 +/- ##
==========================================
- Coverage 41.27% 41.22% -0.05%
==========================================
Files 286 286
Lines 32928 32967 +39
Branches 4684 4699 +15
==========================================
Hits 13590 13590
- Misses 18281 18320 +39
Partials 1057 1057
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| ) | ||
|
|
||
| # Apply forward fill before filtering rows | ||
| df = self._apply_forward_fill(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Forward fill applies before row filtering causes data leakage
The _apply_forward_fill is applied to all rows before skip_row_ranges filtering occurs. This causes values from rows that should be skipped (e.g., header/explanation rows) to leak into subsequent data rows via forward fill. For example, if row 0 has "Category" text and should be skipped, and row 1 has a null that gets forward-filled, the data row will incorrectly receive the header value "Category". The forward fill operation needs to happen after filtering out skipped rows, not before.
Description
I make the PR to not forget about this. But the code is not ready yet.
Screenshots/Videos (if applicable)
Add screenshots or videos demonstrating the changes if applicable.
Related issue
E.g. Link to asana, sentry, slack thread etc.
Requirements, dependencies and related PRs
Describe or link possible requirements, dependencies and related PRs here
Additional Notes
Any additional information that reviewers should know about this PR.
✅ Pre-Merge Checklist
Type of Change
Testing
Manual testing instructions
If feature requires manual testing by reviewer, you can provide instructions here.Internationalization & Accessibility
Dependencies
Screenshots/Videos (if applicable)
Add screenshots or videos demonstrating the changes if applicable.
Additional Notes
Any additional information that reviewers should know about this PR.
Note
Adds per-column forward fill, replaces skip_rows with skip_row_ranges, and supports YAML templates and column range specs in config to enhance dataset transformation.
ColumnSpec.forward_filland_apply_forward_fill()to fill nulls per marked columns before processing.skip_rowswithskip_row_ranges; addDatasetSchema.should_skip_row()and read all rows before skipping during iteration.load_config):column_spec_templatesand column range keys (e.g.,"4-6"), merging templates with overrides.forward_fillflags intoColumnSpec.skip_rowstoskip_row_rangesfor backward compatibility.Path.openfor reading YAML.get_summary()now includesskip_row_ranges.# noqa: TRY300).Written by Cursor Bugbot for commit ac47d9d. This will update automatically on new commits. Configure here.