Some unchecked improvements #213

jtuomist · 2025-11-29T06:21:36Z

Description

I make the PR to not forget about this. But the code is not ready yet.

Screenshots/Videos (if applicable)

Add screenshots or videos demonstrating the changes if applicable.

Related issue

E.g. Link to asana, sentry, slack thread etc.

Requirements, dependencies and related PRs

Describe or link possible requirements, dependencies and related PRs here

Additional Notes

Any additional information that reviewers should know about this PR.

✅ Pre-Merge Checklist

Type of Change

Set the PR's label to match the nature of this change

Testing

Built Unit tests (unit tests added/updated)
Built E2E tests (if applicable. E2E tests added/updated)
Authorization is tested (permissions and access controls verified)
Manually tested locally (functionality verified)
Manual testing instructions
If feature requires manual testing by reviewer, you can provide instructions here.

Internationalization & Accessibility

New strings are translatable (all user-facing text uses i18n)
Accessibility standards met (WCAG compliance, screen reader support)

Dependencies

Dependencies are merged (if applicable. If the change depends on other PRs e.g. kausal_common)

Screenshots/Videos (if applicable)

Add screenshots or videos demonstrating the changes if applicable.

Additional Notes

Any additional information that reviewers should know about this PR.

Note

Adds per-column forward fill, replaces skip_rows with skip_row_ranges, and supports YAML templates and column range specs in config to enhance dataset transformation.

Core transformation:
- Forward fill: Add ColumnSpec.forward_fill and _apply_forward_fill() to fill nulls per marked columns before processing.
- Row skipping: Replace skip_rows with skip_row_ranges; add DatasetSchema.should_skip_row() and read all rows before skipping during iteration.
Config loading (load_config):
- Support column_spec_templates and column range keys (e.g., "4-6"), merging templates with overrides.
- Parse forward_fill flags into ColumnSpec.
- Map legacy skip_rows to skip_row_ranges for backward compatibility.
- Use Path.open for reading YAML.
Output/CLI:
- get_summary() now includes skip_row_ranges.
- Minor CLI return annotation (# noqa: TRY300).

^{Written by Cursor Bugbot for commit ac47d9d. This will update automatically on new commits. Configure here.}

github-actions · 2025-11-29T06:22:30Z

notebooks/convert_complex_data.py

+        for start, end in self.skip_row_ranges:
+            if start <= row_idx <= end:
+                return True
+        return False


⚠️ [ruff] <SIM110> _{reported by reviewdog 🐶}
Use return any(start <= row_idx <= end for start, end in self.skip_row_ranges) instead of for loop

Suggested change

for start, end in self.skip_row_ranges:

if start <= row_idx <= end:

return True

return False

return any(start <= row_idx <= end for start, end in self.skip_row_ranges)

kausal-code-coverage · 2025-11-29T06:23:20Z

Codecov Report

❌ Patch coverage is 0% with 51 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
notebooks/convert_complex_data.py	0.00%	51 Missing ⚠️

@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
- Coverage   41.27%   41.22%   -0.05%     
==========================================
  Files         286      286              
  Lines       32928    32967      +39     
  Branches     4684     4699      +15     
==========================================
  Hits        13590    13590              
- Misses      18281    18320      +39     
  Partials     1057     1057

Flag	Coverage Δ
e2e-tests	`11.85% <0.00%> (-0.02%)`	⬇️
unittests	`34.77% <0.00%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
notebooks/convert_complex_data.py	`0.00% <0.00%> (ø)`

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

cursor

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

cursor · 2025-11-29T06:26:12Z

notebooks/convert_complex_data.py

        )

+        # Apply forward fill before filtering rows
+        df = self._apply_forward_fill(df)


Bug: Forward fill applies before row filtering causes data leakage

The _apply_forward_fill is applied to all rows before skip_row_ranges filtering occurs. This causes values from rows that should be skipped (e.g., header/explanation rows) to leak into subsequent data rows via forward fill. For example, if row 0 has "Category" text and should be skipped, and row 1 has a null that gets forward-filled, the data row will incorrectly receive the header value "Category". The forward fill operation needs to happen after filtering out skipped rows, not before.

Additional Locations (1)

notebooks/convert_complex_data.py#L123-L127

Some unchecked improvements

ac47d9d

github-actions bot reviewed Nov 29, 2025

View reviewed changes

cursor bot reviewed Nov 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some unchecked improvements #213

Some unchecked improvements #213

Uh oh!

jtuomist commented Nov 29, 2025 •

edited by cursor bot

Loading

Uh oh!

github-actions bot Nov 29, 2025

Uh oh!

kausal-code-coverage bot commented Nov 29, 2025 •

edited

Loading

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Some unchecked improvements #213

Are you sure you want to change the base?

Some unchecked improvements #213

Uh oh!

Conversation

jtuomist commented Nov 29, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Screenshots/Videos (if applicable)

Related issue

Requirements, dependencies and related PRs

Additional Notes

✅ Pre-Merge Checklist

Type of Change

Testing

Manual testing instructions

Internationalization & Accessibility

Dependencies

Screenshots/Videos (if applicable)

Additional Notes

Uh oh!

github-actions bot Nov 29, 2025

Choose a reason for hiding this comment

Uh oh!

kausal-code-coverage bot commented Nov 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

This PR is being reviewed by Cursor Bugbot

Uh oh!

cursor bot Nov 29, 2025

Choose a reason for hiding this comment

Bug: Forward fill applies before row filtering causes data leakage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jtuomist commented Nov 29, 2025 •

edited by cursor bot

Loading

kausal-code-coverage bot commented Nov 29, 2025 •

edited

Loading