fix: support system columns in dataset.take* operations #5722
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Xuanwo
left a comment
Thank you @hamersaw for working on this! Only have a question.
    let mut stripped_projection = projection.as_ref().clone();
    stripped_projection.requested_output_expr = stripped_projection
        .requested_output_expr
        .clone()
        .into_iter()
        .filter(|e| e.name != ROW_OFFSET)
        .collect();
I'm pretty unsure about this (feels hacky). I first attempted to inject the `ROW_OFFSET` field into the schema within `project_batch` and inject the `AddRowOffsetExec` into the `ExecutionPlan`, but was getting "index out of bounds" with len 2. I think this may be because we expect the `ROW_ADDR` column (required to compute `ROW_OFFSET`), but we manually inject it for `take*` operations.
My other idea is to strip this field from the `requested_output_expr` in the `TakeBuilder` when we convert into the `ProjectionPlan`. I'm unsure of the implications there.
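For discussion, the "strip, then re-inject" idea can be sketched in isolation. This is only an illustration: `OutputExpr`, `Projection`, and `strip_row_offset` are hypothetical stand-ins rather than the real Lance types, and the `"_rowoffset"` value of `ROW_OFFSET` is assumed from the column name in the PR description. The sketch returns a flag alongside the stripped projection so the caller knows whether the column still has to be added back afterwards.

```rust
// Hypothetical stand-ins: only the names `ROW_OFFSET` and
// `requested_output_expr` come from the diff above.
const ROW_OFFSET: &str = "_rowoffset"; // assumed value

#[derive(Clone, Debug, PartialEq)]
struct OutputExpr {
    name: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Projection {
    requested_output_expr: Vec<OutputExpr>,
}

/// Returns a copy of `projection` without the row-offset column, plus a flag
/// telling the caller whether that column was requested and must be
/// re-injected after the underlying take has run.
fn strip_row_offset(projection: &Projection) -> (Projection, bool) {
    let mut stripped = projection.clone();
    let before = stripped.requested_output_expr.len();
    // `retain` filters in place, avoiding the extra clone/collect round trip.
    stripped.requested_output_expr.retain(|e| e.name != ROW_OFFSET);
    let wants_row_offset = stripped.requested_output_expr.len() != before;
    (stripped, wants_row_offset)
}
```

Carrying the flag explicitly keeps the stripping step and the later injection step from drifting apart: if the flag is false, the injection can be skipped entirely.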
Previously, "take*" operations did not support `_rowid`, `_rowoffset`, `_row_created_at_version`, and `_row_last_updated_at_version`. In this PR we add support for all of these columns.

We preserve these system columns through the initial schema projection so that they can be used to populate the correct flags when building the `ProjectionPlan` and `PhysicalProjection` structs.

- `_rowid` / `_rowaddr`: persisting these through to `ProjectionPlan` fields was enough to make them work
- `_rowoffset`: additionally required (1) stripping the `ROW_OFFSET` field from the `ProjectionPlan` `requested_output_expr` and (2) manually injecting the column using `AddRowOffsetExec` (after exposing some methods publicly)
- `_row_created_at_version` / `_row_last_updated_at_version`: required piping through flags to `Fragment` readers.

Addresses #5615.
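To illustrate why `_rowoffset` depends on the row address column, here is a minimal sketch of deriving a dataset-wide offset from a row address. It is not the `AddRowOffsetExec` implementation. It assumes a row address packs the fragment id into the upper 32 bits and the row index within the fragment into the lower 32 bits, and it ignores deleted rows. `split_row_addr` and `row_offsets` are hypothetical helper names.

```rust
use std::collections::HashMap;

/// Assumption: row address = (fragment_id << 32) | local_row_index.
fn split_row_addr(addr: u64) -> (u32, u32) {
    ((addr >> 32) as u32, addr as u32)
}

/// `fragments` lists `(fragment_id, num_rows)` in dataset order.
/// Returns `None` for an address whose fragment is unknown or whose local
/// index is past the end of its fragment. Deletions are ignored (simplification).
fn row_offsets(fragments: &[(u32, u64)], addrs: &[u64]) -> Vec<Option<u64>> {
    // Map each fragment id to (offset of its first row, number of rows).
    let mut starts: HashMap<u32, (u64, u64)> = HashMap::new();
    let mut running = 0u64;
    for &(id, rows) in fragments {
        starts.insert(id, (running, rows));
        running += rows;
    }

    addrs
        .iter()
        .map(|&addr| {
            let (frag, local) = split_row_addr(addr);
            starts.get(&frag).and_then(|&(start, rows)| {
                let local = u64::from(local);
                (local < rows).then(|| start + local)
            })
        })
        .collect()
}
```

Under these assumptions the offset cannot be computed without the row address, which is consistent with the need to keep `_rowaddr` available until the offset column has been injected.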