-
Notifications
You must be signed in to change notification settings - Fork 113
feat: introduce batch execution mode #4158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I will mark this PR ready after adding test cases. |
| new CostBasedScheduleGenerator( | ||
| workflowContext, | ||
| physicalPlan, | ||
| physicalPlan.copy(executionMode = workflowContext.workflowSettings.executionMode), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this setting should be part of Physical plan. The information is already in workflowContext, why duplicate it again in physical plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, please do not add such configuration to physical plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before doing this PR, Xiaozhen, Chen, and I discussed this and agreed that this flag should be part of the physical plan. Given how the CostBasedScheduleGenerator uses blocking output links in the physical plan to reason about regions, this information belongs in the physical plan. We are not duplicating information in the physical plan; we use this information to generate a new physical plan. In fact, the physical plans for streaming mode and batch mode are different, because the physical plan includes the links, and in batch mode every output link becomes a blocking link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot agree.
I think physical plan should be exactly same, no matter how we execute it. If it got changed then we need to revisit previous decisions. The execution plan will be different, based on how we cut the regions and how we do materializations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think it’s true that the physical plan will be exactly the same regardless of how we execute it. If you look at the line 52, this.physicalPlan = updatedPhysicalPlan, the physical plan clearly changes before and after CostBasedScheduleGenerator, so the physical plan changes based on the execution plan.
We may need input from @Xiao-zhen-Liu to understand why Pasta needs to modify the physical plan and whether this behavior is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pasta does not modify the physical plan. We can still keep the physical plan immutable for this PR by pushing the logic of marking all edges as materialized inside Pasta.
What changes were proposed in this PR?
We are introducing a batch execution mode alongside the existing streaming execution mode.
To support this, a new workflow setting called execution mode has been added, along with a corresponding backend flag. Batch and streaming execution are treated as the same level and are represented as an enum in the configuration.
We also introduce a flag named
defaultExecutionMode, whose value is set tostreaming.Any related issues, documentation, discussions?
This PR resolves issue #4157.
How was this PR tested?
Tested using the existing test cases. TBA.
Was this PR authored or co-authored using generative AI tooling?
No.