-
Notifications
You must be signed in to change notification settings - Fork 14
Add heuristic for no-producer operations #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
How is the producer being mapped after its consumer? Can you explain this in your description? |
| // Because we have already thoroughly iterated candidates of no-producer | ||
| // operations. If the current operation is after a no-producer operation, we | ||
| // need backtrack 2 depths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? And backtrack 2 depth/layer means a no-producer op is also backtracked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then we re-map the backtracked no-producer op? say,
<no_producer_op0, no_producer_op1, no_producer_op2, current_op, next_op>, backtracking 2 steps means we restart the mapping of the second highest award loc for no_producer_op2 or no_producer_op1?
"If the current operation is after a no-producer operation", does "after" mean "immediate after" or just "after"? Let's say "next_op" fail, we can just fall back "current_op", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the bug here. If current_op fails, now we will backtrack to the mapping state before mapping no_producer_op1; it still is in an infinite loop.
The correct solution is to backtrack until it is not a no_producer_op, because only in this case can we result in an effective backtrack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ShangkunLi, could you please add an example as comments above the function, like:
<no_producer_op0, no_producer_op1, no_producer_op2, op3, op4>
| | | | |
| loc_0 | | loc_3 | ...
| loc_1 | | loc_4 | ...
| loc_2 | | loc_5 | ...
explain what happen for different cases:
- op4 failed
- op3 failed
- no_producer_op2 failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit hesitate to check in this feature... Such looking ahead is too complicated for our code structure, and error-prone... I believe simply assigning initial loc for no-producer-op might be able to achieve similar benefit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the comments in the latest commit
Feel free to take it or not~ Just try to improve the II, I will also try to map the recurrence cycle first and see if it can improve the II.
|
|
||
| // 1. If we fail to map op4, we can backtrack to op3 using this | ||
| // performBacktrack function. | ||
| // 2. If we fail to map op3, we need to backtrack to op0 to avoid getting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we would have op0 before no_producer_op1? Shouldn't ops be ordered in topological ordering?
|
Let's check in the PR of "the critical path mapping first". Thanks~! |
Background
For example, for
%1 = "neura.grant_once"() <{constant_value = 0 : i64}> : () -> !neura.data<i64, i1>, it doesn't have any producers, so the candidates for it are chosen randomly or based solely on the time.However, these randomly chosen candidate locations may result in sub-optimal mapping results. And if we want to modify the assigned locations for these operations, we need to set a large
max_backtrack_depth(it takes ~10 hours and is unrealistic).New heuristic
greedy(max_locations_to_try=user_defined_num, equals to 4*4=16 in now implementation,max_backtrack_depth=0) mapping result.And the mapping results show an improvement in both spatial-only and spatial-temporal mapping.
p.s. I also tried an A* search-like heuristic and took tile utilization and routing congestion into consideration, that is, evaluate the award based on both producers and consumers for an operation. But the result is even worse (e.g.,
compiled_ii=9forbranch_for.mlir), this is because we introduce too many weighted factors, and it is hard to choose appropriate values.