-
Notifications
You must be signed in to change notification settings - Fork 47
Adding Time Models to Shuttle #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b74d4a5 to
95b637b
Compare
|
Needs conflict resolution in order to run benchmarks |
shuttle/src/runtime/execution.rs
Outdated
| while ExecutionState::num_runnable() == 0 | ||
| && ExecutionState::with(|s| Rc::clone(&s.time_model)) | ||
| .borrow_mut() | ||
| .wake_next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on the with comment from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why do we need a loop? We can just do one call no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a loop to protect against stale wakers in the time model. It's possible that we don't need this with more diligent tracking, but at least for the current TimeModel implementations there are cases where we have old wakers that don't end up actually making any futures runnable when woken.
I think it is more complexity that it's worth to say that the TimeModel must always wake a non-stale waker with wake_next
| runnable_tasks: Vec<*const Task>, | ||
|
|
||
| // Counter for unique timing resource ids (Sleeps, Timeouts and Intervals) | ||
| pub(crate) timer_id_counter: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for? It's never used no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's used in shuttle/src/sync/time/mod.rs to differentiate sleepers with the same deadlines (see uses of increment_timer_counter())
shuttle/src/runtime/execution.rs
Outdated
| && ExecutionState::with(|s| Rc::clone(&s.time_model)) | ||
| .borrow_mut() | ||
| .wake_next() | ||
| {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on the loop comment
|
|
||
| pub(crate) fn num_runnable() -> usize { | ||
| Self::with(|state| state.tasks.iter().filter(|t| t.runnable()).count()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should improve the tracking of runnable so that this becomes O(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, but I don't know if we want to block this PR on that change. if so, I should probably open another PR to do that separately.
this would be strongly related to building the set of runnable tasks incrementally, which I have a prototype of on this branch:
https://github.com/dylanjwolff/shuttle/tree/incremental-persistent-vec
but I haven't PR'ed it because I actually think the right thing to do is to change the Shuttle scheduler API so that the individual schedulers manage their own runnable task sets in whatever data structure is best for that algorithm (for example, priority queue for RP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah no don't block the PR on this.
And yeah, lets chat about potential scheduler API changes on Monday
d6d5ee7 to
2e1ffeb
Compare
2e1ffeb to
9135038
Compare
|
Might be worth adding this example as a test case: |
1d7e86c to
ac99d6e
Compare
ac99d6e to
665ff08
Compare
| @@ -0,0 +1,146 @@ | |||
| use std::{ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time models should live somewhere else than sync/time. I think Shuttle should be split up into a shuttle-core and shuttle, and time models should live in /time or /time_models (I'm leaning /time_models in order to avoid collisions with std::time and tokio::time.
| thread::spawn(move || { | ||
| for _ in 0..TEST_LENGTH { | ||
| thread::sleep(Duration::from_millis(1)); | ||
| thread::sleep(Duration::ZERO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't pay attention to this first time around, but why are we doing Duration::ZERO? The test should work just fine with a sleep of 1ms, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly, I read the original intent of these sleeps to be to insert a yield point without calling yield_now, since PCT takes that yield_now call as a hint that it might be in a busy loop and reduces the current thread's priority.
Since these tests are for the PCT algorithm/implementation and not meant to actually exercise timing behaviors, I think the timeouts should be Duration::ZERO if kept as a sleep, or (probably better) changed to an explicit yield_now_not_a_scheduling_hint API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah getcha, thanks. We could make switch pub. But also: surely these tests will be equivalent with what was before when running under the time model which implements time as modeled before this PR? Like we could just create a time model which makes these tests work with Duration::from_millis(1) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like we could just create a time model which makes these tests work with Duration::from_millis(1) ?
Yeah, this is also fine. My thought was that changing the duration to zero makes it more explicit that this test doesn't actually want any timing related behaviors to influence its outcome.
We could make switch pub.
For this specific test, I think it would probably be most clear if we did call switch directly. But the tradeoff for making it fully pub is that there are now two ways to insert a context switch for users of Shuttle, and the difference between the two is relatively subtle.
| @@ -0,0 +1,825 @@ | |||
| //! Time | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not be under time
|
Converted to draft just to note that it is not mergeable because time models should live elsewhere. I'll reopen once its ready for review |
| } | ||
|
|
||
| /// The trait implemented by each TimeModel | ||
| pub trait TimeModel: std::fmt::Debug { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: newline after each function
This PR creates a new API in Shuttle to allow for specifying a model for wall-clock time and associated functions / primitives.
In spirit, this API is similar to that of the Scheduler API in that it allows users to swap out different time models (or create their own time models), depending on their application and use case.
As initial examples, this PR contains two time models:
ConstantSteppedTimeModel, which advances a global time by a configurable constant amount (starting from 0) at each scheduling step.FrozenTimeModel, which is based on aConstantSteppedTimeModelof where the step size is zero. This model additionally provides the ability for users to manually expire Sleeps and Timeouts without advancing the global clock.Both of these models will automatically "fast-forward" time to wake the next sleeping tasks if all tasks are sleeping.
Primitive Representations:
As part of this change, Shuttle now vends it's own
Duration,Instant,Sleep,IntervalandTimeoutprimitives, whose behavior depends on the current TimeModel. These primitives are intended to eventually fully model the corresponding types in std::time and tokio::time. In this PR coverage of all functionality is incomplete, but enough core features are implemented such that most common use cases should be covered. Full feature parity can be achieved over time as the TimeModels become more mature.The Shuttle
Instantprimitive is an enum. For this PR, there is only one representation of Durations, using a concretestd::time::Duration. However, in the future we can add representations to support other Time Models which might use logical scalar or vector clocks, for example.The Shuttle
Durationprimitive is just astd::time::Durationby default. The reason for this is thatstd::time::Durationis very commonly used in library APIs. This can make switching to a different Duration type extremely painful, as all dependencies also must be made compatible. By leavingDurationunchanged, we can keep the barrier to adoption of time modeling low. With theadvanced-time-modelsfeature flag, users can opt-in to time models that may not be able to represent time intervals as a single value (for example, elapsed time between vector-clock timestamps).Alternative Designs:
Because
std::time::Durationand other primitives are concrete types, the corresponding Shuttle primitives also must be concrete. Instead of enums with a fixed number of variants, we could instead opt for a struct which contains a type erasedBox<dyn Duration>. This is more flexible, as users can even bring their own primitive representations without needing to make changes to Shuttle itself to add an enum variant. However, it would require type-casting in all operations involving multiple primitives (for example, comparing two durations or arithmetic operations between instants and/or durations). This approach is also further complicated by manyDurationmethods beingconst, which would preclude a fully dynamicDurationimplementation.As long as there are only a small number of representations for these primitives that all Time Models share, then an enum allows them to share logic without these complications. However, if later we find that there are many different primitive representations then we may need to reconsider the fully dynamic approach instead.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.