-
Notifications
You must be signed in to change notification settings - Fork 53
Be truly immediate/non-blocking more often #565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…-blocking. Also updates tests. Mainly uses ImmediateSupplier and InterruptingImmediateSupplier for true non-blocking evaluation, with some other tricks used in other places. Some non-reliable calls may still fail, but most have been repaired, and the rest should be. (If the old semantics are _really_ needed you can resolve with a short wait.) Re-enables many of the tests disabled for https://issues.apache.org/jira/browse/BROOKLYN-272
a couple conflicts in cleanup to exec immediately
|
@aledsage rebased and ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahgittin I've been playing around with more tests, and I fear there's a chance we have a fundamental difference of opinion of what an "immediate supplier" should be allowed to do!
For me, "immediate supplier" means that the value's resolution is not dependent on other tasks executing within Brooklyn. However, it could do "blocking" calls such as IO, which would normally result in an InterruptedException being thrown (if the thread were set to interrupted). For example, the supplier could be defined as $brooklyn:external("vault", "aws.secretKey").
This use-case is supported by the explicit implementation in BrooklynDslCommon.DslExternal of the ImmediateSupplier interface. We are mostly ok with using this pattern, because we check if the value implements ImmediateSupplier, and call getImmediately() (without wrapping it in a InterruptingImmediateSupplier, which would cause it to be interrupted).
However, when I play around with a variant of EntityConfigTest.testGetImmediateSupplierNonBlockingMap() that uses a a value of type ImmediateSupplier, it now always fails. Previously, it passed (most of the time, failing occasionally on slow/overloaded servers!) because we gave it 500ms to execute. Now it wraps it in an InterruptingImmediateSupplier which causes it to fail.
See #581 for that test. If you agree that is a valid use-case, then can you please merge that PR first and rebase you branch accordingly?
Following the code path (see stacktrace snippet below):
- the
AbstractConfigurationSupportInternal.getNonBlockingResolvingStructuredKey()has created aDynamicSequentialTask, whose job is to callget(key(). BasicExecutionContext.getImmediatelyhas retrieved the underlying job from this task, which is aDynamicSequentialTask.DstJob(that contains a single job).- It therefore wraps it in
InterruptingImmediateSupplier(becauseDstJobdoes not implementImmediateSupplier). - It executes the task, but with the task marked as cancelled. Therefore the
Thread.sleep(1)gets interrupted immediately.
Thread [main] (Suspended)
BasicExecutionContext.getImmediately(Object) line: 141
AbstractEntity$BasicConfigurationSupport(AbstractConfigurationSupportInternal).getNonBlockingResolvingStructuredKey(ConfigKey<T>) line: 115
AbstractEntity$BasicConfigurationSupport(AbstractConfigurationSupportInternal).getNonBlocking(ConfigKey<T>) line: 82
EntityConfigTest2$ConfigNonBlockingFixture.runGetConfigNonBlockingInMap() line: 421
EntityConfigTest2.testGetExplicitImmediateSupplierNonBlockingMap() line: 534
This looks tricky to fix "properly". Even if we avoid the use of DynamicSequentialTask, getImmediately() still only gets the wrapper job org.apache.brooklyn.core.objs.AbstractConfigurationSupportInternal$1. It therefore doesn't know that the underlying value is an ImmediateSupplier so it gets wrapped and thus interrupted.
I suggest for now that we put getNonBlockingResolvingStructuredKey() back to what it was like before (i.e. the ugly use of t.get(ValueResolver.NON_BLOCKING_WAIT)).
Otherwise, anyone who is using $brooklyn:external inside a MapConfigKey will never get their value if using "immediately".
| .getMaybe(); | ||
| if (resolved.isAbsent()) return Maybe.Absent.<T>castAbsent(resolved); | ||
| if (resolved.get()==marker) { | ||
| // TODO changed Feb 2017, previously returned absent, in contrast to what the javadoc says |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this code will ever be called. The .defaultValue(marker) only affects the behaviour of .get(). It doesn't affect the result of .getMaybe(), so we'll never have the marker object returned (now that you've changed it to use getMaybe()).
I'm fine with us leaving this in for your PR, but I'd be interested what situation you're trying to cover with this. I tried writing a few more tests but could never get into this code path.
My understanding of what it was doing previously... By setting the defaultValue(marker) and calling get(), it was trying to avoid the exception being thrown when getMaybe() would have return an absent (so instead it returned us the marker). That would seem to agree with the javadoc (which javadoc are you referring to?).
However, I agree that your change to call getMaybe() instead is much nicer - we get to keep the original Maybe.absent() value has the better explanation of why it's absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you are correct here. I've removed the use of marker altogether.
|
@aledsage Sorry for the confusion. Thanks for long exposition. I thought almost everyone hated the "wait" as part of "immediate", from discussions ages ago somewhere. Waiting 200ms is smelly. And it hurts us writing deterministic tests, and causes nasty delays when we validate (e.g. add 1000 unresolved dependents as config keys and startup will be 3m+ while it waits for them to "validate"!). Surprised that you do like it as usually it's me being loosy goosy! Immediate eval was not guaranteed before in any circumstances (anything can take >200ms or whatever the threshhold was) so it is a strict improvement in contract that now we do guarantee immediate eval for non-blocking code. Yes some blocking tasks which previously succeeded in practice for immediate eval now will not. However I think there are very few of these. None of the old tests failed which gave confidence for this change (much of which was done in #480 based on https://issues.apache.org/jira/browse/BROOKLYN-329 hence the brevity in this PR). I'm investigating your new test but to me sleeping (for any length) while someone is waiting on your "immediate" value shouldn't be acceptable. If the caller does want short-timeout behaviour that codepath is still available (and used by immediate in a few cases, e.g. shared tasks as config value). (It might be appropriate when the REST API is doing get-all-config for instance, possibly as an API argument?) As for I think there are some places that a suprising |
|
@aledsage I think I better understand now. It isn't the sleeps that are the offenders -- it is the DST/map interplay. Still getting my head round it but wanted to say that. |
|
To add some more context (without having looked at the changes) - before adding ImmediateSupplier tasks with a short timeout (the way we implemented "immediate") would get cancelled even before the thread had the chance to get scheduled. So it's not all about whether the thread itself blocks for more than what we are ready to wait for. |
|
@neykov you beat me to replying - agreed :-) @ahgittin to give some more background, the original use-case for "immediate" stemmed from a failure in clocker: https://github.com/brooklyncentral/clocker/blob/v2.1.0/swarm/catalog/swarm/swarm.bom#L260-L270 In a However, sometimes on a heavily loaded machine, the "immediately" would time out after 200ms even though the value really was available. The trigger sensor would never change again, so the target sensor would never be set. By adding It is that style of use-case / usage that makes me want to let I believe this model would be simpler for a user to understand, rather than explaining to tell them that their config resolution will fail in certain situations, such as the Longer term, I definitely want to get rid of the ugly/brittle "wait for up to 500ms", but I'd prefer we leave that in for a while longer rather than remove support for the above use-case. |
|
Thanks for the explanation. I realized from your (excellent) tests that the thing we want to allow is to permit an I think we can do this by clearing an interrupt flag in I think we can't avoid the fact that immediate behaviour is complex. However we can say something like:
(Should we add this to the docs?) |
|
All tests pass for me locally with this. And re-reading I think @aledsage we're not in disagreement; I simply overlooked the subtlety of |
c7de1a8 to
617256d
Compare
|
Failure at https://builds.apache.org/job/brooklyn-server-pull-requests/1820/ seems due to build environment -- Doing |
|
This time error is ... unrelated non-deterministic ? |
617256d to
d214ca9
Compare
neykov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Looks like @ahgittin addressed the "blocking immediate" concern. Can you kick another build - the one is stuck in limbo.
Is it worth flipping immediate on the ValueResolver by default if it detects it's in a interrupted thread? This is to support immediate in deep/recursive resolving where not everything is ImmediateSupplier.
| : Maybe.<T>absent(); | ||
| .getMaybe(); | ||
| if (resolved.isAbsent()) return Maybe.Absent.<T>castAbsent(resolved); | ||
| return TypeCoercions.tryCoerce(resolved.get(), key.getTypeToken()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there's no marker object you can rely on ValueResolver's coercion directly (by passing in the expected type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could, but not 100% sure. There may be some corner cases and some lossiness between as(Class<T>) and tryCoerce(..., TypeToken<T>). Have added a comment.
|
@neykov I've added a comment in does that cover your thinking? if not -- and/or if you're sure of the |
Builds on #480, really quite small once that is merged, tidies up many of the "immediate"-related annoyances.
/cc @aledsage @grkvlt @neykov @sjcorbett