Description
What happened?
When testing combinations of resources that depend on each other through Usages, the delete step fails even though the cleanup eventually succeeds. The test contained the following resources:
- XAWSLBController
- XEKS
- XNetwork
XAWSLBController contains a helm chart and a Usage of XEKS by a helm Release.
Running uptest against such a configuration leads to the following situation:
- We delete XAWSLBController first, which succeeds immediately
- Afterwards we delete XEKS and get an error because the background cleanup of XAWSLBController hasn't finished yet, so the Usage is still around
This is the error we're seeing:
"nousages.apiextensions.crossplane.io" denied the request: This resource is in-use by 1 Usage(s), including the Usage "configuration-aws-lb-controller-7d62z" by resource Release/configuration-aws-lb-controller-xmlng.
Working around this today is possible with a pre-delete or post-delete hook, but we feel it's not a great approach: it introduces additional hurdles for people trying uptest, and it leaks orchestration details that are already handled inside the core of Crossplane.
Things we tried and discussed:
- omit "wait: true" from the delete statement. This has the same effect.
- run delete steps with `|| true`. This is possible and cleanup will succeed, but we will swallow all errors regardless of whether they are connected to usages or not
- run in a loop, catch the exit code and stderr, then on failure compare stderr with `"nousages.apiextensions.crossplane.io" denied the request`. Can work, but can cause trouble with additional `pre-delete` and similar hooks, as they would now need to be idempotent or we would need to catch additional errors.
- find all usages via owner references connected to the XR being deleted and issue `kubectl wait` to wait for the usages to be cleaned up before proceeding. Can also work, but reaches deeply into internals.
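To make the loop-based option concrete, here is a minimal sketch of what such a retry could look like. `delete_with_usage_retry` is a hypothetical helper, not something uptest or chainsaw provides, and the attempt count and delay are arbitrary. It assumes the denial text quoted above shows up on stderr of the wrapped command, retries only in that case, and passes every other failure through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: retry a command only while the Crossplane Usage
# webhook rejects it. Any other failure is returned immediately.
USAGE_DENIAL='"nousages.apiextensions.crossplane.io" denied the request'

# Usage: delete_with_usage_retry <max_attempts> <delay_seconds> <command...>
delete_with_usage_retry() {
  _dur_max="$1"
  _dur_delay="$2"
  shift 2
  _dur_err="$(mktemp)"
  _dur_attempt=1
  while [ "$_dur_attempt" -le "$_dur_max" ]; do
    _dur_status=0
    "$@" 2>"$_dur_err" || _dur_status=$?
    if [ "$_dur_status" -eq 0 ]; then
      rm -f "$_dur_err"
      return 0
    fi
    # Not a Usage denial: surface the original error and exit code.
    if ! grep -qF "$USAGE_DENIAL" "$_dur_err"; then
      cat "$_dur_err" >&2
      rm -f "$_dur_err"
      return "$_dur_status"
    fi
    # Still in use: wait before the next attempt (skip after the last one).
    if [ "$_dur_attempt" -lt "$_dur_max" ]; then
      sleep "$_dur_delay"
    fi
    _dur_attempt=$((_dur_attempt + 1))
  done
  # Attempts exhausted while the Usage was still present.
  cat "$_dur_err" >&2
  rm -f "$_dur_err"
  return 1
}

# Example call (the resource name is a placeholder):
#   delete_with_usage_retry 30 10 kubectl delete -f xeks.yaml
```

Unlike `|| true`, this keeps unrelated errors visible. It does not address the idempotency concern for additional hooks mentioned above, since the wrapped command may run several times.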
Another thought: we might be trying to work around a behavior that is deliberately not supported in chainsaw, which would explain why the proposed solutions look kind of ugly. Maybe it's worthwhile to create a PR on chainsaw introducing something like a retry parameter for a script step. At the very least we'll get some feedback from the maintainers on how they envision such a flow working. Trying to delete an object temporarily protected by an admission webhook might not be a standard use case, but I can imagine other setups hitting the same wall without being specific to Crossplane.
Sidenote: This only occurs when the objects in question are NOT part of the same composition. When they are part of the same composition, deletion errors are not visible to the outside and Crossplane handles the cleanup flawlessly.
How can we reproduce it?
Running uptest on this changeset without including the post-delete hook: upbound/configuration-aws-lb-controller#1
What environment did it happen in?
- Uptest Version: v1.1.2