Testing multiple resources linked with Usages leads to test failure even though deletion succeeds #33

@kaessert

What happened?

When testing combinations of resources that depend on each other through Usages, the delete step fails even though the cleanup eventually succeeds. The test contained the following resources:

  • XAWSLBController
  • XEKS
  • XNetwork

XAWSLBController contains a Helm chart and a Usage of XEKS by a Helm Release.

Running uptest against such a configuration leads to the following situation:

  • We delete XAWSLBController first, which succeeds immediately.
  • Afterwards we delete XEKS and get an error because the background cleanup of XAWSLBController hasn't finished yet and the Usage is still around.

This is the error we're seeing:

"nousages.apiextensions.crossplane.io" denied the request: This resource is in-use by 1 Usage(s), including the Usage "configuration-aws-lb-controller-7d62z" by resource Release/configuration-aws-lb-controller-xmlng.

Working around this is currently possible with a pre-delete or post-delete hook, but we feel that it's not a great approach: it introduces additional hurdles for people trying uptest, and it leaks orchestration details that are already handled inside the core of Crossplane.

Things we tried and discussed:

  • Omitting "wait: true" from the delete statement. This has the same effect.
  • Running delete steps with || true. This is possible and cleanup will succeed, but we would swallow all errors regardless of whether they are connected to Usages or not.
  • Running the delete in a loop, catching the exit code and stderr, and on failure comparing stderr with "nousages.apiextensions.crossplane.io" denied the request. This can work, but it can cause trouble with additional pre-delete and similar hooks, as they would now need to be idempotent, or we would need to catch additional errors.
  • Finding all Usages connected via owner references to the XR being deleted and issuing kubectl wait to wait for them to be cleaned up before proceeding. This can also work, but it reaches deep into internals.
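For illustration, the loop-based option could look roughly like the sketch below. It is not something uptest or chainsaw provides: the function name retry_while_in_use and the MAX_ATTEMPTS and SLEEP_SECONDS variables are made up for this example. It retries only while stderr contains the webhook denial quoted above and passes every other failure through unchanged.

```shell
#!/usr/bin/env bash
# Sketch: retry a command only while the Crossplane Usage webhook rejects it.
# retry_while_in_use, MAX_ATTEMPTS and SLEEP_SECONDS are hypothetical names,
# not part of uptest or chainsaw.

USAGE_DENIAL='"nousages.apiextensions.crossplane.io" denied the request'

retry_while_in_use() {
  local max_attempts="${MAX_ATTEMPTS:-30}"
  local sleep_seconds="${SLEEP_SECONDS:-10}"
  local attempt status err_file

  err_file="$(mktemp)"
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=0
    # Run the wrapped command and keep its stderr for inspection.
    "$@" 2>"$err_file" || status=$?
    if ((status == 0)); then
      rm -f "$err_file"
      return 0
    fi
    # Always surface the captured error so nothing is swallowed silently.
    cat "$err_file" >&2
    # Any failure other than the Usage denial is returned immediately.
    if ! grep -qF "$USAGE_DENIAL" "$err_file"; then
      rm -f "$err_file"
      return "$status"
    fi
    if ((attempt < max_attempts)); then
      sleep "$sleep_seconds"
    fi
  done

  rm -f "$err_file"
  echo "still blocked by a Usage after ${max_attempts} attempts" >&2
  return 1
}
```

A delete step could then call something like `retry_while_in_use kubectl delete <resource> --wait=false`, with the resource filled in by the test. Because only the delete command itself is wrapped, pre-delete and post-delete hooks would not be re-run on each attempt, which sidesteps the idempotency concern mentioned above, though it still relies on matching an error string.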

Another thought: we might be trying to work around behavior that chainsaw deliberately does not support, which is why the proposed solutions look kind of ugly. Maybe it's worthwhile to open a PR on chainsaw introducing something like a retry parameter for a script step. At the very least we'd get some feedback from the maintainers on how they envision such a flow working. Trying to delete an object temporarily protected by an admission webhook might not be a standard use case, but I can imagine other setups hitting the same wall without being specific to Crossplane.

Sidenote: This only occurs when the objects in question are NOT part of the same composition. When they are, deletion errors are not visible to the outside and Crossplane handles the cleanup flawlessly.

How can we reproduce it?

Running uptest on this changeset without including the post-delete hook: upbound/configuration-aws-lb-controller#1

What environment did it happen in?

  • Uptest Version: v1.1.2


Labels: bug
