using defer for status and logging improvements in ipaddressclaim_controller#496
using defer for status and logging improvements in ipaddressclaim_controller#496
Conversation
386f5a1 to
3ed5559
Compare
| o.Status.SyncState = netboxv1.SyncStateFailed | ||
| return ctrl.Result{}, err | ||
| } | ||
| o.Status.SyncState = netboxv1.SyncStateSucceeded |
There was a problem hiding this comment.
Why not have this in the defer statement too? I would assume if we ever encounter an error during reconciliation we should probably assume that the sync failed (there is probably an edge case here but overall the logic could be simplified to if we errored, we are not in sync) - this further brings more of the status reporting into the defer taking advantage of this pattern
There was a problem hiding this comment.
I agree, but I believe in Lea's case this is not possible since defer does not know what type of error it is. We could set the conditionMessage instead but probably an error type like in the ipAddressClaim controller could be used here.
There was a problem hiding this comment.
I'll check if it can be done with a new error type. It could work if the reconcile function is never exited without an error before the actual update/sync of the ip address happens. Otherwise the Condition could be set to true even if the reconcile function did not make any requests to NetBox.
| @@ -174,31 +175,24 @@ func (r *IpAddressReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( | |||
|
|
|||
| netboxIpAddressModel, err := r.NetboxClient.ReserveOrUpdateIpAddress(ipAddressModel) | |||
| if err != nil { | |||
There was a problem hiding this comment.
Do we not want to set a conditionMessage for this error return?
There was a problem hiding this comment.
probably yes, or we could fetch the conditionMessage from the error message and set this if conditionMessage=="". What do you think @bruelea ?
There was a problem hiding this comment.
Agree, the error message could be used here. Using the error would be more intuitive than setting the conditionMessage variable.
| if o.Status.SyncState == netboxv1.SyncStateSucceeded { | ||
| debugLogger.Info(fmt.Sprintf("reserved ip address in netbox, ip: %s", o.Spec.IpAddress)) |
There was a problem hiding this comment.
If we reached here, we should have succeeded right? So I think this if statement is redundant
There was a problem hiding this comment.
agreed, will delete it
There was a problem hiding this comment.
There were cases where this line was reached but the ip address was actually not reserved in NetBox. So the function should be refactored to avoid this case.
| }, nil | ||
| } | ||
|
|
||
| func (r *IpAddressReconciler) UpdateConditions(ctx context.Context, o *netboxv1.IpAddress, msg string) error { |
There was a problem hiding this comment.
This is good, but might be a bit more readable using a switch statement e.g.
switch {
case o.Status.IpAddressUrl == "":
// code
case o.Status.SyncState == netboxv1.SyncStateFailed:
// more code
default:
// ready case
}The DeletionTimestamp check still needs to be done, but perhaps it could be inverted and have an early return? As there is only one state to go into if this is the case
| // If err is a StatusError, we've reported it in status conditions, so return nil to controller-runtime | ||
| // This prevents exponential backoff for user-facing errors that are already visible in status | ||
| // Regular errors are still returned to trigger retry with backoff | ||
| if err != nil && IsStatusError(err) { |
There was a problem hiding this comment.
Why not follow the pattern given by controller-runtime's client.IgnoreNotFound(err)?
We could then do err = IgnoreStatusError(err)
There was a problem hiding this comment.
StatusError could be named a little more clearly - it seems to be errors caused when communicating with Netbox? And other errors are errors caused by issues during the reconcile loop, do I understand correctly here?
There was a problem hiding this comment.
I really like the proposal in the first comment!
I agree the naming is bad, well actually this is just to define if the error should in the end be returned by the controller loop, so visible with the stacktrace in the logs or not. But basically you are right, StatusError just means the reconciliation did not fully succed but there was no error in the code. Could be claim error, no ip's left or any other error caught somehwere which is not unexpected but due to some user input or system state.
| // Reconcile is part of the main kubernetes reconciliation loop which aims to | ||
| // move the current state of the cluster closer to the desired state. | ||
| func (r *IpAddressClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | ||
| func (r *IpAddressClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ctrl.Result, err error) { |
There was a problem hiding this comment.
Perhaps rename err to resultErr as in this function err is shadowed in some code paths which makes it harder to see if this is what will be returned, or if it's just temporary
| // updateStatus updates the IpAddressClaim status based on the current state of the owned IpAddress. | ||
| // This function is called as a deferred function in Reconcile to ensure status is always updated. | ||
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { |
There was a problem hiding this comment.
We should decide if we never update in case of an error in this function (current state). The alternative is to call Update even if we fail setting some of these statuses. Not saying this is wrong, but we should consider which way we want to go with this
There was a problem hiding this comment.
mhm good point, what do you think about adding a second defer function which only calls the Status().Update()? :D
There was a problem hiding this comment.
I checked the code of the EventStatusRecoder again, it Report() always returns nil because the Status().Update() was removed to the updateStatus function and the return variable should be removed too.
The only error here would be if the client fails to the the IpAddress CR, in that case the status conditions should still be updated.
| return fmt.Errorf("failed to report IpAssigned false condition: %w", err) | ||
| } | ||
| } else { | ||
| // No error and no IpAddress - this shouldn't happen in normal flow, skip update |
There was a problem hiding this comment.
Is this case not something we would want to report to the user somehow?
Also, this currently would skip even setting the initial ready condition from the top of this function (in case it wasn't present) making the user not see any progress, unless they have debug logging switched on and checked the operator logs
There was a problem hiding this comment.
Ah these status conditions always become complexe, but good point, I will actually remove this if condition
| } | ||
|
|
||
| // IpAddress exists - report successful IP assignment if not already reported | ||
| if apismeta.FindStatusCondition(claim.Status.Conditions, netboxv1.ConditionIpAssignedTrue.Type) == nil { |
There was a problem hiding this comment.
This only checks for a Status Condition with the type IPAssigned, not checking if it is true - is that the intent?
And what if we don't assign an IP? Don't we want to set AssignedFalse here?
There was a problem hiding this comment.
True this should check if it is true and set to true if not.
So the idea is that if the ipAddress cr does not exist, this is cought further up in the
/ Fetch the latest IpAddress object ipAddress := &netboxv1.IpAddress{} if err := r.Client.Get(ctx, lookupKey, ipAddress); err != nil { if apierrors.IsNotFound(err) {
| } | ||
|
|
||
| // Update status based on IpAddress readiness | ||
| if apismeta.IsStatusConditionTrue(ipAddress.Status.Conditions, "Ready") { |
There was a problem hiding this comment.
Should this use the defined constants as in the other if statements?
There was a problem hiding this comment.
The idea here is to check if the ipAddress CR which was generated from this claim has the ready condition set to true. If so, then the claim can also be set to true. Or am I missing something here
| @@ -174,31 +175,24 @@ func (r *IpAddressReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( | |||
|
|
|||
| netboxIpAddressModel, err := r.NetboxClient.ReserveOrUpdateIpAddress(ipAddressModel) | |||
| if err != nil { | |||
There was a problem hiding this comment.
Agree, the error message could be used here. Using the error would be more intuitive than setting the conditionMessage variable.
| if o.Status.SyncState == netboxv1.SyncStateSucceeded { | ||
| debugLogger.Info(fmt.Sprintf("reserved ip address in netbox, ip: %s", o.Spec.IpAddress)) |
There was a problem hiding this comment.
There were cases where this line was reached but the ip address was actually not reserved in NetBox. So the function should be refactored to avoid this case.
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { | ||
| debugLogger.Info("updating ipaddressclaim status") | ||
|
|
||
| // Initialize status conditions if this is a new resource |
There was a problem hiding this comment.
Can't the initialization be skipped now if the status and Ready Condition is updated with the defer function?
| // updateStatus updates the IpAddressClaim status based on the current state of the owned IpAddress. | ||
| // This function is called as a deferred function in Reconcile to ensure status is always updated. | ||
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { |
There was a problem hiding this comment.
I checked the code of the EventStatusRecoder again, it Report() always returns nil because the Status().Update() was removed to the updateStatus function and the return variable should be removed too.
The only error here would be if the client fails to the the IpAddress CR, in that case the status conditions should still be updated.
wip