fix: Clean up nolint directives marked as TODO - Part 3#832
fix: Clean up nolint directives marked as TODO - Part 3#832cbumb wants to merge 14 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors main startup into modular helpers for flags/TLS/config-server wiring; tightens per-line lint directives in API types; refactors config defaults into focused helpers; adds transient gRPC error handling and status helpers to RebootNode controller; replaces inline field indexers with dedicated helper indexers. No exported/public signatures changed. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI / Flags
participant Config as Config Loader & Server
participant TLS as tlsAndServers
participant Cert as Cert Watcher
participant Manager as Controller Manager
CLI->>Config: parseFlags -> load config
Config->>TLS: provide config, ports, cert paths
TLS->>Cert: start certificate watchers
Cert-->>TLS: supply TLS assets
TLS->>Manager: configure webhook & metrics servers with TLS
Manager->>Config: start config server concurrently
Manager-->>CLI: start manager (blocks)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Merging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@janitor/pkg/controller/rebootnode_controller.go`:
- Around line 180-200: handleRebootInProgress currently treats any error from
checkNodeReadyFromCSP as a permanent failure by calling
resultForNodeReadyFailed; change it to distinguish transient gRPC/network errors
from permanent failures: inspect nodeReadyErr (from checkNodeReadyFromCSP) for
transient gRPC codes (e.g., Unavailable, DeadlineExceeded) and in those cases
return a requeue (e.g., ctrl.Result{RequeueAfter: shortBackoff}) instead of
calling resultForNodeReadyFailed, while preserving the existing behavior for
non-transient errors; keep other flows (resultForNodeReadySucceeded,
resultForNodeReadyTimeout and getRebootTimeout) unchanged.
- Around line 330-367: The code currently sends the reboot via
RebootNodeReconciler.CSPClient.SendRebootSignal then only mutates the in-memory
RebootNode condition via RebootNode.SetCondition, which can be lost if the
subsequent status persistence (updateRebootNodeStatusIfChanged) fails and causes
a duplicate signal on retry; after a successful CSPClient.SendRebootSignal call,
set the SignalSent condition (as you already do) and immediately persist that
condition to the API (e.g., call r.Client.Status().Update(ctx, rebootNode) or
reuse updateRebootNodeStatusIfChanged to persist the status) before returning;
if that status update fails, return the error so reconciliation retries the
status persistence path (or handle a dedicated interim condition like
SignalPending if you prefer), ensuring the unique symbols involved are
sendRebootSignalAndSetCondition, RebootNode.SetCondition,
r.Client.Status().Update (or updateRebootNodeStatusIfChanged), and
CSPClient.SendRebootSignal.
- Line 57: The `nolint` directive preceding the kubebuilder RBAC marker is
inconsistent (it uses "// nolint:lll") and may not be recognized by linters;
update the comment to use the compact form "//nolint:lll // kubebuilder RBAC
marker must stay on one line" to match the other directive ("//nolint:dupl") and
ensure the linter suppression is applied for the rebootnode_controller.go RBAC
comment.
🧹 Nitpick comments (4)
janitor/pkg/controller/rebootnode_controller.go (2)
141-176: Good pattern for status updates, but returned errors lack wrapping context.The
updateRebootNodeStatusIfChangedhelper is well-structured (re-fetch → copy status → update). However, the errors returned on lines 162 and 170 are bare — they lose the context of which operation failed when surfaced to the caller.Proposed fix: wrap errors with context
- slog.Error("failed to refresh RebootNode before status update", "error", err) - - return err + slog.Error("failed to refresh RebootNode before status update", "error", err) + + return fmt.Errorf("refreshing RebootNode %q before status update: %w", rebootNode.Name, err) } freshRebootNode.Status = rebootNode.Status if err := r.Status().Update(ctx, &freshRebootNode); err != nil { - slog.Error("failed to update RebootNode status", "error", err) - - return err + slog.Error("failed to update RebootNode status", "error", err) + + return fmt.Errorf("updating RebootNode %q status: %w", rebootNode.Name, err) }As per coding guidelines, "Wrap errors with context using
fmt.Errorf("context: %w", err)in Go code".
369-395:SetupWithManageruses insecure gRPC credentials with no configuration escape hatch.
grpc.WithTransportCredentials(insecure.NewCredentials())is hardcoded. This may be intentional for in-cluster sidecar communication, but there's no flag or config to enable TLS for the CSP connection. Worth a brief inline comment explaining why insecure is acceptable here.janitor/main.go (2)
350-358: Config handler may send a partial response with incorrect status on encode failure.
w.Header().Set("Content-Type", ...)is written beforeEncode. IfEncodestarts writing the body and then fails mid-stream, thehttp.Erroron line 355 will not be able to set the 500 status code (headers already flushed). For a small config struct this is unlikely, but encoding to a buffer first is more robust.Proposed fix: buffer before writing
configHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - w.Header().Set("Content-Type", "application/json") - - if err := json.NewEncoder(w).Encode(cfg); err != nil { + data, err := json.Marshal(cfg) + if err != nil { slog.Error("Failed to encode configuration as JSON", "error", err) http.Error(w, "Internal Server Error", http.StatusInternalServerError) - return } + + w.Header().Set("Content-Type", "application/json") + w.Write(data) })
333-341: Port parsing fallback is fragile for malformed addresses.If
net.SplitHostPortfails (e.g., address is just":8082"without a host on some edge), the fallback trims a leading:manually. This works for the common:PORTcase, but consider usingstrings.TrimPrefixfor clarity and to avoid a potential panic ifportStris empty after theSplitHostPorterror path.Proposed fix
+ "strings" ... _, portStr, err := net.SplitHostPort(configAddr) if err != nil { - portStr = configAddr - - if portStr != "" && portStr[0] == ':' { - portStr = portStr[1:] - } + portStr = strings.TrimPrefix(configAddr, ":") }
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@janitor/pkg/controller/rebootnode_controller.go`:
- Around line 363-404: The successful path in sendRebootSignalAndSetCondition
persists the SignalSent condition via updateRebootNodeStatusIfChanged, but
reconcileHelper then calls updateRebootNodeStatusIfChanged again with the stale
originalRebootNode causing duplicate API writes; change
sendRebootSignalAndSetCondition to return an additional boolean (e.g.,
statusPersisted bool) indicating it already performed the status update (or
alter it to update originalRebootNode in-place after persisting), update the
call site in reconcileHelper to check that flag and skip the outer
updateRebootNodeStatusIfChanged when true, and update function signatures and
callers accordingly (refer to sendRebootSignalAndSetCondition, reconcileHelper,
and updateRebootNodeStatusIfChanged).
🧹 Nitpick comments (1)
janitor/pkg/controller/rebootnode_controller.go (1)
148-181: Consider wrapping returned errors with context.Lines 167 and 175 return raw errors after logging. While the
slog.Errorcalls provide some traceability, callers (and any error-aggregation tooling) lose the context about what operation failed. Per the coding guidelines, errors should be wrapped withfmt.Errorf("context: %w", err).Suggested wrapping
- slog.Error("failed to refresh RebootNode before status update", "error", err) - - return err + return fmt.Errorf("failed to refresh RebootNode before status update: %w", err) } freshRebootNode.Status = rebootNode.Status if err := r.Status().Update(ctx, &freshRebootNode); err != nil { - slog.Error("failed to update RebootNode status", "error", err) - - return err + return fmt.Errorf("failed to update RebootNode status: %w", err) }The
slog.Errorcalls can be kept if desired, but the wrapping ensures structured error propagation.As per coding guidelines,
**/*.go: "Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code".
Merging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@janitor/pkg/controller/rebootnode_controller.go`:
- Around line 110-149: The early-return in reconcileHelper currently returns any
error from handleRebootNotStarted directly, which lets the sentinel
errRebootNodeDeleted surface; change that early-return so that after calling
handleRebootNotStarted (the branch where you get result, err, statusPersisted)
you check if errors.Is(err, errRebootNodeDeleted) and if so return
ctrl.Result{}, nil, otherwise return ctrl.Result{}, err—this mirrors the outer
status-update path and prevents errRebootNodeDeleted from being treated as a
reconcile error; locate symbols handleRebootNotStarted, reconcileHelper,
updateRebootNodeStatusIfChanged, and errRebootNodeDeleted to implement the
check.
- Around line 369-410: In sendRebootSignalAndSetCondition: don't treat all
SendRebootSignal errors as permanent failures — detect transient gRPC errors
using the existing isTransientGRPCError helper (same check used in IsNodeReady)
and, when transient, return a requeue (e.g., ctrl.Result{RequeueAfter: 30 *
time.Second}) without setting CompletionTime or marking the RebootNode condition
to Failed; only set the False/Failed condition, call SetCompletionTime, and
increment metrics.GlobalMetrics for ActionTypeReboot/StatusFailed when the error
is non-transient (permanent).
🧹 Nitpick comments (3)
janitor/pkg/controller/rebootnode_controller.go (2)
151-186: Errors fromGetandStatus().Updateare returned without context wrapping.Lines 170-172 and 177-180 return raw errors. The coding guidelines require wrapping with
fmt.Errorf("context: %w", err)to aid debugging when the error surfaces in the reconciler call stack.Proposed fix
if err := r.Get(ctx, objectKey, &freshRebootNode); err != nil { if apierrors.IsNotFound(err) { slog.Info("Post-reconciliation status update: not found, object assumed deleted", "node", rebootNode.Name) return errRebootNodeDeleted } slog.Error("failed to refresh RebootNode before status update", "error", err) - return err + return fmt.Errorf("refreshing RebootNode before status update: %w", err) } freshRebootNode.Status = rebootNode.Status if err := r.Status().Update(ctx, &freshRebootNode); err != nil { slog.Error("failed to update RebootNode status", "error", err) - return err + return fmt.Errorf("updating RebootNode status: %w", err) }As per coding guidelines: "Wrap errors with context using
fmt.Errorf("context: %w", err)in Go code".
324-338: Non-idiomatic Go return signature:(ctrl.Result, error, bool)— error should be last.Go convention places
erroras the final return value. The(ctrl.Result, error, bool)ordering inhandleRebootNotStartedandsendRebootSignalAndSetConditionwill surprise readers and can lead to accidental misuse at call sites.Suggested signature change
-func (r *RebootNodeReconciler) handleRebootNotStarted( - ctx context.Context, originalRebootNode, rebootNode *janitordgxcnvidiacomv1alpha1.RebootNode, node *corev1.Node, -) (ctrl.Result, error, bool) { +func (r *RebootNodeReconciler) handleRebootNotStarted( + ctx context.Context, originalRebootNode, rebootNode *janitordgxcnvidiacomv1alpha1.RebootNode, node *corev1.Node, +) (result ctrl.Result, statusPersisted bool, err error) {Apply the same change to
sendRebootSignalAndSetConditionand update all call sites accordingly.janitor/pkg/controller/rebootnode_controller_test.go (1)
170-175:AfterEachwill fail and obscure the real test failure if the RebootNode was deleted.
AfterEachunconditionally callsExpect(err).NotTo(HaveOccurred())onGet. If a future test exercises deletion paths (or a test happens to clean up the resource), theAfterEachassertion will fail, masking the actual outcome.Suggested resilience improvement
AfterEach(func() { err := k8sClient.Get(ctx, types.NamespacedName{Name: testRebootNode.Name}, testRebootNode) - Expect(err).NotTo(HaveOccurred()) + if err != nil { + return // object already gone; nothing to validate + } // Ensure that the RebootNode conditions are valid checkStatusConditions(testRebootNode.Status.Conditions) })
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
691c494 to
0f9316d
Compare
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
Summary
Refactors nolint directives by lowering complexity and improving structure. No intended behavior change.
Modules updated in this PR:
janitor
Type of Change
Component(s) Affected
Testing
Checklist
Summary by CodeRabbit