Hyperfleet-267 - feat: Implement Minimal Real GCP Validation and Related Helm Chart #10
Conversation
Walkthrough
Adds a new Helm AdapterConfig and job templates for a HyperFleet GCP validation adapter, consolidating chart mode to

Sequence Diagram(s)
sequenceDiagram
participant Env as Env/Config
participant Registry as Validator Registry
participant Resolver as Dependency Resolver
participant Executor as Executor
participant Validator as Validator Implementations
participant GCP as GCP Services
Env->>Registry: validators register (init)
Env->>Executor: NewContext(LoadFromEnv)
Executor->>Registry: GetAll()
Registry-->>Executor: validators[]
Executor->>Resolver: NewDependencyResolver(validators)
Resolver-->>Executor: ExecutionGroup[] (levels)
loop For each ExecutionGroup (by level)
par Parallel validators
Executor->>Validator: Validate(ctx)
Validator->>GCP: Create client (WIF + retry/backoff)
GCP-->>Validator: API responses
Validator-->>Executor: Result
end
Executor->>Executor: store Result in Context (mutex)
alt StopOnFirstFailure triggered
Executor->>Executor: abort remaining groups
end
end
Executor->>Executor: Aggregate results
Executor->>Env: Write JSON results file
sequenceDiagram
participant HyperFleet as HyperFleet API
participant Preconditions as Adapter Preconditions
participant K8s as Kubernetes API
participant Job as Validation Job (Pod)
participant Reporter as Status Reporter
Preconditions->>HyperFleet: GET /clusterStatus
HyperFleet-->>Preconditions: cluster info (name, phase, project)
Preconditions->>HyperFleet: GET /clusterAdapterStatus
HyperFleet-->>Preconditions: adapter status
Preconditions->>K8s: Create ServiceAccount/Role/RoleBinding
K8s-->>Preconditions: ACK
Preconditions->>K8s: Create Job (gcp-validator + status-reporter)
K8s-->>Job: Pod scheduled / runs
Job->>GCP: Perform API validations
Job->>Reporter: Write results to /results
Reporter->>HyperFleet: POST clusterStatus payload
HyperFleet-->>Reporter: ACK
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@charts/configs/validation-adapter.yaml`:
- Around line 83-85: The environment variable key for the field named
"managedByResourceName" uses the wrong casing in its source string
("env.MANAGED_By_RESOURCE_NAME"); update the source to use consistent uppercase
"env.MANAGED_BY_RESOURCE_NAME" so the mapping matches the intended environment
variable and avoids casing bugs.
In `@charts/configs/validation-job-adapter-task.yaml`:
- Around line 15-17: The template hardcodes activeDeadlineSeconds (310) which
must be computed from the adapter's maxWaitTimeSeconds plus a buffer; update the
Kubernetes job manifest to accept a Helm/templating value (e.g.,
values.adapter.maxWaitTimeSeconds) and set activeDeadlineSeconds to that value
plus the 10s buffer (e.g., {{ add .Values.adapter.maxWaitTimeSeconds 10 }}), add
a corresponding values.yaml entry (adapter.maxWaitTimeSeconds) and a comment
documenting the coupling, and if desired open a follow-up issue to track making
this configurable everywhere; reference activeDeadlineSeconds and
maxWaitTimeSeconds when making these changes.
In `@validator/go.mod`:
- Line 8: Update the golang.org/x/oauth2 module to v0.27.0 or later in go.mod
(replace the current golang.org/x/oauth2 v0.15.0 entry), then run the
appropriate Go tooling (go get golang.org/x/oauth2@v0.27.0 or newer and go mod
tidy) to refresh go.sum; run tests (go test ./...) and any CI checks to ensure
no breakages and commit the updated go.mod and go.sum. Ensure the change
addresses the CVE affecting jws.Verify by selecting at least v0.27.0 (or newer)
and rebuild/vendor as required.
In `@validator/pkg/validator/executor_test.go`:
- Around line 18-47: The AfterEach currently unconditionally unsets PROJECT_ID
which can remove a pre-existing environment value; modify the test setup so
BeforeEach captures the prior PROJECT_ID (use os.LookupEnv) into a local
variable (e.g., prevProjectID, prevProjectIDSet) before calling os.Setenv, and
update AfterEach to restore the previous state by calling
os.Setenv("PROJECT_ID", prevProjectID) if it existed or os.Unsetenv("PROJECT_ID") only if it
did not exist originally, updating the BeforeEach/AfterEach blocks in
executor_test.go (referencing the BeforeEach and AfterEach functions and the
PROJECT_ID env usage) accordingly.
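A minimal sketch of that save/restore pattern, assuming the Ginkgo/Gomega style used elsewhere in these tests (the "test-project" value is only a placeholder, not taken from the repository):

```go
package validator_test

import (
	"os"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var (
	prevProjectID    string
	prevProjectIDSet bool
)

var _ = BeforeEach(func() {
	// Capture any pre-existing PROJECT_ID before overriding it for the test.
	prevProjectID, prevProjectIDSet = os.LookupEnv("PROJECT_ID")
	Expect(os.Setenv("PROJECT_ID", "test-project")).To(Succeed())
})

var _ = AfterEach(func() {
	// Restore the original state: re-set the saved value if one existed,
	// otherwise unset the variable entirely.
	if prevProjectIDSet {
		Expect(os.Setenv("PROJECT_ID", prevProjectID)).To(Succeed())
	} else {
		Expect(os.Unsetenv("PROJECT_ID")).To(Succeed())
	}
})
```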
In `@validator/pkg/validator/executor.go`:
- Around line 131-138: The code assumes validator.Validate(ctx, e.ctx) always
returns a non-nil *Result which can cause a nil pointer panic when setting
result.Duration/Timestamp/ValidatorName; update executor.go to defensively
handle a nil return from validator.Validate by checking if result == nil after
the call (where result is assigned from validator.Validate), and if nil either
instantiate a default Result (with appropriate fields like Failed=true and an
explanatory Message), or log the error via e.logger.Info/Error (using
validator.Metadata().Name for context) and skip populating Duration/Timestamp to
avoid the panic; ensure the Duration and Timestamp are still set on the non-nil
Result before further use.
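A sketch of the kind of guard described above, using the names from this instruction (result, e.logger, Metadata().Name, and the Result status fields); the surrounding variables (including start) are assumptions and the real executor code may differ:

```go
result := validator.Validate(ctx, e.ctx)
if result == nil {
	// Defend against validators that return a nil *Result: substitute an
	// explicit failure so the assignments below never dereference nil.
	e.logger.Error("validator returned nil result", "validator", validator.Metadata().Name)
	result = &Result{
		Status:  StatusFailure,
		Reason:  "NilResult",
		Message: fmt.Sprintf("validator %s returned a nil result", validator.Metadata().Name),
	}
}
// start is assumed to be the time captured just before Validate was called.
result.ValidatorName = validator.Metadata().Name
result.Duration = time.Since(start)
result.Timestamp = time.Now().UTC()
```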
In `@validator/pkg/validators/api_enabled.go`:
- Around line 96-143: The code does not handle an empty vctx.Config.RequiredAPIs
leading to misleading success; before the loop over requiredAPIs add an early
check (len(requiredAPIs) == 0) and return a short-circuit validator.Result
(e.g., StatusWarning or StatusSkipped) with a clear Message like "no required
APIs configured" and include contextual Details (e.g., "required_apis": 0,
"project_id": vctx.Config.ProjectID); place this check immediately after
requiredAPIs := vctx.Config.RequiredAPIs so it executes prior to creating
contexts or calling svc.Services.Get, and use the same validator.Result struct
shape used elsewhere in this file.
In `@validator/pkg/validators/quota_check.go`:
- Around line 31-86: The QuotaCheckValidator.Validate currently returns
StatusSuccess for an unimplemented stub; change it to return
validator.StatusSkipped (or validator.StatusFailure if a config flag like
vctx.Config.EnableQuotaCheck indicates enforcement) and set Reason to
"QuotaCheckSkipped" (or "QuotaCheckDisabled"/"QuotaCheckFailed" when failing),
include the validator identity (e.g., add "validator": "QuotaCheckValidator") in
Result.Details, and ensure any caller/aggregation treats StatusSkipped as
neutral; update the Validate function and the returned Result fields
accordingly.
🧹 Nitpick comments (26)
validator/go.mod (1)
5-46: Update dependencies to latest stable versions. Several dependencies lag significantly behind their latest releases: google.golang.org/api is 106 versions behind (v0.154.0 → v0.260.0), and golang.org/x/oauth2 is 9 versions behind (v0.15.0 → v0.34.0). Since this is a new module, starting with the latest stable versions ensures you have recent bug fixes and security patches. Run go get -u to update.

validator/Makefile (1)
7-8: Make the image platform configurable. Hardcoding linux/amd64 can be inconvenient on arm64 dev machines; a variable keeps current behavior while allowing overrides.

♻️ Proposed tweak
 IMAGE_REGISTRY ?= quay.io/rh-ee-dawang
 IMAGE_TAG ?= latest
+PLATFORM ?= linux/amd64
@@
-	$(CONTAINER_CMD) build --platform linux/amd64 --no-cache --build-arg VERSION=$(VERSION) --build-arg GIT_COMMIT=$(GIT_COMMIT) -t $(IMAGE_REGISTRY)/$(PROJECT_NAME):$(IMAGE_TAG) .
+	$(CONTAINER_CMD) build --platform $(PLATFORM) --no-cache --build-arg VERSION=$(VERSION) --build-arg GIT_COMMIT=$(GIT_COMMIT) -t $(IMAGE_REGISTRY)/$(PROJECT_NAME):$(IMAGE_TAG) .
@@
-	$(CONTAINER_CMD) build --platform linux/amd64 --build-arg BASE_IMAGE=alpine:3.21 --build-arg VERSION=$(VERSION) --build-arg GIT_COMMIT=$(GIT_COMMIT) -t quay.io/$(QUAY_USER)/$(PROJECT_NAME):$(DEV_TAG) .
+	$(CONTAINER_CMD) build --platform $(PLATFORM) --build-arg BASE_IMAGE=alpine:3.21 --build-arg VERSION=$(VERSION) --build-arg GIT_COMMIT=$(GIT_COMMIT) -t quay.io/$(QUAY_USER)/$(PROJECT_NAME):$(DEV_TAG) .

Also applies to: 138-179
validator/README.md (1)
27-44: Add a short auth prerequisite note for local/Docker runs. The validator uses Workload Identity Federation (mentioned in Features) via google.DefaultClient, which requires Application Default Credentials to be available. The Run Locally and Run in Docker examples lack explicit guidance on this prerequisite, which would cause silent auth failures for first-time users.

✍️ Suggested doc addition
### Run Locally

```bash
export PROJECT_ID=my-gcp-project
export RESULTS_PATH=/tmp/results.json
+
+# Ensure Application Default Credentials are available
+# For local development: export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
./bin/validator
cat /tmp/results.json
```

charts/templates/configmap-app.yaml (1)
`14-22`: **Consider adding a fallback for unknown deployment modes.** The conditional only handles `"dummy"` and `"real"` modes. If `deploymentMode` is set to an unrecognized value, no adapter task template will be included in the ConfigMap, which could cause silent failures at runtime. Consider adding an `{{- else }}` block with a fail directive or defaulting to one of the modes:

♻️ Suggested improvement

```diff
 {{- if eq .Values.deploymentMode "dummy" }}
 # Adapter task template referenced by the adapter config
 validation-dummy-job-adapter-task.yaml: |
{{ .Files.Get "configs/validation-dummy-job-adapter-task.yaml" | nindent 4 }}
 {{- else if eq .Values.deploymentMode "real" }}
 # Adapter task template referenced by the adapter config
 validation-job-adapter-task.yaml: |
{{ .Files.Get "configs/validation-job-adapter-task.yaml" | nindent 4 }}
+ {{- else }}
+ {{- fail (printf "Invalid deploymentMode '%s'. Must be 'dummy' or 'real'." .Values.deploymentMode) }}
 {{- end }}
```

charts/templates/deployment.yaml (1)
83-107: LGTM with a note on consistency. The real deployment mode environment variables are properly defined and align with the validator's expected configuration. The structure mirrors the dummy block appropriately.
As noted in configmap-app.yaml, consider adding a fallback {{- else }} block with a fail directive for unknown deploymentMode values to provide clear error messaging during template rendering.

validator/pkg/config/config_test.go (1)
31-40: Subtle issue in environment restoration logic. The check if v != "" cannot distinguish between an environment variable that was unset versus one that was explicitly set to an empty string. Both cases result in os.Getenv() returning "", but they have different semantics.
In practice, this is unlikely to cause issues for these specific config variables, but for correctness:
♻️ More precise restoration using LookupEnv
 BeforeEach(func() {
 	// Save original environment
-	originalEnv = make(map[string]string)
+	originalEnv = make(map[string]*string)
 	envVars := []string{
 		"RESULTS_PATH", "PROJECT_ID", "GCP_REGION",
 		"DISABLED_VALIDATORS", "STOP_ON_FIRST_FAILURE", "REQUIRED_APIS",
 		"LOG_LEVEL", "REQUIRED_VCPUS", "REQUIRED_DISK_GB",
 		"REQUIRED_IP_ADDRESSES", "VPC_NAME", "SUBNET_NAME",
 	}
 	for _, v := range envVars {
-		originalEnv[v] = os.Getenv(v)
+		if val, exists := os.LookupEnv(v); exists {
+			originalEnv[v] = &val
+		}
 		Expect(os.Unsetenv(v)).To(Succeed())
 	}
 })

 AfterEach(func() {
 	// Restore original environment
 	for k, v := range originalEnv {
-		if v != "" {
-			Expect(os.Setenv(k, v)).To(Succeed())
-		} else {
-			Expect(os.Unsetenv(k)).To(Succeed())
+		if v != nil {
+			Expect(os.Setenv(k, *v)).To(Succeed())
 		}
 	}
 })

validator/pkg/gcp/client_test.go (2)
90-108: Potential test flakiness due to timing assumptions. The assertion on Line 106 expects exactly 2 calls, but the actual count depends on when the context cancellation is observed by the retry loop. If the retry logic checks the context after calling the operation but before incrementing, or if there's any scheduling delay, the count could vary.
Consider using BeNumerically(">=", 2) or BeNumerically("<=", 3) to allow for timing variance, or restructure to make the test deterministic.
110-123: Consider increasing timeout for CI stability. The 50ms timeout may be too aggressive for CI environments under load. While the test logic is sound, a slightly longer timeout (e.g., 100-200ms) would reduce flakiness risk without significantly impacting test execution time.
validator/pkg/config/config.go (2)
94-112: Silent fallback on parse errors may hide configuration mistakes. The getEnvBool and getEnvInt helpers silently return the default value when parsing fails. If an operator sets STOP_ON_FIRST_FAILURE=yes (invalid bool) or REQUIRED_VCPUS=abc (invalid int), the config will silently use defaults instead of alerting to the misconfiguration.
Consider logging a warning when parse errors occur, so operators are aware their configuration isn't being applied:
💡 Suggested improvement
+import "log/slog" + func getEnvBool(key string, defaultValue bool) bool { if value := os.Getenv(key); value != "" { b, err := strconv.ParseBool(value) if err == nil { return b } + slog.Warn("invalid boolean value for env var, using default", + "key", key, "value", value, "default", defaultValue) } return defaultValue } func getEnvInt(key string, defaultValue int) int { if value := os.Getenv(key); value != "" { i, err := strconv.Atoi(value) if err == nil { return i } + slog.Warn("invalid integer value for env var, using default", + "key", key, "value", value, "default", defaultValue) } return defaultValue }
54-61: Consider filtering empty strings from disabled validators. If DISABLED_VALIDATORS is set to something like "api-enabled,,quota-check" (with an empty element), the resulting slice will contain an empty string. This could cause IsValidatorEnabled("") to return false for any validator name that somehow matches the empty string (unlikely but defensive).

💡 Defensive improvement
 if disabled := os.Getenv("DISABLED_VALIDATORS"); disabled != "" {
 	cfg.DisabledValidators = strings.Split(disabled, ",")
 	// Trim whitespace
-	for i, v := range cfg.DisabledValidators {
-		cfg.DisabledValidators[i] = strings.TrimSpace(v)
+	var trimmed []string
+	for _, v := range cfg.DisabledValidators {
+		v = strings.TrimSpace(v)
+		if v != "" {
+			trimmed = append(trimmed, v)
+		}
 	}
+	cfg.DisabledValidators = trimmed
 }

validator/Dockerfile (2)
18-19: Running go mod tidy in the build may mask dependency issues. If go.mod and go.sum are properly maintained in the repository, go mod tidy should be unnecessary. Running it during the build can mask cases where the committed module files are out of sync with the source code. Consider removing go mod tidy or adding it as a CI check instead.

💡 Suggested change
-# Tidy and verify Go module dependencies
-RUN go mod tidy && go mod verify
+# Verify Go module dependencies
+RUN go mod verify
15-19: Consider optimizing Docker layer caching for dependencies. Copying the entire source before downloading dependencies means every source change invalidates the dependency cache. For faster rebuilds, copy module files first:
💡 Optimized layer caching
 WORKDIR /build

-# Copy source code
-COPY . .
+# Copy and download dependencies first (cached layer)
+COPY go.mod go.sum ./
+RUN go mod download && go mod verify

-# Tidy and verify Go module dependencies
-RUN go mod tidy && go mod verify
+# Copy source code
+COPY . .

 # Build binary using make to include version, commit, and build date

validator/pkg/validator/resolver_test.go (1)
241-260: Consider whether silent handling of missing dependencies is the desired behavior. The test documents that missing dependencies are silently ignored, placing the validator at level 0. While this provides graceful degradation, it could mask configuration errors (e.g., typos in RunAfter names). Consider adding a warning log in the resolver when a dependency reference cannot be resolved.

charts/configs/validation-job-adapter-task.yaml (1)
74-75: JOB_NAME duplicates the job name pattern - consider using a field reference or shared template. The job name pattern gcp-validator-{{ .clusterId | lower }}-{{ .generationId }} is duplicated on lines 4 and 75. If one is updated without the other, they will drift. Consider defining a helper template or extracting to a variable.
Note: Kubernetes doesn't provide a direct
fieldReffor the job name from within a pod, so templating is necessary. You could define the name pattern once in a YAML anchor or template variable:# At the top of the file or in a helper {{- $jobName := printf "gcp-validator-%s-%s" (.clusterId | lower) .generationId -}}validator/pkg/validator/registry.go (2)
23-31: Inconsistent duplicate handling between instance and globalRegister.
Registry.Register(line 24-31) silently overwrites duplicates, while the globalRegister(line 57-66) panics. This inconsistency could surprise developers. Consider either:
- Making both panic on duplicates
- Making the instance method return an error for explicit handling
- Documenting the difference clearly
💡 Option: Make instance Register consistent with global
// Register adds a validator to the registry -func (r *Registry) Register(v Validator) { +func (r *Registry) Register(v Validator) error { r.mu.Lock() defer r.mu.Unlock() meta := v.Metadata() - // Allow overwriting for testing purposes + if _, exists := r.validators[meta.Name]; exists { + return fmt.Errorf("validator already registered: %s", meta.Name) + } r.validators[meta.Name] = v + return nil }Also applies to: 57-66
78-83: Consider adding a build tag or panic guard for production safety.
ClearRegistryis marked for testing, but nothing prevents accidental use in production code. Consider either:
- Moving it to a
_test.gofile- Adding a build tag constraint
- Or accepting the risk if validator hot-reloading might be needed
validator/pkg/validator/executor_test.go (1)
123-154: Relax the timing assertion to avoid CI flakiness. The 100ms threshold can be tight on busy runners. Consider a slightly wider bound or a less time-sensitive parallelism check.
⏱️ Proposed adjustment
- Expect(duration).To(BeNumerically("<", 100*time.Millisecond))
+ Expect(duration).To(BeNumerically("<", 200*time.Millisecond))

validator/cmd/validator/main.go (1)
104-107: Avoid logging the full results payload at info level. The JSON can be large and may include sensitive details from validators. Prefer logging the path at info and emitting the payload only at debug.
🔧 Proposed change
- logger.Info("Results written successfully",
-     "path", outputFile,
-     "content", string(data))
+ logger.Info("Results written successfully", "path", outputFile)
+ logger.Debug("Results content", "content", string(data))

validator/pkg/validator/validator.go (1)
59-107: Handle StatusSkipped explicitly in aggregation.
StatusSkipped is defined but not accounted for, which can yield confusing failures (no failed checks yet overall failure). Treat skipped as neutral and report it in details.

🔧 Proposed aggregation tweak
 checksRun := len(results)
 checksPassed := 0
+checksSkipped := 0
 var failedChecks []string
 var failureDescriptions []string

 // Single pass to collect all failure information
 for _, r := range results {
 	switch r.Status {
 	case StatusSuccess:
 		checksPassed++
 	case StatusFailure:
 		failedChecks = append(failedChecks, r.ValidatorName)
 		failureDescriptions = append(failureDescriptions, fmt.Sprintf("%s (%s)", r.ValidatorName, r.Reason))
+	case StatusSkipped:
+		checksSkipped++
 	}
 }

 details := map[string]interface{}{
 	"checks_run":    checksRun,
 	"checks_passed": checksPassed,
+	"checks_skipped": checksSkipped,
 	"timestamp":  time.Now().UTC().Format(time.RFC3339),
 	"validators": results,
 }

-if checksPassed == checksRun {
+if len(failedChecks) == 0 {
+	msg := "All GCP validation checks passed successfully"
+	if checksSkipped > 0 {
+		msg = fmt.Sprintf("All GCP validation checks passed successfully (%d skipped)", checksSkipped)
+	}
 	return &AggregatedResult{
 		Status:  StatusSuccess,
 		Reason:  "ValidationPassed",
-		Message: "All GCP validation checks passed successfully",
+		Message: msg,
 		Details: details,
 	}
 }

validator/pkg/validator/executor.go (2)
93-128: Results slice indexing is safe but mutex scope can be tightened. The pre-allocated results slice with indexed writes (line 145) is thread-safe because each goroutine writes to a unique index. However, the panic recovery block at lines 124-127 acquires the mutex but doesn't need to protect results[index] since index isolation already handles that. Consider removing the redundant lock around results[index]:

♻️ Suggested refinement
 // Thread-safe result storage
 e.mu.Lock()
 e.ctx.Results[meta.Name] = panicResult
-results[index] = panicResult
 e.mu.Unlock()
+results[index] = panicResult
140-145: Same observation for normal result storage. Similar to the panic path, results[index] doesn't need mutex protection due to unique index per goroutine:

♻️ Suggested refinement
 // Thread-safe result storage
 e.mu.Lock()
 e.ctx.Results[meta.Name] = result
 e.mu.Unlock()
 results[index] = result

This is already correctly structured - just confirming the mutex is only for e.ctx.Results map access.

validator/pkg/validators/api_enabled.go (1)
64-94: Consider using the passed-in logger instead of slog.Default(). Lines 65, 72, 76 use slog.Default() and slog.Info/Error/Debug directly. For better testability and consistency, consider accepting a logger from the context or using the one configured at initialization.

charts/configs/validation-adapter.yaml (1)
248-302: Complex CEL expressions may benefit from comments or documentation. The post-processing expressions (lines 258-298) use nested ternary logic with optional chaining (.? and .orValue()). While functional, these are hard to read and maintain. Consider adding inline comments or extracting to named helper expressions if the adapter framework supports it.

validator/pkg/gcp/client.go (3)
39-86: Comment mentions jitter but implementation lacks it. Line 46 comment says "exponential backoff with jitter" but the implementation only doubles the backoff without adding randomness. Jitter helps prevent thundering herd problems when multiple clients retry simultaneously.
♻️ Proposed fix to add jitter
+import (
+	"math/rand"
+	// ... other imports
+)
+
 func retryWithBackoff(ctx context.Context, operation func() error) error {
 	var lastErr error
 	backoff := initialBackoff

 	for attempt := 0; attempt < maxRetries; attempt++ {
 		if attempt > 0 {
-			// Calculate exponential backoff with jitter
+			// Calculate exponential backoff
 			if backoff < maxBackoff {
 				backoff = backoff * 2
 				if backoff > maxBackoff {
 					backoff = maxBackoff
 				}
 			}
+			// Add jitter: 75-125% of backoff
+			jitter := time.Duration(float64(backoff) * (0.75 + rand.Float64()*0.5))
 			slog.Debug("Retrying GCP API call", "attempt", attempt, "backoff", backoff)
 			select {
-			case <-time.After(backoff):
+			case <-time.After(jitter):
 			case <-ctx.Done():
67-77: Consider adding 502 Bad Gateway to retryable errors. The retryable status codes include 429, 500, and 503, but 502 (Bad Gateway) is also commonly transient in cloud environments. Additionally, using errors.As instead of type assertion is more robust for wrapped errors.

♻️ Proposed improvement
+const (
+	// ... existing constants
+	statusBadGateway = 502
+)
+
 // Check if error is retryable
-if apiErr, ok := lastErr.(*googleapi.Error); ok {
+var apiErr *googleapi.Error
+if errors.As(lastErr, &apiErr) {
 	// Retry on rate limit, service unavailable, and internal errors
 	if apiErr.Code == statusRateLimited ||
+		apiErr.Code == statusBadGateway ||
 		apiErr.Code == statusServiceUnavail ||
 		apiErr.Code == statusInternalError {
 		continue
 	}
88-100: projectID field is unused in ClientFactory. The ClientFactory stores projectID but none of the Create*Service methods use it. If it's intended for future use, consider adding a comment; otherwise, remove it to avoid confusion.

♻️ Options
Either document the intent:
 type ClientFactory struct {
-	projectID string
+	projectID string // Reserved for future per-project client configuration
 	logger *slog.Logger
 }

Or remove if unused:

 type ClientFactory struct {
-	projectID string
 	logger *slog.Logger
 }

-func NewClientFactory(projectID string, logger *slog.Logger) *ClientFactory {
+func NewClientFactory(logger *slog.Logger) *ClientFactory {
 	return &ClientFactory{
-		projectID: projectID,
 		logger: logger,
 	}
 }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
validator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (31)
charts/configs/validation-adapter.yaml
charts/configs/validation-job-adapter-task.yaml
charts/templates/_helpers.tpl
charts/templates/configmap-app.yaml
charts/templates/deployment.yaml
charts/values.yaml
validator/Dockerfile
validator/Makefile
validator/README.md
validator/cmd/validator/main.go
validator/go.mod
validator/pkg/config/config.go
validator/pkg/config/config_suite_test.go
validator/pkg/config/config_test.go
validator/pkg/gcp/client.go
validator/pkg/gcp/client_test.go
validator/pkg/gcp/gcp_suite_test.go
validator/pkg/validator/context.go
validator/pkg/validator/executor.go
validator/pkg/validator/executor_test.go
validator/pkg/validator/registry.go
validator/pkg/validator/registry_test.go
validator/pkg/validator/resolver.go
validator/pkg/validator/resolver_test.go
validator/pkg/validator/validator.go
validator/pkg/validator/validator_suite_test.go
validator/pkg/validators/api_enabled.go
validator/pkg/validators/api_enabled_test.go
validator/pkg/validators/quota_check.go
validator/pkg/validators/quota_check_test.go
validator/pkg/validators/validators_suite_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/fake-validation-gcp-job.yaml:25-25
Timestamp: 2025-12-18T11:32:59.422Z
Learning: In the adapter-validation-gcp repository, the gcp-validator service account referenced in fake-validation-gcp-job.yaml is created by the adapter runtime based on the gcpValidationServiceAccount resource definition in adapter-fake-validation-gcp.yaml, not by Helm templates.
📚 Learning: 2025-12-18T11:32:59.422Z
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/fake-validation-gcp-job.yaml:25-25
Timestamp: 2025-12-18T11:32:59.422Z
Learning: In the adapter-validation-gcp repository, the gcp-validator service account referenced in fake-validation-gcp-job.yaml is created by the adapter runtime based on the gcpValidationServiceAccount resource definition in adapter-fake-validation-gcp.yaml, not by Helm templates.
Applied to files:
charts/templates/configmap-app.yaml
charts/configs/validation-job-adapter-task.yaml
charts/templates/_helpers.tpl
charts/values.yaml
validator/README.md
charts/configs/validation-adapter.yaml
📚 Learning: 2025-12-18T11:21:09.498Z
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/broker-pubsub.yaml:13-13
Timestamp: 2025-12-18T11:21:09.498Z
Learning: In the charts/configs/broker-pubsub.yaml file for the adapter-validation-gcp project, the broker configuration settings like ack_deadline_seconds should remain as fixed values to align with other adapter settings at this phase, rather than being made configurable.
Applied to files:
charts/templates/_helpers.tpl
charts/configs/validation-adapter.yaml
🧬 Code graph analysis (15)
validator/pkg/gcp/client_test.go (1)
validator/pkg/gcp/client.go (3)
GetDefaultClientForTesting(220-222)RetryWithBackoffForTesting(225-227)NewClientFactory(95-100)
validator/pkg/config/config_test.go (1)
validator/pkg/config/config.go (2)
LoadFromEnv(40-85)Config(11-37)
validator/pkg/validators/quota_check_test.go (4)
validator/pkg/validators/quota_check.go (1)
QuotaCheckValidator(12-12)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/config/config.go (2)
LoadFromEnv(40-85)Config(11-37)validator/pkg/validator/validator.go (3)
Result(41-49)Status(32-32)StatusSuccess(35-35)
validator/pkg/validators/api_enabled_test.go (3)
validator/pkg/validators/api_enabled.go (1)
APIEnabledValidator(45-45)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/config/config.go (2)
LoadFromEnv(40-85)Config(11-37)
validator/pkg/validator/context.go (2)
validator/pkg/config/config.go (1)
Config(11-37)validator/pkg/validator/validator.go (1)
Result(41-49)
validator/pkg/validator/resolver_test.go (4)
validator/pkg/validator/resolver.go (2)
DependencyResolver(15-17)NewDependencyResolver(20-27)validator/pkg/validator/validator.go (1)
Validator(20-29)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/validator/registry_test.go (1)
MockValidator(13-20)
validator/pkg/validators/quota_check.go (4)
validator/pkg/validator/registry.go (1)
Register(57-66)validator/pkg/validator/validator.go (4)
ValidatorMetadata(12-17)Result(41-49)Status(32-32)StatusSuccess(35-35)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/config/config.go (1)
Config(11-37)
validator/pkg/validator/registry_test.go (3)
validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/validator/validator.go (4)
Result(41-49)ValidatorMetadata(12-17)Status(32-32)StatusSuccess(35-35)validator/pkg/validator/registry.go (5)
Registry(11-14)NewRegistry(17-21)Register(57-66)GetAll(69-71)Get(74-76)
validator/pkg/validator/validator.go (1)
validator/pkg/validator/context.go (1)
Context(14-30)
validator/pkg/validator/executor.go (5)
validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/validator/validator.go (5)
Result(41-49)Validator(20-29)Status(32-32)StatusFailure(36-36)StatusSkipped(37-37)validator/pkg/validator/registry.go (1)
GetAll(69-71)validator/pkg/validator/resolver.go (2)
NewDependencyResolver(20-27)ExecutionGroup(9-12)validator/pkg/config/config.go (1)
Config(11-37)
validator/pkg/validator/executor_test.go (6)
validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/validator/executor.go (2)
Executor(13-17)NewExecutor(20-25)validator/pkg/validator/registry.go (2)
ClearRegistry(79-83)Register(57-66)validator/pkg/config/config.go (2)
LoadFromEnv(40-85)Config(11-37)validator/pkg/validator/validator.go (4)
Result(41-49)Status(32-32)StatusSuccess(35-35)StatusFailure(36-36)validator/pkg/validator/registry_test.go (1)
MockValidator(13-20)
validator/pkg/validator/resolver.go (1)
validator/pkg/validator/validator.go (1)
Validator(20-29)
validator/pkg/validator/registry.go (1)
validator/pkg/validator/validator.go (1)
Validator(20-29)
validator/pkg/validators/api_enabled.go (5)
validator/pkg/validator/registry.go (2)
Register(57-66)Get(74-76)validator/pkg/validator/validator.go (5)
ValidatorMetadata(12-17)Result(41-49)Status(32-32)StatusFailure(36-36)StatusSuccess(35-35)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/config/config.go (1)
Config(11-37)validator/pkg/gcp/client.go (1)
NewClientFactory(95-100)
validator/pkg/gcp/client.go (1)
validator/pkg/validator/context.go (1)
Context(14-30)
🪛 checkmake (0.2.2)
validator/Makefile
[warning] 90-90: Target body for "lint" exceeds allowed length of 5 (8).
(maxbodylength)
[warning] 101-101: Target body for "fmt" exceeds allowed length of 5 (6).
(maxbodylength)
[warning] 185-185: Missing required phony target "all"
(minphony)
🪛 OSV Scanner (2.3.1)
validator/go.mod
[HIGH] 1-1: golang.org/x/oauth2 0.15.0: Unexpected memory consumption during token parsing in golang.org/x/oauth2
(GO-2025-3488)
[HIGH] 1-1: golang.org/x/oauth2 0.15.0: golang.org/x/oauth2 Improper Validation of Syntactic Correctness of Input vulnerability
🔇 Additional comments (50)
validator/go.mod (1)
3-3: No action needed. Go 1.25.0 is a released, stable version (released August 12, 2025) and is appropriate for this project.validator/pkg/gcp/gcp_suite_test.go (1)
10-12: LGTM: test suite bootstrap is standard.No issues spotted in the suite entrypoint.
validator/pkg/validators/validators_suite_test.go (1)
10-12: LGTM: test suite bootstrap is standard.No concerns here.
charts/templates/_helpers.tpl (1)
81-85: Adapter config filename change verified and is correct.The file
validation-adapter.yamlexists incharts/configs/and is properly referenced in the template helpers. No issues.validator/pkg/config/config_suite_test.go (1)
1-13: LGTM!Standard Ginkgo v2 test bootstrap file. The use of an external test package (
config_test) is a good practice for testing the public API.validator/pkg/validator/validator_suite_test.go (1)
1-13: LGTM!Standard Ginkgo v2 test bootstrap file, consistent with the other test suite bootstraps in this PR.
validator/pkg/config/config_test.go (1)
42-245: Comprehensive test coverage.The test suite provides excellent coverage of the config loading functionality:
- Default values and required field validation
- Custom configuration overrides
- List parsing with whitespace handling
- Type conversion error handling (integers, booleans)
- Validator enablement logic
The tests are well-organized using nested
Contextblocks for clear separation of concerns.validator/pkg/validators/quota_check_test.go (1)
1-103: LGTM! Well-structured test suite for the stub validator.The test coverage is comprehensive for a stub implementation, covering metadata validation, enablement logic, and the expected stub behavior. Environment variable cleanup is properly handled in the nested
AfterEachblocks.validator/pkg/gcp/client_test.go (1)
69-88: Good use of table-driven tests for retry behavior.The
DescribeTablepattern effectively covers the different HTTP error codes and their expected retry behavior. This makes the test intent clear and easy to extend.validator/pkg/validators/api_enabled_test.go (3)
34-37: Good defensive cleanup of environment variables.The
AfterEachblock properly cleans up bothPROJECT_IDandREQUIRED_APIS, ensuring no environment leakage between test runs regardless of which inner context was executed.
83-123: Good configuration testing coverage.The tests effectively verify the config loading behavior including default APIs, custom API overrides, and whitespace trimming. This ensures the
LoadFromEnvfunction handles edge cases properly.
144-148: Appropriate deferral of integration tests.The comment correctly identifies that testing
Validate()requires either real GCP credentials or mocked clients. This is a reasonable boundary for unit tests.charts/values.yaml (2)
6-9: Verify production readiness for default "real" mode.The default deployment mode changed from "dummy" to "real". Ensure that:
- The
gcp-validatorimage at the specified registry is available and tested- Downstream deployments are aware of this default change
- CI/CD pipelines are updated if they relied on "dummy" mode
90-103: Well-structured real validation configuration.The configuration block is well-documented with sensible defaults. The
requiredApisdefault matches the defaults inconfig.go, maintaining consistency between Helm values and Go code.validator/pkg/config/config.go (1)
39-85: Clean configuration loading implementation.The
LoadFromEnvfunction is well-structured with clear separation between field initialization, list parsing, and validation. The default API list is sensible and matches the Helm chart defaults.charts/configs/validation-job-adapter-task.yaml (1)
23-29: LGTM - Pod spec is well-structured.The pod configuration correctly uses:
- Service account from the adapter runtime (per learnings, created externally)
restartPolicy: NeverwithbackoffLimit: 0for single-attempt semantics- Shared
emptyDirvolume for inter-container result communicationvalidator/pkg/validator/registry.go (1)
33-43: LGTM -GetAllcorrectly returns an unordered snapshot.The method properly acquires a read lock and returns a fresh slice, avoiding external mutation of the internal map. The non-deterministic iteration order is acceptable since consumers (like
DependencyResolver) handle ordering as needed.validator/pkg/validator/context.go (1)
14-30: Remove suggestion for Results map initialization constructor.The
Resultsmap is properly initialized in all code paths whereContextis instantiated (cmd/validator/main.go,executor_test.go,api_enabled_test.go,quota_check_test.go). Every instantiation explicitly callsmake(map[string]*validator.Result), preventing any nil map panic. The current pattern is correct and requires no changes.validator/pkg/validator/resolver_test.go (1)
1-8: LGTM - comprehensive test coverage for dependency resolution.The test suite covers essential scenarios including no dependencies, linear chains, parallel execution grouping, complex DAGs, circular dependency detection, self-references, missing dependencies, and empty input. The BDD structure with Ginkgo is clear and well-organized.
MockValidatoris properly defined inregistry_test.goand accessible within the sharedvalidator_testpackage.validator/Dockerfile (1)
4-4: > Likely an incorrect or invalid review comment.validator/pkg/validator/executor_test.go (6)
49-58: Good coverage for the empty-registry error path.
60-102: Solid single-validator coverage (execution + context + timing metadata).
104-120: Disabled validator behavior is clearly validated.
164-217: Dependency ordering test is crisp and valuable.
219-256: Stop-on-first-failure behavior is well exercised.
258-285: Failure propagation checks look good.validator/pkg/validator/registry_test.go (5)
12-45: MockValidator provides flexible test hooks.
72-89: Register coverage for single and multiple validators looks solid.
107-122: GetAll coverage is straightforward and sufficient.
125-141: Get-by-name tests are clear and complete.
91-103: The test correctly reflects the instance method's intended behavior. TheRegistry.Register()method explicitly allows overwriting validators with duplicate names, as documented in the code comment: "Allow overwriting for testing purposes". This differs intentionally from the globalRegister()function, which panics on duplicates to protect against accidental re-registration in initialization code. The test is correct and needs no changes.validator/cmd/validator/main.go (1)
122-135: Log-level parsing is clean and idiomatic.validator/pkg/validators/quota_check.go (1)
14-29: Metadata/enablement wiring looks good.validator/pkg/validator/validator.go (1)
10-57: Core validator types and result models are clean and well-scoped.validator/pkg/validator/executor.go (2)
12-25: LGTM - Well-structured executor with thread-safe design.The
Executorstruct correctly uses a mutex to protect shared state during parallel execution, and the constructor is straightforward.
27-86: LGTM - Execution orchestration is well-designed.The
ExecuteAllmethod correctly:
- Filters enabled validators before execution
- Builds a dependency-resolved execution plan
- Executes groups sequentially while allowing parallelism within groups
- Handles stop-on-first-failure semantics
The logging at each stage provides good observability.
validator/pkg/validators/api_enabled.go (4)
15-20: LGTM - Timeout constants are well-defined.The 2-minute overall timeout and 30-second per-request timeout provide reasonable defaults for GCP API calls.
22-42: LGTM - Error extraction logic is robust.The
extractErrorReasonfunction correctly useserrors.Asfor proper error unwrapping and provides meaningful fallback reasons.
101-143: Good use of per-request context timeout and immediate cancel.The pattern of creating a per-request timeout context and calling
reqCancel()immediately after the API call (line 109) is correct and prevents context leaks.
145-171: LGTM - Result construction is comprehensive.Success and failure results include all relevant details (enabled/disabled APIs, project ID, actionable hints), which aids debugging and user experience.
validator/pkg/validator/resolver.go (4)
8-27: LGTM - Clean data structures and constructor.
ExecutionGroupandDependencyResolverare well-defined. The constructor properly indexes validators by name.
29-66: LGTM - Execution group resolution is correct.The method properly:
- Detects cycles first
- Assigns levels via topological ordering
- Groups validators by level with deterministic alphabetical sorting
106-148: Cycle detection algorithm is correct for standard cycles.The DFS-based cycle detection using
visitedandrecStack(recursion stack) correctly identifies back-edges in the dependency graph. The algorithm properly:
- Marks nodes as visited permanently after full exploration
- Uses
recStackto track nodes in the current DFS path- Skips non-existent dependencies gracefully
68-104: Remove this comment—self-references are properly caught bydetectCycles.The
detectCyclesfunction uses a recursion stack to detect all cycles, including self-references. When a validator references itself,detectCyclessetsvisited[name]andrecStack[name]to true before processing dependencies (lines 113–114). During the iteration throughRunAfter, the self-reference check at line 130 (else if recStack[dep]) catches it and returns an error immediately. A test case inresolver_test.go("with self-referencing dependency") explicitly verifies this behavior and confirms the error is returned. SincedetectCyclesruns first inResolveExecutionGroups(line 33) beforeassignLevels(line 38), infinite recursion inassignLevelscannot occur with self-references.charts/configs/validation-adapter.yaml (3)
131-136: Precondition usesvalues(plural) withequalsoperator.Line 136 uses
values: "Active"with operatorequals. If the adapter framework expectsvalue(singular) for equality checks vsvalues(list) forinoperator, this may cause issues. Verify the expected schema.
183-194: RBAC permissions follow least-privilege principle - LGTM.The Role grants minimal permissions:
batch/jobs: get onlybatch/jobs/status: get, update, patch (needed for status reporter)pods: get, list (for status checking)This is appropriate for a status-reporter role.
1-14: LGTM - Well-documented adapter configuration header.The metadata and labels properly identify the adapter type, component, and provider.
validator/pkg/gcp/client.go (3)
20-30: LGTM - Retry constants are reasonable.Initial backoff of 100ms, max of 30s, and 5 retries provide good defaults for GCP API resilience.
102-123: LGTM - Compute service creation follows good patterns.Uses read-only scope appropriately for validation tasks and wraps creation with retry logic.
217-227: Test helpers are appropriately exposed.Exporting
GetDefaultClientForTestingandRetryWithBackoffForTestingallows unit tests to exercise internal logic without modifying production code paths.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
charts/configs/validation-dummy-adapter.yaml (1)
67-71: Copy-paste error in parameter description. The gcpValidatorServiceAccount parameter has an incorrect description that refers to "Maximum time to wait for validation completion" instead of describing the service account.

📝 Proposed fix
- name: "gcpValidatorServiceAccount" source: "env.GCP_VALIDATOR_SERVICE_ACCOUNT" type: "string" default: "gcp-validator-job-sa" - description: "Maximum time to wait for validation completion" + description: "Kubernetes ServiceAccount name for the validator job"
🤖 Fix all issues with AI agents
In `@charts/configs/validation-adapter.yaml`:
- Around line 265-269: The message expression uses awkward grammar; update the
ternary message inside the expression that references resources.gcpValidationJob
so the false branch reads either "Validation job is pending to be applied" or
"Validation job pending application" (e.g., replace "Validation job is pending
to applied" with one of these corrected phrases) to fix the grammar in the
message field.
♻️ Duplicate comments (3)
validator/pkg/validator/executor.go (1)
137-154: LGTM! Nil result handling properly implemented. The defensive nil check addresses the previously identified concern. The fallback result includes appropriate error details and timing information.
validator/pkg/validators/api_enabled.go (1)
100-147: Empty requiredAPIs returns success without validation. When vctx.Config.RequiredAPIs is empty, the loop is skipped and the validator returns success with "No required APIs to validate". This could mask configuration issues. Consider returning StatusSkipped instead to clearly indicate no validation occurred.

♻️ Suggested change
// Check each required API requiredAPIs := vctx.Config.RequiredAPIs + if len(requiredAPIs) == 0 { + slog.Warn("No required APIs configured, skipping validation") + return &validator.Result{ + Status: validator.StatusSkipped, + Reason: "NoAPIsConfigured", + Message: "No required APIs configured for validation", + Details: map[string]interface{}{ + "project_id": vctx.Config.ProjectID, + }, + } + } enabledAPIs := []string{} disabledAPIs := []string{}validator/pkg/validators/quota_check.go (1)
78-90: Stub validator should return StatusSkipped, not StatusSuccess. Returning success for an unimplemented check can falsely signal "all checks passed" in aggregated results. The validator correctly includes stub: true in details, but consumers may only check the status field.

🔧 Proposed fix
slog.Warn("Quota check not yet implemented - returning success by default") return &validator.Result{ - Status: validator.StatusSuccess, - Reason: "QuotaCheckStub", - Message: "Quota check validation not yet implemented (stub returning success)", + Status: validator.StatusSkipped, + Reason: "QuotaCheckNotImplemented", + Message: "Quota check not implemented; skipped", Details: map[string]interface{}{ "stub": true, "implemented": false, "project_id": vctx.Config.ProjectID, "note": "This validator needs to be implemented to check actual GCP quotas", }, }
🧹 Nitpick comments (3)
charts/configs/validation-adapter.yaml (1)
47-57: Default image values differ from values.yaml. The default images here use quay.io/rh-ee-dawang/... while values.yaml uses registry.ci.openshift.org/ci/.... While environment variables will override these defaults at runtime, consider aligning the defaults for consistency and to reduce confusion during debugging.

📝 Proposed alignment with values.yaml
- name: "statusReporterImage" source: "env.STATUS_REPORTER_IMAGE" type: "string" - default: "quay.io/rh-ee-dawang/status-reporter:dev-04e8d0a" + default: "registry.ci.openshift.org/ci/status-reporter:latest" description: "Container image for the status reporter" # GCP Validator configuration - name: "gcpValidatorImage" source: "env.GCP_VALIDATOR_IMAGE" type: "string" - default: "quay.io/rh-ee-dawang/gcp-validator:latest" + default: "registry.ci.openshift.org/ci/gcp-validator:latest" description: "GCP validator container image"validator/cmd/validator/main.go (2)
107-110: Consider truncating or omitting full results content in logs.Logging the entire JSON results content (line 110) could produce very large log entries in production with many validators. Consider logging only a summary or making this configurable via log level.
♻️ Suggested change
- // Log the results content for easy access via logs (useful in containerized environments) - logger.Info("Results written successfully", - "path", outputFile, - "content", string(data)) + // Log results summary; full content available in output file + logger.Info("Results written successfully", + "path", outputFile, + "size_bytes", len(data)) + logger.Debug("Results content", "content", string(data))
100-105: Ensure output directory exists before writing.The comment notes that
/resultsshould be pre-created via volumeMounts in Kubernetes. However, if running outside Kubernetes or if the mount fails,os.WriteFilewill fail silently on missing directory. Consider explicitly creating the directory.♻️ Suggested change
+ import "path/filepath" + // Ensure output directory exists - // Note: In Kubernetes, the /results directory should be pre-created via volumeMounts + if dir := filepath.Dir(outputFile); dir != "" && dir != "." { + if err := os.MkdirAll(dir, 0755); err != nil { + logger.Error("Failed to create output directory", "error", err, "path", dir) + os.Exit(1) + } + } if err := os.WriteFile(outputFile, data, 0644); err != nil {
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
validator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
charts/configs/validation-adapter.yaml
charts/configs/validation-dummy-adapter.yaml
charts/values.yaml
validator/cmd/validator/main.go
validator/go.mod
validator/pkg/config/config.go
validator/pkg/validator/executor.go
validator/pkg/validator/executor_test.go
validator/pkg/validators/api_enabled.go
validator/pkg/validators/quota_check.go
✅ Files skipped from review due to trivial changes (1)
- validator/go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
- validator/pkg/config/config.go
- validator/pkg/validator/executor_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/fake-validation-gcp-job.yaml:25-25
Timestamp: 2025-12-18T11:32:59.422Z
Learning: In the adapter-validation-gcp repository, the gcp-validator service account referenced in fake-validation-gcp-job.yaml is created by the adapter runtime based on the gcpValidationServiceAccount resource definition in adapter-fake-validation-gcp.yaml, not by Helm templates.
📚 Learning: 2025-12-18T11:32:59.422Z
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/fake-validation-gcp-job.yaml:25-25
Timestamp: 2025-12-18T11:32:59.422Z
Learning: In the adapter-validation-gcp repository, the gcp-validator service account referenced in fake-validation-gcp-job.yaml is created by the adapter runtime based on the gcpValidationServiceAccount resource definition in adapter-fake-validation-gcp.yaml, not by Helm templates.
Applied to files:
charts/configs/validation-dummy-adapter.yaml
charts/configs/validation-adapter.yaml
charts/values.yaml
📚 Learning: 2025-12-18T11:21:09.498Z
Learnt from: 86254860
Repo: openshift-hyperfleet/adapter-validation-gcp PR: 4
File: charts/configs/broker-pubsub.yaml:13-13
Timestamp: 2025-12-18T11:21:09.498Z
Learning: In the charts/configs/broker-pubsub.yaml file for the adapter-validation-gcp project, the broker configuration settings like ack_deadline_seconds should remain as fixed values to align with other adapter settings at this phase, rather than being made configurable.
Applied to files:
charts/configs/validation-adapter.yaml
🧬 Code graph analysis (1)
validator/pkg/validators/api_enabled.go (5)
validator/pkg/validator/registry.go (2)
Register(57-66)Get(74-76)validator/pkg/validator/validator.go (5)
ValidatorMetadata(12-17)Result(41-49)Status(32-32)StatusFailure(36-36)StatusSuccess(35-35)validator/pkg/validator/context.go (1)
Context(14-30)validator/pkg/config/config.go (1)
Config(11-37)validator/pkg/gcp/client.go (1)
NewClientFactory(95-100)
🔇 Additional comments (20)
charts/configs/validation-dummy-adapter.yaml (1)
72-73: LGTM on the casing fix.The environment variable source is now correctly using
MANAGED_BY_RESOURCE_NAMEwith consistent uppercase casing.charts/values.yaml (2)
6-9: Clear deployment mode documentation.The comments explaining the two deployment modes (dummy vs real) are helpful for operators. The default to "real" is appropriate for production use.
90-103: Real GCP validator configuration looks complete.The configuration includes all necessary settings for the real validator:
- Container image reference
- Disabled validators list (quota-check stub appropriately disabled)
- Required GCP APIs with sensible defaults
- Results path and timeout settings
The structure mirrors the dummy config for consistency.
charts/configs/validation-adapter.yaml (6)
1-27: Well-structured AdapterConfig metadata and base configuration.The metadata correctly identifies this as a validation adapter for GCP with appropriate labels. The HyperFleet API and Kubernetes configuration sections are properly defined.
83-87: Environment variable casing is correct.The
managedByResourceNameparameter correctly usesenv.MANAGED_BY_RESOURCE_NAMEwith proper uppercase casing.
97-122: Preconditions properly validate cluster state and GCP project.The
clusterStatusprecondition appropriately:
- Captures the
projectIdfrom the cluster spec- Validates that
clusterPhaseis in an acceptable state- Ensures
projectIdexists before proceedingThis is essential for the real GCP validation flow.
140-164: ServiceAccount resource definition is correct.The ServiceAccount is properly templated with cluster-specific namespace and labels. Based on learnings, the service account is created by the adapter runtime based on this resource definition, not by Helm templates.
183-194: RBAC permissions are appropriately scoped.The Role grants minimal permissions needed:
geton jobs,get/update/patchon jobs/status for the status reporterget/liston pods for monitoringThis follows the principle of least privilege.
248-302: Post-processing payload expressions are well-designed.The CEL expressions use proper null-safety with
orValue()and handle multiple status conditions (Applied, Available, Health). The conditional logic correctly handles various job states.validator/pkg/validator/executor.go (3)
89-187: Thread-safe result storage is correctly implemented.The mutex properly guards access to
e.ctx.Resultsmap. Theresults[index]slice assignment without mutex is safe because each goroutine writes to a distinct index, eliminating data races on the slice.
74-82: Potential race:StopOnFirstFailurecheck reads results that may still be written.The check for failure status in
groupResultsoccurs afterwg.Wait(), so this is safe within a single group. However, consider that if a validator in the current group fails, other validators in the same group continue running to completion before the check triggers. This is likely the intended behavior for parallel execution, but worth documenting.
98-129: Robust panic recovery with comprehensive error details.The panic handling correctly captures stack trace, stores the result thread-safely, and logs detailed information for debugging. Well implemented.
validator/pkg/validators/api_enabled.go (3)
22-42: Well-structured error reason extraction.The
extractErrorReasonhelper properly prioritizes GCP-specific error reasons over generic HTTP codes, with appropriate nil handling and fallback.
105-113: Correct per-request timeout pattern.The immediate
reqCancel()call afterDo()properly releases resources. This is the idiomatic pattern for per-request timeouts within a loop.
140-146: Code comparison is correct as-is.The string comparison
service.State == "ENABLED"is correct for thegoogle.golang.org/api/serviceusage/v1REST-based client being used. The GCP ServiceUsage API returns state as a string field with values like "ENABLED", "DISABLED", and "STATE_UNSPECIFIED". Constants are not applicable here—only the newer protobuf-basedcloud.google.com/go/serviceusage/apiv1library provides enum constants. No changes needed.validator/pkg/validators/quota_check.go (1)
19-27: Metadata correctly identifies this as a stub with appropriate tags.The description clearly states it's a stub, and the "stub" tag enables filtering. The dependency on "api-enabled" is appropriate for when real quota checks are implemented.
validator/cmd/validator/main.go (4)
1-16: Clean imports with side-effect registration.The blank import of
validator/pkg/validatorscorrectly triggersinit()functions to register validators. This is an idiomatic Go pattern for plugin-style registration.
47-57: Good validation of disabled validators with helpful hint.Warning on unknown validators helps catch configuration typos early. The hint to run without
DISABLED_VALIDATORSis actionable.
69-76: Proper signal handling for graceful shutdown.The buffered channel with capacity 1 correctly handles the race between signal delivery and receive. Canceling the context propagates shutdown through the validator execution.
125-139: Robust log level parsing with sensible default.The
parseLogLevelhelper correctly handles common variations ("warn"/"warning") and defaults toInfofor unknown values.
- type: "Available"
  status:
    expression: |
      resources.?gcpValidationJob.?status.?conditions.orValue([]).exists(c, c.type == "Available")
      ? resources.gcpValidationJob.status.conditions.filter(c, c.type == "Available")[0].status : "False"
Could we have a third value Unknown if there is no resources.gcpValidationJob.status yet ?
According to openshift-hyperfleet/architecture#79, it requires several changes from different CLM components. I'll update this logic in another PR.
validator/pkg/validator/executor.go
Outdated
	return nil, fmt.Errorf("no validators enabled")
}

e.logger.Info("Found enabled validators", "count", len(enabledValidators))
This seems to be more of a debug-level log.
There are more Info logs later that may be too detailed.
My concern is generating logs that are too noisy, with values that are always the same. Once it is configured, these dependencies will always be the same for all job executions, so this info may only be needed when we are debugging an incident and trying to find out the values at every step.
I'll address the logging standard in another PR if required, not in the current PR.
if service.State == "ENABLED" {
	enabledAPIs = append(enabledAPIs, apiName)
	slog.Debug("API is enabled", "api", apiName)
Now that I notice the slog here, I wonder if we should pass the trace_id from the adapter to the adapter tasks to keep traceability. This should be passed by the adapter framework.
I'll address the logging standard (e.g., trace_id) in another PR if required, not in the current PR.
validator/cmd/validator/main.go
Outdated
}))
slog.SetDefault(logger)

logger.Info("Starting GCP Validator")
We should also log the ldflags info: version, commit, buildDate.
I'll address the logging standard (e.g., ldflags info: version, commit, buildDate) in another PR if required, not in the current PR.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@charts/templates/deployment.yaml`:
- Around line 102-103: The template defines LOG_LEVEL twice (adapter uses
.Values.logging.level and validation uses .Values.validation.real.logLevel when
useDummy=false), causing the second to override the first; change the validation
env var name from LOG_LEVEL to VALIDATOR_LOG_LEVEL in the deployment template
and update any consumer of the validation config to read VALIDATOR_LOG_LEVEL
instead of LOG_LEVEL (keep the adapter env var tied to .Values.logging.level so
it remains unchanged). Ensure the values mapping
(.Values.validation.real.logLevel) is preserved and any docs or config
references are updated to reflect VALIDATOR_LOG_LEVEL.
♻️ Duplicate comments (2)
charts/configs/validation-dummy-adapter.yaml (1)
240-244: Minor grammar issue in message expression. Same grammar issue as in `validation-adapter.yaml`: "pending to applied" should be "pending" or "pending to be applied".
📝 Proposed fix
  message:
    expression: |
      has(resources.gcpValidationJob)
        ? "Validation job applied successfully"
-       : "Validation job is pending to applied"
+       : "Validation job is pending"
charts/configs/validation-adapter.yaml (1)
262-266: Minor grammar issue in message expression. "pending to applied" should be "pending" or "pending to be applied".
📝 Proposed fix
  message:
    expression: |
      has(resources.gcpValidationJob)
        ? "Validation job applied successfully"
-       : "Validation job is pending to applied"
+       : "Validation job is pending"
🧹 Nitpick comments (1)
Makefile (1)
84-84: Consider adding `helm-template-real` to the test target. The `helm-test` target validates dummy mode template rendering but not real mode. Adding `helm-template-real` would ensure both configuration paths are tested during `make helm-test`.
Proposed change
-helm-test: helm-lint helm-template-broker helm-template-rabbitmq helm-template-dummy ## Run all helm chart tests (lint + template rendering)
+helm-test: helm-lint helm-template-broker helm-template-rabbitmq helm-template-dummy helm-template-real ## Run all helm chart tests (lint + template rendering)
…dapter environment variables (force-pushed from 125091c to cbda303)
… support Mermaid flowchart showing dependency (force-pushed from 1bd2e53 to 55d5c43)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@validator/Makefile`:
- Around line 190-198: The current image-dev target uses the `ifndef QUAY_USER`
guard which misses empty or whitespace-only QUAY_USER (since QUAY_USER is set
with `?=`), so replace that guard with a whitespace-aware check using `ifeq
($(strip $(QUAY_USER)),)` to detect empty values; update the block guarding the
image-dev target (the section referencing QUAY_USER, PROJECT_NAME and DEV_TAG)
to use `ifeq ($(strip $(QUAY_USER)),)` ... `endif` and keep the same error
messages and exit behavior so the makefile aborts when QUAY_USER is empty or
only whitespace.
In `@validator/pkg/validator/context.go`:
- Around line 56-65: The GetComputeService lazy-init is not thread-safe:
concurrent calls to Context.GetComputeService may race on c.computeService and
create duplicates; change the implementation to guard initialization with
synchronization (either add a mutex field on Context and wrap the
nil-check/assignment for c.computeService in a critical section, or use a
sync.Once per service if using the first-call context for all callers is
acceptable). Specifically protect access to c.computeService around the call to
c.clientFactory.CreateComputeService so only one goroutine performs creation and
assigns c.computeService; update the Context struct to include the chosen sync
primitive and apply the same pattern to other lazy getters.
In `@validator/pkg/validator/resolver.go`:
- Around line 150-187: The ToMermaid method on DependencyResolver marks a node
as having dependencies whenever meta.RunAfter is non-empty, which hides
validators whose RunAfter entries are missing from r.validators; change the
logic in ToMermaid so that hasDependencies[name] is set only when an actual edge
is emitted (i.e., inside the loop over meta.RunAfter when an existing dependency
is found), ensuring validators with only-missing deps remain in allValidators
and are rendered as standalone nodes; reference the DependencyResolver.ToMermaid
function, r.validators, Metadata()/RunAfter, hasDependencies and allValidators
when making the change.
In `@validator/test/integration/validator_integration_test.go`:
- Around line 271-290: The test currently sets cfg.MaxWaitTimeSeconds but then
enforces timeout via context.WithTimeout (shortCtx), so it doesn't validate
configuration-driven timeouts; either rename the spec to reflect that it tests
context-based timeouts, or change the test to exercise the executor's internal
timeout logic by creating an executor that reads cfg.MaxWaitTimeSeconds (ensure
validator.NewExecutor or Executor.ExecuteAll uses cfg.MaxWaitTimeSeconds to
derive its own context/cancellation) and call ExecuteAll without wrapping the
call in context.WithTimeout, then assert that the returned error contains
"context" or that results are nil when the configured timeout elapses; update
references to MaxWaitTimeSeconds, NewExecutor, and ExecuteAll accordingly.
♻️ Duplicate comments (1)
validator/cmd/validator/main.go (1)
37-42: Consider logging build information (ldflags). A previous reviewer requested logging of build information (`version`, `commit`, `buildDate`) at startup. This helps with debugging and traceability in production environments.
♻️ Proposed addition for build info logging
Add ldflags variables and log them at startup:
// At package level
var (
	version   = "dev"
	commit    = "unknown"
	buildDate = "unknown"
)

// In main(), after logger setup:
logger.Info("Starting GCP Validator",
	"version", version,
	"commit", commit,
	"build_date", buildDate)
Then update your build command to inject these values:
go build -ldflags "-X main.version=v1.0.0 -X main.commit=$(git rev-parse HEAD) -X main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
🧹 Nitpick comments (5)
validator/pkg/validator/executor_test.go (1)
159-212: Dependency ordering test is correct, but consider adding an out-of-order registration test. The test correctly verifies that `validator-a` executes before `validator-b` and `validator-c` using thread-safe order tracking. The use of `ConsistOf` appropriately handles the unordered nature of parallel execution within the same level.
However, per a previous reviewer's question: this test registers validators in dependency order (a, then b/c). Consider adding a test case where dependent validators are registered before their dependencies (e.g., register b/c first, then a) to verify that the resolver is independent of registration order.
validator/cmd/validator/main.go (1)
103-106: Consider the log level for results content. Logging the full JSON results content at `Info` level could be verbose in production. Consider using `Debug` level for the full content, or only logging a summary at `Info` level.
♻️ Suggested change
- logger.Info("Results written successfully", - "path", outputFile, - "content", string(data)) + logger.Info("Results written successfully", "path", outputFile) + logger.Debug("Results content", "content", string(data))validator/test/integration/validator_integration_test.go (2)
134-136: Stub assumption may cause future test failure. This test assumes the `quota-check` validator (currently a stub) always returns `StatusSuccess`. When the actual quota validation is implemented, this test will need to be updated or will fail unexpectedly.
♻️ Suggested improvement
- // Currently a stub, so should succeed
- Expect(result.Status).To(Equal(validator.StatusSuccess))
+ // Note: quota-check is currently a stub that always succeeds.
+ // Update this assertion when HYPERFLEET-510 implements real quota validation.
+ Expect(result.Status).To(BeElementOf(
+     validator.StatusSuccess,
+     validator.StatusSkipped,
+ ), "Stub validator should succeed or skip")
240-244: Prefer `Skip()` over silent return for conditional test execution. Silently returning on error can mask real issues (authentication failures, network problems). Using `Skip()` makes test results clearer and distinguishes between "not applicable" and "passed".
♻️ Proposed fix
  if err != nil {
      logger.Warn("Failed to get Compute API status", "error", err.Error())
-     // Don't fail test - API might not be enabled
-     return
+     Skip(fmt.Sprintf("Skipping Compute API test: %v", err))
  }
validator/pkg/validators/api_enabled.go (1)
163-168: Empty `requiredAPIs` handling returns success instead of skipped. When no required APIs are configured, the validator returns `StatusSuccess` with message "No required APIs to validate". A previous review suggested returning `StatusSkipped` to better indicate that no actual validation occurred. Consider whether success is the appropriate status here.
♻️ Alternative using StatusSkipped
  // Build success message based on whether APIs were checked
- message := fmt.Sprintf("All %d required APIs are enabled", len(enabledAPIs))
- if len(enabledAPIs) == 0 {
-     message = "No required APIs to validate"
+ if len(enabledAPIs) == 0 {
+     return &validator.Result{
+         Status:  validator.StatusSkipped,
+         Reason:  "NoAPIsConfigured",
+         Message: "No required APIs configured for validation",
+         Details: map[string]interface{}{
+             "project_id": vctx.Config.ProjectID,
+         },
+     }
  }
+ message := fmt.Sprintf("All %d required APIs are enabled", len(enabledAPIs))
  slog.Info(message)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@validator/cmd/validator/main.go`:
- Around line 3-11: The code writes results to ResultsPath via os.WriteFile but
doesn’t ensure the target directory exists; before any os.WriteFile call (where
ResultsPath or outputFile is used) call os.MkdirAll(filepath.Dir(outputFile),
0o755) to create the parent directory (importing "path/filepath"), and
handle/return any error from MkdirAll so writes don’t fail on custom nested
paths.
In `@validator/pkg/validator/context_test.go`:
- Around line 185-216: The test and implementation allow unprotected lazy
initialization races: update the Context implementation to protect each
lazily-initialized service (e.g., computeService, iamService,
serviceUsageService, monitoringService) using sync.Once or a dedicated mutex per
service to avoid the check-then-set data race in GetComputeService,
GetIAMService, GetServiceUsageService and GetMonitoringService; then extend the
test to spawn many goroutines calling the same getter concurrently (for example,
50 goroutines all calling GetComputeService) and assert there are no
panics/races to ensure the synchronization is effective.
In `@validator/pkg/validator/validator.go`:
- Around line 60-107: In Aggregate, handle StatusSkipped so skipped validators
don't cause a failure: in the loop over results, add a case for StatusSkipped
that increments a skippedCount (do not append to failedChecks or
failureDescriptions), and include skippedCount in details (e.g.,
details["skipped_checks"]). Compute effectiveChecks := checksRun - skippedCount
(or use checksRun and skippedCount in the final message) and determine success
by checking len(failedChecks) == 0 (no failures) rather than checksPassed ==
checksRun; update the success branch and the failure message to use
effectiveChecks/skip-aware counts so skipped validators are excluded from
failure logic and reporting.
♻️ Duplicate comments (3)
validator/pkg/validator/resolver.go (1)
150-187: Validators with only missing dependencies disappear from diagram. This issue was flagged in a previous review: when a validator has `RunAfter` entries but none exist in the resolver's validator set, `hasDependencies[name]` is set to `true` but no edges are emitted, causing the validator to be excluded from standalone nodes.
🐛 Previously suggested fix
- if len(meta.RunAfter) > 0 {
-     hasDependencies[name] = true
-     for _, dep := range meta.RunAfter {
-         // Only show edge if dependency exists in our validator set
-         if _, exists := r.validators[dep]; exists {
-             result += fmt.Sprintf("    %s --> %s\n", name, dep)
-         }
-     }
- }
+ if len(meta.RunAfter) > 0 {
+     for _, dep := range meta.RunAfter {
+         // Only show edge if dependency exists in our validator set
+         if _, exists := r.validators[dep]; exists {
+             hasDependencies[name] = true
+             result += fmt.Sprintf("    %s --> %s\n", name, dep)
+         }
+     }
+ }
validator/pkg/validator/context.go (1)
54-117: Lazy client initialization is not thread-safe. Parallel validators can call getters concurrently, racing on nil-check/assign and creating duplicate clients or data races.
🛠️ Suggested fix (sync.Once per service)
import ( "context" "fmt" "log/slog" + "sync" @@ type Context struct { @@ // Client factory for creating GCP service clients clientFactory *gcp.ClientFactory + // Sync primitives for thread-safe lazy init + computeOnce sync.Once + iamOnce sync.Once + cloudResourceMgrOnce sync.Once + serviceUsageOnce sync.Once + monitoringOnce sync.Once + + computeErr error + iamErr error + cloudResourceMgrErr error + serviceUsageErr error + monitoringErr error @@ func (c *Context) GetComputeService(ctx context.Context) (*compute.Service, error) { - if c.computeService == nil { - svc, err := c.clientFactory.CreateComputeService(ctx) - if err != nil { - return nil, fmt.Errorf("failed to create compute service: %w", err) - } - c.computeService = svc - } - return c.computeService, nil + c.computeOnce.Do(func() { + c.computeService, c.computeErr = c.clientFactory.CreateComputeService(ctx) + if c.computeErr != nil { + c.computeErr = fmt.Errorf("failed to create compute service: %w", c.computeErr) + } + }) + return c.computeService, c.computeErr }validator/test/integration/validator_integration_test.go (1)
271-285: Timeout test doesn't validate config-driven timeout. The timeout is enforced by `context.WithTimeout`, so `cfg.MaxWaitTimeSeconds` isn't actually exercised. Consider renaming the test or wiring the executor to derive its timeout from config.
🛠️ Suggested rename (if keeping context-driven timeout)
- It("should respect global timeout from configuration", func() { + It("should handle context cancellation gracefully", func() {
🧹 Nitpick comments (5)
validator/pkg/gcp/client_test.go (2)
69-88: Unused table test parameter. The `shouldRetry` parameter (line 71) is declared but never used in the test body. Consider removing it or using it in an assertion for clearer intent.
♻️ Suggested fix
DescribeTable("should retry based on error code", - func(errorCode int, shouldRetry bool, expectedAttempts int) { + func(errorCode int, expectedAttempts int) { callCount := 0 operation := func() error { callCount++ return &googleapi.Error{Code: errorCode} } err := gcp.RetryWithBackoffForTesting(ctx, operation) Expect(err).To(HaveOccurred(), "Should return error") Expect(callCount).To(Equal(expectedAttempts)) }, - Entry("429 Rate Limit - should retry", 429, true, 5), - Entry("503 Service Unavailable - should retry", 503, true, 5), - Entry("500 Internal Error - should retry", 500, true, 5), - Entry("404 Not Found - should not retry", 404, false, 1), - Entry("403 Forbidden - should not retry", 403, false, 1), + Entry("429 Rate Limit - should retry", 429, 5), + Entry("503 Service Unavailable - should retry", 503, 5), + Entry("500 Internal Error - should retry", 500, 5), + Entry("404 Not Found - should not retry", 404, 1), + Entry("403 Forbidden - should not retry", 403, 1), )
110-123: Potential flakiness with tight timeout. The 50ms timeout is quite short and could cause flaky tests in resource-constrained CI environments. Consider increasing to 100-200ms for more reliable execution while still testing the timeout behavior effectively.
validator/pkg/config/config.go (1)
99-119: Consider logging invalid environment variable values. When `getEnvBool` or `getEnvInt` encounters an invalid value, it silently falls back to the default. While this behavior is documented, logging a warning could help operators debug misconfiguration issues.
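One way the warning could look, sketched with a hypothetical `getEnvBool` helper (the PR's actual helpers may differ in signature and logging):

```go
package config

import (
	"log/slog"
	"os"
	"strconv"
)

// getEnvBool reads a boolean env var and falls back to def, logging a warning
// when the value is present but unparsable.
func getEnvBool(key string, def bool) bool {
	raw, ok := os.LookupEnv(key)
	if !ok || raw == "" {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		slog.Warn("invalid boolean env var, using default",
			"key", key, "value", raw, "default", def)
		return def
	}
	return v
}
```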
validator/pkg/validator/registry.go (1)
55-66: Inconsistent locking pattern between instance and package-level Register. The package-level `Register` directly accesses `globalRegistry.mu` and `globalRegistry.validators` instead of calling `globalRegistry.Register()`. While functionally correct, this duplicates the locking logic and creates a semantic difference (panic on duplicate vs. overwrite).
Consider documenting this intentional behavioral difference more explicitly, or unifying by having the instance method also check for duplicates with a flag/option if needed for production use.
♻️ Alternative: delegate to instance method with duplicate check
func Register(v Validator) {
	meta := v.Metadata()
	globalRegistry.mu.Lock()
	defer globalRegistry.mu.Unlock()
	if _, exists := globalRegistry.validators[meta.Name]; exists {
		panic(fmt.Sprintf("validator already registered: %s", meta.Name))
	}
	globalRegistry.validators[meta.Name] = v
}
The current implementation is fine; just ensure the behavioral difference (panic vs. overwrite) is intentional and documented.
validator/pkg/validator/resolver.go (1)
68-104: Potential stack overflow on deeply nested dependencies. The recursive `calcLevel` function could cause a stack overflow with very deep dependency chains. While unlikely in practice for validators, consider adding a depth limit or converting to an iterative approach for robustness.
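For reference, an iterative, Kahn-style level assignment could look roughly like the sketch below; it operates on a plain name-to-dependencies map and a hypothetical function name rather than the PR's `DependencyResolver`, and it assumes cycles have already been rejected.

```go
package resolver

// assignLevelsIterative computes execution levels without recursion:
// a validator's level is one more than the deepest of its known dependencies.
func assignLevelsIterative(deps map[string][]string) map[string]int {
	levels := make(map[string]int, len(deps))
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string) // dep -> validators that run after it

	for name, runAfter := range deps {
		if _, ok := indegree[name]; !ok {
			indegree[name] = 0
		}
		for _, dep := range runAfter {
			if _, known := deps[dep]; !known {
				continue // ignore dependencies outside the validator set
			}
			indegree[name]++
			dependents[dep] = append(dependents[dep], name)
		}
	}

	// Start with everything that has no (known) dependencies.
	queue := []string{}
	for name, in := range indegree {
		if in == 0 {
			queue = append(queue, name)
			levels[name] = 0
		}
	}

	// Breadth-first processing; no call stack involved, so depth is unbounded.
	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]
		for _, next := range dependents[current] {
			if levels[current]+1 > levels[next] {
				levels[next] = levels[current] + 1
			}
			indegree[next]--
			if indegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}
	return levels
}
```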
/lgtm
Merging this PR with comments still open. This repo will be archived soon.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@validator/test/integration/validator_integration_test.go`:
- Around line 240-250: The test currently does an early "return" inside the
failure branch (the logger.Warn call) which causes the test to silently pass;
change that to call Ginkgo's Skip() (e.g., replace the logger.Warn + return with
logger.Warn(...) followed by Skip(fmt.Sprintf("Compute API not enabled or error:
%v", err))) so the test is marked as skipped and the error is included in the
message; ensure Skip is available from Ginkgo and keep the subsequent
Expect(service).NotTo(BeNil()) / logger.Info("Compute API status", ...) path
unchanged for the success case.
♻️ Duplicate comments (1)
validator/pkg/validator/context.go (1)
68-80: Error is not persisted across calls; subsequent callers silently receive `(nil, nil)` on initialization failure. The `err` variable is local to each function call. When `sync.Once.Do()` runs on the first call and fails, `err` is set. On subsequent calls, a new `err` variable is created (initialized to `nil`), but `Do()` doesn't re-execute the closure. This means:
- First call fails → returns `(nil, error)` ✓
- Subsequent calls → return `(nil, nil)` ✗
This same pattern affects all five service getters.
🔒️ Proposed fix: store error in struct
Add error fields to the struct:
  type Context struct {
      // ...existing fields...
+     // Error fields for lazy initialization (persisted for subsequent calls)
+     computeErr          error
+     iamErr              error
+     cloudResourceMgrErr error
+     serviceUsageErr     error
+     monitoringErr       error
+
      // GCP Clients (lazily initialized, shared across validators)
      computeService *compute.Service
      // ...
  }
Update getters to use the struct error field:
  func (c *Context) GetComputeService(ctx context.Context) (*compute.Service, error) {
-     var err error
      c.computeOnce.Do(func() {
-         c.computeService, err = c.clientFactory.CreateComputeService(ctx)
-         if err != nil {
-             err = fmt.Errorf("failed to create compute service: %w", err)
+         c.computeService, c.computeErr = c.clientFactory.CreateComputeService(ctx)
+         if c.computeErr != nil {
+             c.computeErr = fmt.Errorf("failed to create compute service: %w", c.computeErr)
          }
      })
-     if err != nil {
-         return nil, err
-     }
-     return c.computeService, nil
+     return c.computeService, c.computeErr
  }
Apply the same pattern to all other getters.
🧹 Nitpick comments (2)
validator/pkg/validator/context_test.go (1)
190-213: Thread-safety test validates concurrency but not error persistence. The test verifies that concurrent access doesn't cause panics or races, which is good. However, given the error persistence bug in `context.go`, consider adding a test that verifies: when initialization fails, all subsequent calls (from different goroutines) also return the same error rather than `(nil, nil)`.
💡 Suggested additional test case
It("should return consistent error across concurrent calls when initialization fails", func() { ctx := context.Background() var wg sync.WaitGroup const numGoroutines = 10 errors := make([]error, numGoroutines) services := make([]*serviceusage.Service, numGoroutines) for i := 0; i < numGoroutines; i++ { wg.Add(1) go func(idx int) { defer GinkgoRecover() defer wg.Done() services[idx], errors[idx] = vctx.GetServiceUsageService(ctx) }(i) } wg.Wait() // All calls should return consistent results // Either all succeed with non-nil service, or all fail with non-nil error firstErr := errors[0] firstSvc := services[0] for i := 1; i < numGoroutines; i++ { if firstErr != nil { Expect(errors[i]).To(HaveOccurred(), "All calls should return error if first failed") } else { Expect(services[i]).To(Equal(firstSvc), "All calls should return same service instance") } } })validator/pkg/validator/resolver.go (1)
68-104: `assignLevels` assumes no cycles exist; document this precondition. The recursive `calcLevel` function will infinite-loop if a cycle exists because there's no visited tracking to prevent re-entry during the same DFS path. This is safe since `ResolveExecutionGroups` always calls `detectCycles` first, but the implicit dependency should be documented.
📝 Suggested documentation improvement
  // assignLevels performs topological sort and assigns execution levels
+ // Precondition: detectCycles() must be called first to ensure no circular dependencies exist.
+ // Without this precondition, this function may infinite-loop on cyclic graphs.
  func (r *DependencyResolver) assignLevels() map[string]int {
(force-pushed from bf48be8 to 92ca512)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@validator/test/integration/validator_integration_test.go`:
- Around line 169-180: The test currently treats any non-StatusSuccess result as
a failure which conflates StatusSkipped with StatusFailure; update the loop over
results in validator_integration_test.go to increment successCount when r.Status
== validator.StatusSuccess, failureCount only when r.Status ==
validator.StatusFailure, and optionally track skippedCount when r.Status ==
validator.StatusSkipped, then change the assertions to compare checksPassed to
successCount and checksRun - checksPassed - skippedCount (or explicitly
checksRun - checksPassed if you track skippedCount==0) to failureCount so
skipped results are not counted as failures; reference the results slice,
StatusSuccess, StatusFailure, StatusSkipped, and the Aggregate behavior in
validator.Aggregate to locate the relevant code.
♻️ Duplicate comments (2)
validator/pkg/validators/api_enabled.go (1)
94-97: Empty `requiredAPIs` list still returns success rather than skip. The previous review flagged that an empty `RequiredAPIs` list results in a misleading success with "No required APIs to validate". While you marked it as fixed, the early-return check doesn't appear to be present. The current flow proceeds through the loop (which does nothing), then returns success at lines 158-173.
Consider adding an early return with `StatusSkipped` or a warning log when no APIs are configured, as suggested previously.
🐛 Proposed fix
  // Check each required API
  requiredAPIs := vctx.Config.RequiredAPIs
+ if len(requiredAPIs) == 0 {
+     slog.Warn("No required APIs configured, skipping validation")
+     return &validator.Result{
+         Status:  validator.StatusSuccess, // or StatusSkipped if available
+         Reason:  "NoAPIsConfigured",
+         Message: "No required APIs configured for validation",
+         Details: map[string]interface{}{
+             "project_id": vctx.Config.ProjectID,
+         },
+     }
+ }
  enabledAPIs := []string{}
  disabledAPIs := []string{}
validator/pkg/validator/validator.go (1)
57-88: `StatusSkipped` results incorrectly contribute to failure determination. The aggregation logic increments `checksPassed` only for `StatusSuccess`, but the success condition `checksPassed == checksRun` fails when any validator is skipped, even if there are no actual failures. This causes skipped validators to be incorrectly counted as failures.
🛠️ Suggested fix
  func Aggregate(results []*Result) *AggregatedResult {
      checksRun := len(results)
      checksPassed := 0
+     checksSkipped := 0
      var failedChecks []string
      var failureDescriptions []string

      // Single pass to collect all failure information
      for _, r := range results {
          switch r.Status {
          case StatusSuccess:
              checksPassed++
          case StatusFailure:
              failedChecks = append(failedChecks, r.ValidatorName)
              failureDescriptions = append(failureDescriptions,
                  fmt.Sprintf("%s (%s)", r.ValidatorName, r.Reason))
+         case StatusSkipped:
+             checksSkipped++
          }
      }

      details := map[string]interface{}{
          "checks_run":     checksRun,
          "checks_passed":  checksPassed,
+         "checks_skipped": checksSkipped,
          "timestamp":      time.Now().UTC().Format(time.RFC3339),
          "validators":     results,
      }

-     if checksPassed == checksRun {
+     if len(failedChecks) == 0 {
          return &AggregatedResult{
              Status:  StatusSuccess,
              Reason:  "ValidationPassed",
              Message: "All GCP validation checks passed successfully",
              Details: details,
          }
      }
🧹 Nitpick comments (3)
validator/pkg/validators/api_enabled.go (1)
99-141: Consider deferring `reqCancel` for safety. Calling `reqCancel()` after `svc.Services.Get(...).Do()` works, but if a panic occurs between context creation and the cancel call, the context would leak. Using `defer` is idiomatic and safer.
♻️ Suggested improvement
  for _, apiName := range requiredAPIs {
      // Add per-request timeout
      reqCtx, reqCancel := context.WithTimeout(ctx, apiRequestTimeout)
+     defer reqCancel() // Ensure cleanup even on panic

      serviceName := fmt.Sprintf("projects/%s/services/%s", vctx.Config.ProjectID, apiName)
      slog.Debug("Checking API", "api", apiName)

      service, err := svc.Services.Get(serviceName).Context(reqCtx).Do()
-     reqCancel() // Clean up context
validator/pkg/validator/executor.go (1)
70-91: Consider whether context cancellation should stop execution between groups. The executor passes `ctx` to validators and waits for group completion, but doesn't check `ctx.Err()` between groups. If the context is cancelled mid-execution, the current group will complete (validators check `ctx` internally), but subsequent groups will still start.
♻️ Optional improvement
  // 4. Execute validators group by group
  allResults := []*Result{}
  for _, group := range groups {
+     // Check if context was cancelled before starting next group
+     if ctx.Err() != nil {
+         e.logger.Warn("Context cancelled, stopping execution", "error", ctx.Err())
+         return allResults, ctx.Err()
+     }
+
      e.logger.Info("Executing level",
          "level", group.Level,
          "validators", len(group.Validators))
validator/test/integration/validator_integration_test.go (1)
115-140: Consider making the quota-check test more flexible. The test asserts `StatusSuccess` (line 138), which assumes the stub always succeeds. If the stub implementation changes or returns a different status in certain environments, this test will fail unexpectedly.
♻️ Suggested improvement
- // Currently a stub, so should succeed
- Expect(result.Status).To(Equal(validator.StatusSuccess))
+ // Verify result has a valid status (stub currently succeeds)
+ Expect(result.Status).To(BeElementOf(
+     validator.StatusSuccess,
+     validator.StatusSkipped,
+ ), "Stub should succeed or skip")
(force-pushed from 1a65788 to 54fc351)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: xueli181114
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Summary
Testing result
Next Steps
HYPERFLEET-510, HYPERFLEET-511, HYPERFLEET-512
Summary by CodeRabbit
New Features
Configuration
Operations
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.