HYPERFLEET-575 | test: Add hyperfleet deployed steps for hyperfleet e2e ci test #22
yingzhanredhat wants to merge 1 commit into openshift-hyperfleet:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: (none). The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Walkthrough

This PR introduces a comprehensive deployment framework for HyperFleet CLM components. It adds a main orchestration script (deploy-clm.sh) that manages installation and uninstallation of the API, Sentinel, and Adapter components via Helm charts for E2E environments. Supporting library modules provide shared utilities for logging, dependency validation, Helm chart cloning, and component-specific deployment workflows. Configuration templates, documentation, and example adapter test data are included.

Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant deploy-clm.sh as Main Script
participant Common as lib/common.sh
participant Helm as lib/helm.sh
participant API as lib/api.sh
participant Sentinel as lib/sentinel.sh
participant Adapter as lib/adapter.sh
participant K8s as Kubernetes/Helm
User->>deploy-clm.sh: execute --action install --namespace X
deploy-clm.sh->>Common: check_dependencies()
Common->>Common: verify kubectl, helm, git
Common-->>deploy-clm.sh: ✓ ready
deploy-clm.sh->>Common: validate_kubectl_context()
Common->>K8s: check cluster connectivity
Common-->>deploy-clm.sh: ✓ connected
deploy-clm.sh->>Helm: clone_helm_chart(api)
Helm->>K8s: git clone (sparse)
Helm-->>deploy-clm.sh: chart path
deploy-clm.sh->>API: install_api()
API->>Adapter: get_adapters_by_type()
Adapter-->>API: adapter list
API->>K8s: helm upgrade --install
API->>K8s: kubectl get pods (health check)
API-->>deploy-clm.sh: ✓ API ready
deploy-clm.sh->>Helm: clone_helm_chart(sentinel)
Helm->>K8s: git clone (sparse)
Helm-->>deploy-clm.sh: chart path
deploy-clm.sh->>Sentinel: install_sentinel()
Sentinel->>Sentinel: install_sentinel_instance(clusters)
Sentinel->>K8s: helm upgrade --install
Sentinel->>K8s: verify_pod_health()
Sentinel->>Sentinel: install_sentinel_instance(nodepools)
Sentinel->>K8s: helm upgrade --install
Sentinel-->>deploy-clm.sh: ✓ Sentinel ready
deploy-clm.sh->>Helm: clone_helm_chart(adapter)
Helm->>K8s: git clone (sparse)
Helm-->>deploy-clm.sh: chart path
deploy-clm.sh->>Adapter: install_adapters()
Adapter->>Adapter: discover_adapters()
Adapter->>Adapter: install_adapter_instance() (per adapter)
Adapter->>K8s: helm upgrade --install
Adapter->>K8s: verify_pod_health()
Adapter-->>deploy-clm.sh: ✓ Adapters ready
deploy-clm.sh->>K8s: helm list (status summary)
deploy-clm.sh->>User: ✓ Deployment complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@deploy-scripts/.env.example`:
- Around line 32-66: Reorder the env keys so each section's variables are
alphabetically sorted and remove the stray empty line between API_BASE_URL and
the "Helm Chart Sources" section; specifically, alphabetize keys in the
"Sentinel Component Configuration" (e.g., SENTINEL_BROKER_TYPE,
SENTINEL_GOOGLEPUBSUB_CREATE_TOPIC_IF_MISSING, SENTINEL_IMAGE_REPO,
SENTINEL_IMAGE_TAG), "Adapter Component Configuration" (e.g.,
ADAPTER_GOOGLEPUBSUB_CREATE_SUBSCRIPTION_IF_MISSING,
ADAPTER_GOOGLEPUBSUB_CREATE_TOPIC_IF_MISSING, ADAPTER_IMAGE_REPO,
ADAPTER_IMAGE_TAG), and "Helm Chart Sources" (API_CHART_PATH, API_CHART_REF,
API_CHART_REPO, etc.), and remove the extra blank line after API_BASE_URL so
dotenv-linter no longer reports unordered keys or an extra blank line.
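For illustration only, a reordered `.env.example` block consistent with the suggestion above might look like the sketch below; the values shown are placeholders, not the repo's actual defaults.

```bash
# Sentinel Component Configuration (keys alphabetized)
SENTINEL_BROKER_TYPE=googlepubsub
SENTINEL_GOOGLEPUBSUB_CREATE_TOPIC_IF_MISSING=true
SENTINEL_IMAGE_REPO=hyperfleet-sentinel
SENTINEL_IMAGE_TAG=latest

# Adapter Component Configuration (keys alphabetized)
ADAPTER_GOOGLEPUBSUB_CREATE_SUBSCRIPTION_IF_MISSING=true
ADAPTER_GOOGLEPUBSUB_CREATE_TOPIC_IF_MISSING=true
ADAPTER_IMAGE_REPO=hyperfleet-adapter
ADAPTER_IMAGE_TAG=latest
```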
In `@deploy-scripts/deploy-clm.sh`:
- Around line 372-379: The code uses a hardcoded service name "hyperfleet-api"
when fetching the external IP (in the block guarded by INSTALL_API and the
external_ip variable), which breaks if RELEASE_PREFIX is customized; change the
kubectl service lookup to use the Helm-derived name by constructing the service
as "${RELEASE_PREFIX}-api" (or fallback to "hyperfleet-api" only if
RELEASE_PREFIX is empty), i.e., replace the literal "hyperfleet-api" in the
kubectl get svc call with the computed service name using the RELEASE_PREFIX
variable so the lookup matches the service created by api.sh.
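A minimal sketch of that lookup, assuming the `RELEASE_PREFIX` and `NAMESPACE` variables named above and a LoadBalancer-style service; the jsonpath is illustrative, not taken from the script:

```bash
# Derive the API service name from RELEASE_PREFIX; fall back to "hyperfleet-api" when unset.
api_service="${RELEASE_PREFIX:-hyperfleet}-api"
external_ip=$(kubectl get svc "${api_service}" -n "${NAMESPACE}" \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
```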
In `@deploy-scripts/lib/api.sh`:
- Around line 95-97: The helm release existence check using `helm list -n
"${NAMESPACE}" | grep -q "^${release_name}"` can produce false positives
(matches prefixes); update the check in the function that uses `release_name`
and `NAMESPACE` to use Helm’s exact matching (e.g., `helm list -q -f
"^${release_name}$" -n "${NAMESPACE}"`) so only the exact release name matches,
and apply the same change to other uninstall/check functions that use the same
grep pattern to avoid prefix collisions.
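A sketch of the exact-match check; `helm list --filter` takes an anchored regex and `-q` prints release names only, with variable names following the comment above:

```bash
# Returns only releases whose name matches the anchored regex exactly.
if [[ -n "$(helm list -q -n "${NAMESPACE}" -f "^${release_name}\$" 2>/dev/null)" ]]; then
  echo "release '${release_name}' exists in namespace '${NAMESPACE}'"
fi
```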
In `@deploy-scripts/lib/helm.sh`:
- Around line 11-25: The clone_helm_chart function currently does rm -rf on
component_dir which may be unsafe if WORK_DIR or component are empty; add
defensive validation before deleting: ensure WORK_DIR and component are
non-empty and component_dir resolves under WORK_DIR (e.g., check that
component_dir starts with WORK_DIR and that component does not contain path
traversal or absolute paths), and only then perform rm -rf; reference
clone_helm_chart, component_dir, WORK_DIR, component and the rm -rf call when
adding these guards and fail fast with a log/error if validation fails.
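One way the guards could look inside clone_helm_chart (a sketch; `log_error` is assumed to come from lib/common.sh):

```bash
# Defensive checks before the destructive cleanup in clone_helm_chart.
if [[ -z "${WORK_DIR:-}" || -z "${component:-}" ]]; then
  log_error "WORK_DIR and component must be non-empty before cleanup"
  return 1
fi
if [[ "${component}" == /* || "${component}" == *".."* ]]; then
  log_error "Refusing suspicious component name: ${component}"
  return 1
fi
local component_dir="${WORK_DIR}/${component}"
if [[ "${component_dir}" != "${WORK_DIR}/"* ]]; then
  log_error "Refusing to delete path outside WORK_DIR: ${component_dir}"
  return 1
fi
rm -rf "${component_dir}"
```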
In `@deploy-scripts/lib/sentinel.sh`:
- Around line 12-21: Add explicit validation for the resource_type parameter in
install_sentinel_instance (and the corresponding uninstall_sentinel_instance) so
values other than "clusters" or "nodepools" are rejected instead of defaulting
to Nodepools; at the start of each function check if resource_type is exactly
"clusters" or "nodepools", log an error via echo or logger and return a non-zero
status (or exit) on invalid input, then proceed to set resource_type_display
("Clusters" or "Nodepools") only after the validated branch.
In `@deploy-scripts/README.md`:
- Around line 94-146: The documentation is inconsistent: `--api-image-repo` is
documented as "without registry" but the example passes a full registry path;
choose the "without registry" convention and update the example
accordingly—modify the "Install with Custom Image Repository" example to pass
`--image-registry` (e.g., `myregistry.io`) and set `--api-image-repo` to just
the repo name (e.g., `hyperfleet-api` or `myregistry-repo` without the
registry), and ensure the README’s note about final image path
(`${IMAGE_REGISTRY}/${IMAGE_REPO}:${IMAGE_TAG}`) remains and matches the
example; adjust any text or flags in the example block that currently pass a
full registry in `--api-image-repo` (and similarly for other *-image-repo flags
if present).
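A hedged example of what the corrected README block could show; the flag names come from the comment above, while `--api-image-tag` and the concrete values are illustrative:

```bash
# Install with a custom image repository: registry and repo are passed separately.
./deploy-clm.sh --action install \
  --namespace hyperfleet-e2e \
  --image-registry myregistry.io \
  --api-image-repo hyperfleet-api \
  --api-image-tag v0.1.0
# Resulting image: ${IMAGE_REGISTRY}/${IMAGE_REPO}:${IMAGE_TAG} -> myregistry.io/hyperfleet-api:v0.1.0
```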
In `@testdata/adapter-configs/clusters-example1-namespace/deployment.yaml`:
- Around line 13-29: The Deployment's container "test" is flagged for writable
root filesystem; update the Pod spec/template and container "test" to include an
explicit securityContext (pod-level and container-level) to meet Pod Security
Standards: switch the image to an unprivileged build (e.g.,
nginxinc/nginx-unprivileged:1.25-alpine) or ensure non-root runtime, set
securityContext.runAsNonRoot: true, securityContext.runAsUser to a non-zero UID,
set container.securityContext.readOnlyRootFilesystem: true and
securityContext.allowPrivilegeEscalation: false, and adjust the containerPort
from 80 to 8080 if you adopt the unprivileged image (or verify consumers still
expect 80).
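A sketch of the hardened container spec under those assumptions; the unprivileged nginx image defaults to UID 101 and port 8080, so adjust if the original image is kept:

```yaml
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 101             # default non-root UID in nginx-unprivileged
      containers:
        - name: test
          image: nginxinc/nginx-unprivileged:1.25-alpine
          ports:
            - containerPort: 8080  # unprivileged image listens on 8080, not 80
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          # With a read-only root FS, nginx may still need emptyDir mounts for
          # /tmp and its cache directories; add them if the pod fails to start.
```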
In `@testdata/adapter-configs/clusters-example1-namespace/job-rolebinding.yaml`:
- Around line 4-19: The RoleBinding resource "status-reporter" is templating
namespaces with "{{ .clusterId }}" which mismatches the Deployment's "{{
.namespace }}"; update the namespace fields under the RoleBinding
metadata.namespace, the subjects[0].namespace, and any related RBAC resources
(Role and ServiceAccount manifests such as job-role.yaml and
job-serviceaccount.yaml) to use "{{ .namespace }}" instead of "{{ .clusterId
}}", ensuring Role, RoleBinding.roleRef linkage, and the ServiceAccount
name/namespace all reference the same "{{ .namespace }}" template value.
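For reference, a sketch of the aligned RoleBinding; the Role and ServiceAccount names are assumed to also be "status-reporter", and only the namespace templating is the point here:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: status-reporter
  namespace: "{{ .namespace }}"    # was "{{ .clusterId }}"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: status-reporter            # assumed Role name; must match job-role.yaml
subjects:
  - kind: ServiceAccount
    name: status-reporter          # assumed ServiceAccount name
    namespace: "{{ .namespace }}"  # was "{{ .clusterId }}"
```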
In `@testdata/adapter-configs/clusters-example1-namespace/job.yaml`:
- Around line 30-205: Add pod-level securityContext and per-container
securityContext to enforce non-root execution and a read-only root filesystem:
set pod.spec.securityContext with runAsUser: 1000, runAsNonRoot: true, fsGroup:
1000, and a seccompProfile (RuntimeDefault); then on each container (example-job
and status-reporter) add securityContext with readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, runAsNonRoot: true, runAsUser: 1000, and
capabilities: drop: ["ALL"]; keep the existing volumeMount for "results"
writable (do not set it readOnly) so the non-root user (fsGroup 1000) can write
to /results. Use the container names "example-job" and "status-reporter" and the
volume name "results" to locate where to add these settings.
In `@testdata/adapter-configs/clusters-example1-namespace/values.yaml`:
- Around line 14-22: The Pub/Sub config uses empty strings for subscriptionId,
topic, and deadLetterTopic which will cause runtime API validation errors;
update the YAML values for the fields subscriptionId, topic, and deadLetterTopic
(alongside the existing projectId) to explicit placeholders like CHANGE_ME (or
meaningful defaults) so they are not empty strings and clearly indicate they
must be configured before deployment.
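A sketch of the placeholder values; the parent key name is assumed, and only the four fields named above matter:

```yaml
googlePubSub:
  projectId: CHANGE_ME        # existing field, shown for context
  subscriptionId: CHANGE_ME   # was ""
  topic: CHANGE_ME            # was ""
  deadLetterTopic: CHANGE_ME  # was ""
```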
🧹 Nitpick comments (6)
testdata/adapter-configs/clusters-example1-namespace/values.yaml (1)

24-28: Avoid `latest` + `Always` for reproducible CI. Using `latest` with `Always` makes runs non-deterministic and can introduce flaky deployments. Prefer a pinned tag/digest and a less aggressive pull policy, then override explicitly in CI when needed.

♻️ Suggested fix

```diff
 image:
   registry: CHANGE_ME
   repository: hyperfleet-adapter
-  pullPolicy: Always
-  tag: latest
+  pullPolicy: IfNotPresent
+  tag: "CHANGE_ME"  # pin or override in CI
```

testdata/adapter-configs/clusters-example1-namespace/job.yaml (1)
166-168: Pin the status-reporter image instead of `latest`. `latest` can change unexpectedly and break CI. Prefer a pinned tag or digest for repeatable runs.

♻️ Suggested fix

```diff
-image: registry.ci.openshift.org/ci/status-reporter:latest
-imagePullPolicy: Always
+image: registry.ci.openshift.org/ci/status-reporter:<pinned-tag-or-digest>
+imagePullPolicy: IfNotPresent
```

deploy-scripts/deploy-clm.sh (1)
403-414: Inconsistent error handling across component uninstallation. `uninstall_api` has explicit error handling (`|| log_warning`), while `uninstall_adapters` and `uninstall_sentinel` do not. Although these functions log warnings internally, consider applying consistent call-site error handling for maintainability.

Suggested consistent error handling

```diff
 # Uninstall components in reverse order: Adapter -> Sentinel -> API
 if [[ "${INSTALL_ADAPTER}" == "true" ]]; then
-    uninstall_adapters
+    uninstall_adapters || log_warning "Some adapters failed to uninstall"
 fi
 if [[ "${INSTALL_SENTINEL}" == "true" ]]; then
-    uninstall_sentinel
+    uninstall_sentinel || log_warning "Sentinel uninstallation had issues"
 fi
 if [[ "${INSTALL_API}" == "true" ]]; then
     uninstall_api || log_warning "Failed to uninstall API"
 fi
```

deploy-scripts/lib/adapter.sh (3)
22-27: Declare and assign separately to avoid masking return values. Combining a `local` declaration with command substitution masks the return value of `basename`. If `basename` fails, the error code is lost.

Proposed fix

```diff
 while IFS= read -r -d '' dir; do
-    local basename=$(basename "$dir")
+    local dir_basename
+    dir_basename=$(basename "$dir")
-    if [[ "$basename" =~ ^(clusters|nodepools)- ]]; then
-        adapter_dirs+=("$basename")
+    if [[ "$dir_basename" =~ ^(clusters|nodepools)- ]]; then
+        adapter_dirs+=("$dir_basename")
     fi
 done < <(find "${adapter_configs_dir}" -mindepth 1 -maxdepth 1 -type d -print0)
```

Note: Also renamed to `dir_basename` to avoid shadowing the `basename` command.
53-59: Same issue: declare and assign separately. Apply the same fix pattern here as in `discover_adapters()`.

Proposed fix

```diff
 while IFS= read -r -d '' dir; do
-    local basename=$(basename "$dir")
-    if [[ "$basename" =~ ^${resource_type}-(.+)$ ]]; then
+    local dir_basename
+    dir_basename=$(basename "$dir")
+    if [[ "$dir_basename" =~ ^${resource_type}-(.+)$ ]]; then
         # Extract just the adapter name (everything after "clusters-" or "nodepools-")
         local adapter_name="${BASH_REMATCH[1]}"
         adapter_names+=("${adapter_name}")
```
236-240: Consider using a more precise grep pattern for release name matching. The pattern `^${release_name}` could match releases that share the same prefix (e.g., `hyperfleet-adapter-clusters-example1` would match `hyperfleet-adapter-clusters-example1-v2`). Using a tab delimiter or word boundary would be more precise.

Proposed fix using tab delimiter

```diff
 # Check if release exists
-if ! helm list -n "${NAMESPACE}" 2>/dev/null | grep -q "^${release_name}"; then
+if ! helm list -n "${NAMESPACE}" 2>/dev/null | grep -q "^${release_name}[[:space:]]"; then
     log_warning "Release '${release_name}' not found in namespace '${NAMESPACE}'"
     return 0
 fi
```
Resolved review threads:
- testdata/adapter-configs/clusters-example1-namespace/adapter-task-resource-deployment.yaml
- testdata/adapter-configs/clusters-example1-namespace/adapter-task-resource-job-rolebinding.yaml
- testdata/adapter-configs/clusters-example1-namespace/adapter-task-resource-job.yaml
- testdata/adapter-configs/clusters-example1-namespace/values.yaml (outdated)
Force-pushed 183da07 to 9c16aa7, then 9c16aa7 to 554702a.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@deploy-scripts/deploy-clm.sh`:
- Around line 70-72: The default API_BASE_URL value is inconsistent with the
help text; change the default assignment for API_BASE_URL to use the cluster DNS
form including the namespace variable (e.g.
http://hyperfleet-api.${NAMESPACE}.svc.cluster.local:8000) and update the other
occurrence(s) later in the script (the second API_BASE_URL default around the
block at lines 138-141) so both defaults match the help text; ensure you
reference the NAMESPACE environment/variable used elsewhere in the script when
forming the URL.
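A sketch of the aligned default, assuming `NAMESPACE` has already been resolved earlier in the script:

```bash
# Default matches the help text: cluster DNS form including the namespace.
API_BASE_URL="${API_BASE_URL:-http://hyperfleet-api.${NAMESPACE}.svc.cluster.local:8000}"
```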
In `@deploy-scripts/lib/api.sh`:
- Around line 18-27: get_adapters_by_type may exit non‑zero under set -e and
abort before the fallback env vars are used; wrap the calls so failures are
absorbed and variables default to empty. Call get_adapters_by_type for
"clusters" and "nodepools" in a way that does not trigger errexit (e.g. capture
output but treat non‑zero as success or explicitly handle return status), and
ensure discovered_cluster_adapters and discovered_nodepool_adapters are set to
"" on failure so the later fallbacks to API_ADAPTERS_CLUSTER and
API_ADAPTERS_NODEPOOL work as intended. Use the existing symbols
get_adapters_by_type, discovered_cluster_adapters, discovered_nodepool_adapters,
API_ADAPTERS_CLUSTER and API_ADAPTERS_NODEPOOL to locate and apply the change.
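A sketch of absorbing the failure under `set -e`; the final fallback variable names are illustrative:

```bash
# A non-zero exit inside $() is absorbed by `|| true`, so errexit is not triggered
# and the variables default to empty strings on failure.
discovered_cluster_adapters="$(get_adapters_by_type "clusters" 2>/dev/null || true)"
discovered_nodepool_adapters="$(get_adapters_by_type "nodepools" 2>/dev/null || true)"

# Fall back to the env vars when discovery returned nothing.
cluster_adapters="${discovered_cluster_adapters:-${API_ADAPTERS_CLUSTER:-}}"
nodepool_adapters="${discovered_nodepool_adapters:-${API_ADAPTERS_NODEPOOL:-}}"
```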
In `@testdata/adapter-configs/clusters-example1-namespace/adapter-task-config.yaml`:
- Around line 32-41: The serviceAccountName parameter is defined as required but
never used in the deployment templates (the service account is derived from
clusterId), so either remove the serviceAccountName entry or make it optional;
specifically, update the adapter-task-config.yaml to delete the "- name:
\"serviceAccountName\" ..." block if you don't need a separate service account,
or change its required flag to false (and adjust any docs/env variable
references), or alternatively modify the deployment template to reference
serviceAccountName instead of constructing the service account from clusterId if
the parameter was intended to be used.
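If the parameter is kept rather than removed, the minimal change is flipping its required flag; field names other than `name` and `required` are assumed from context:

```yaml
- name: "serviceAccountName"
  required: false   # was true; or delete this block entirely if unused
```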
Summary by CodeRabbit
New Features
Documentation
Chores