From 0c5cf7b2b793f5ec4dec9be01ff86a8f697275df Mon Sep 17 00:00:00 2001 From: yasun Date: Fri, 30 Jan 2026 15:31:21 +0800 Subject: [PATCH 1/2] HYPERFLEET-532 | docs: E2E test run strategy and Resource Management spike --- .../e2e-run-strategy-spike-report.md | 767 ++++++++++++++++++ 1 file changed, 767 insertions(+) create mode 100644 hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md diff --git a/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md b/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md new file mode 100644 index 0000000..5efa0f8 --- /dev/null +++ b/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md @@ -0,0 +1,767 @@ +# Spike Report: HyperFleet E2E Test Automation Run Strategy + +**JIRA Story:** HYPERFLEET-532 +**Status:** Draft +**Focus:** Deployment lifecycle management, resource isolation, and parallel Test Run execution safety + +--- + +## 1. Problem Statement + +HyperFleet E2E testing validates system-level behavior across multiple cooperating components, including: + +- HyperFleet API +- Sentinel +- Adapter framework (multiple adapter types) +- Messaging broker (Topics / Subscriptions) + +As E2E coverage expands and test pipelines begin executing in parallel, the current approach lacks a clearly defined **E2E test run strategy** to govern: + +- Deployment lifecycle ownership +- Resource isolation boundaries +- Race condition prevention in concurrent executions +- Reliable cleanup and observability + +This results in: + +- Flaky test failures caused by shared resources +- Unclear ownership of deployed components +- Orphaned Kubernetes and broker resources +- Limited scalability of parallel pipelines + +This spike defines a **comprehensive E2E test automation run strategy**, focusing on how tests are **deployed, isolated, coordinated, and cleaned up**, rather than on individual test case logic. + +--- + +## 2. 
Goals and Non-Goals + +### 2.1 Goals + +This spike aims to define a strategy that: + +- Enables **safe parallel execution** of multiple Test Runs +- Ensures **strong resource isolation** between test runs +- Clearly defines **deployment lifecycle ownership** +- Prevents race conditions by design +- Supports **dynamic adapter deployment and removal** (hot-plugging) +- Improves **debuggability and maintainability** +- Establishes reusable patterns for future E2E expansion + +--- + +### 2.2 Non-Goals + +This spike explicitly does **not** cover: + +- Individual test case implementation +- CI/CD pipeline configuration +- Performance or load testing considerations +- External environments not related to HyperFleet (such as cloud resources) + +--- + +## 3. Core Design Principles + +### 3.1 Test Run as the Primary Isolation Unit + +All test infrastructure, configuration, and resources are scoped to a **single Test Run**. + +A Test Run is the smallest unit of: + +- Isolation +- Resource ownership + +--- + +### 3.2 Explicit Lifecycle Ownership + +Every component participating in E2E testing must have clearly defined ownership for: + +- Creation +- Runtime management +- Teardown + +Implicit or shared ownership is considered a design flaw. + +--- + +### 3.3 Isolation Over Optimization + +When trade-offs exist, this strategy prioritizes: + +> Reliability, isolation, and debuggability over startup speed or resource reuse. + +--- + +## 4. E2E Test Run Model + +### 4.1 Test Run Definition + +A **Test Run** represents one or more E2E test cases executed sequentially as a single unit. 
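As an illustration only (the type and function names below are hypothetical, not an existing HyperFleet API), the Test Run unit and its lifecycle could be modeled as:

```go
package main

import "fmt"

// Hypothetical sketch of the Test Run as the primary unit of isolation and
// resource ownership. The "e2e-singlens-" prefix follows the naming
// convention defined in Section 5.2.
type TestRun struct {
	ID        string
	Namespace string // dedicated namespace derived from the Test Run ID
}

func NewTestRun(id string) *TestRun {
	return &TestRun{ID: id, Namespace: "e2e-singlens-" + id}
}

// Run walks the lifecycle: setup, sequential suite execution, teardown.
func (r *TestRun) Run(suites []func() error) error {
	// setup: create namespace, deploy infrastructure (elided in this sketch)
	defer func() {
		// teardown: delete namespace and run-scoped broker resources (elided)
	}()
	for _, suite := range suites {
		if err := suite(); err != nil {
			return err // failures stay contained in this run's namespace
		}
	}
	return nil
}

func main() {
	r := NewTestRun("1738152345678901234")
	fmt.Println(r.Namespace, r.Run(nil) == nil)
}
```

The point of the sketch is ownership: everything the run creates hangs off its ID, so teardown needs no global bookkeeping.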
+ +Each Test Run has: + +- A globally unique **Test Run ID** +- A well-defined lifecycle: `setup → execute → teardown` +- Exclusive ownership of all resources it creates + +--- + +### 4.2 Test Run Identification + +Each Test Run generates a unique identifier (Test Run ID) derived from: + +- CI-provided environment variable (when available) +- Unix timestamp with high entropy +- Customized with random components + +The Test Run ID is consistently applied to: + +- Kubernetes Namespaces +- Resource names +- Broker Topics and Subscriptions +- Labels and annotations + +Namespaces are additionally labeled to indicate execution context: + +- Label `ci` distinguishes CI pipeline runs (`yes`) from local developer runs (`no`) +- Enables context-appropriate retention policies +- Does not affect test execution behavior + +This ensures **traceability** and **collision avoidance**. + +--- + +### 4.3 Test Run Lifecycle + +Each Test Run follows a well-defined lifecycle: + +``` +Create Namespace + ↓ +Deploy Infrastructure + ↓ +Infrastructure Ready + ↓ +Execute Test Suites + ↓ +Cleanup +``` + +**Infrastructure Deployment** includes: +- Database (PostgreSQL, deployed with API) +- API and Sentinel +- Broker connectivity +- Custom Resource Definitions (CRDs) +- Fixture Adapter + +**Infrastructure Ready** means: +- All infrastructure components are healthy +- Fixture Adapter is operational +- Test suites can execute independently +- No production adapters are deployed yet + +**Test Suite Execution**: +- Suites execute sequentially +- Each suite may deploy/remove production adapters as needed +- Environment state persists across suites within the same Test Run + +--- + +### 4.4 Fixture Adapter + +The **Fixture Adapter** is a minimal test infrastructure component deployed as part of the Test Run. 
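An adapter of this kind reduces to a small event consumer that reports success so the core workflow can complete. A self-contained sketch, with event and status shapes invented purely for illustration (real broker payloads are not specified here):

```go
package main

import "fmt"

// Hypothetical event and status shapes for illustration only.
type event struct {
	Type    string // e.g. "cluster.created"
	Cluster string
}

type status struct {
	Cluster string
	Ready   bool
}

// fixtureAdapter consumes workflow events and immediately reports success,
// letting the cluster/nodepool workflow complete without a real adapter.
func fixtureAdapter(events <-chan event, statuses chan<- status) {
	for e := range events {
		statuses <- status{Cluster: e.Cluster, Ready: true}
	}
	close(statuses)
}

func main() {
	events := make(chan event, 1)
	statuses := make(chan status, 1)
	events <- event{Type: "cluster.created", Cluster: "c1"}
	close(events)
	fixtureAdapter(events, statuses)
	for s := range statuses {
		fmt.Println(s.Cluster, s.Ready)
	}
}
```

A channel stands in for the broker subscription here; the real component would subscribe via the adapter framework.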
+ +**Purpose**: +- Enables core workflows (cluster/nodepool lifecycle) to complete independently +- Provides stable baseline for adapter-independent testing +- Acts as minimal event consumer for workflow completion + +**Characteristics**: +- Deployed during Test Run setup +- Lifecycle owned by Test Framework +- Not used for business logic validation +- Remains active throughout the Test Run + +--- + +## 5. Deployment Lifecycle Strategy + +### 5.1 One Namespace per Test Run + +Each Test Run is assigned a **dedicated Kubernetes Namespace**. + +This Namespace serves as the hard isolation boundary for: + +- API +- Sentinel +- Adapters +- Supporting services (databases, brokers, etc.) + +**Rationale:** + +- Eliminates cross-test interference +- Simplifies cleanup semantics +- Improves debugging clarity +- Avoids complex naming or locking schemes + +--- + +### 5.2 Namespace Naming Convention + +Namespace names follow a consistent pattern for operational clarity: + +``` +e2e-singlens-{TEST_RUN_ID} +``` + +**Components**: +- `e2e-`: Prefix indicating E2E test resources +- `singlens-`: Topology indicator (standard single-namespace deployment) +- `{TEST_RUN_ID}`: Unique test run identifier + +**Rationale**: +- Deployment model immediately visible from namespace name +- Test Run ID enables correlation of resources across test runs +- Operational teams can identify E2E namespaces without inspecting labels + +--- + +### 5.3 Cross-Namespace Topology (Advanced) + +For tests requiring production-realistic deployment validation, an **optional cross-namespace topology** is supported: + +``` +e2e-crossns-{TEST_RUN_ID}-core +e2e-crossns-{TEST_RUN_ID}-adapters +``` + +**Structure**: +- `-core` namespace: API, Sentinel, Broker +- `-adapters` namespace: Adapter components +- Both share same `TEST_RUN_ID` for correlation + +**Use Cases**: +- Validating cross-namespace communication (Service DNS, NetworkPolicy) +- Security boundary testing +- Production deployment model verification + 
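The naming conventions from Sections 5.2 and 5.3 can be derived mechanically from the Test Run ID; a sketch (helper names are illustrative):

```go
package main

import "fmt"

// Standard single-namespace topology (Section 5.2).
func singleNamespace(runID string) string {
	return fmt.Sprintf("e2e-singlens-%s", runID)
}

// Optional cross-namespace topology (Section 5.3): both namespaces share
// the same Test Run ID for correlation.
func crossNamespaces(runID string) (core, adapters string) {
	return fmt.Sprintf("e2e-crossns-%s-core", runID),
		fmt.Sprintf("e2e-crossns-%s-adapters", runID)
}

func main() {
	fmt.Println(singleNamespace("1738152345678901234"))
	core, adapters := crossNamespaces("1738152345678901234")
	fmt.Println(core)
	fmt.Println(adapters)
}
```

Because the topology is encoded in the name, a cleanup job can correlate related namespaces without any external state.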
+**Trade-offs**: +- Increased setup complexity +- Requires additional RBAC configuration +- Suitable for integration and security validation tests only + +Most E2E tests should use the standard single-namespace topology. + +--- + +### 5.4 Component Lifecycle Ownership + +| Component | Lifecycle Owner | Scope | Notes | +|--------------------|------------------|--------------|-------| +| Namespace | Test Framework | Per Test Run | | +| API | Test Framework | Per Test Run | | +| Sentinel | Test Framework | Per Test Run | | +| Fixture Adapter | Test Framework | Per Test Run | Infrastructure component | +| Production Adapter | Test Suite | Suite-scoped | Dynamically managed | +| Broker Resources | Adapter/Sentinel | Per Test Run | | + +**Rule:** +No component may create resources outside its Test Run Namespace without Test Run–level isolation. + +--- + +### 5.5 Resource Labeling Strategy + +#### 5.5.1 Required Labels + +All E2E test namespaces must carry exactly three labels: + +1. **`ci`**: Execution context (`yes` | `no`) +2. **`test-run-id`**: Test Run identifier +3. **`managed-by`**: Ownership marker (`e2e-test-framework`) + +**Rationale**: +- `ci`: Enables context-appropriate retention policies +- `test-run-id`: Enables resource correlation and traceability +- `managed-by`: Standard Kubernetes ownership marker + +--- + +## 6. Resource Isolation Strategy + +### 6.1 Kubernetes Resource Isolation + +Isolation is achieved via: + +- Namespace-per-Test-Run +- Consistent `test-run-id` labeling +- Optional but recommended `ResourceQuota` and `LimitRange` + +This prevents: + +- Pod name collisions +- Service discovery conflicts +- Cross-test communication + +--- + +### 6.2 Messaging and Broker Isolation + +Messaging resources (Topics / Subscriptions) are isolated using the Test Run ID. 
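For illustration, run-scoped broker names can be derived the same way as namespace names. In this sketch the base names ("cluster-events") and the adapter identifier are placeholders, not confirmed HyperFleet identifiers:

```go
package main

import "fmt"

// Run-scoped topic: every Test Run gets its own topic instance.
func runScopedTopic(base, runID string) string {
	return fmt.Sprintf("%s-%s", base, runID)
}

// Run-scoped subscription: scoped to both the adapter instance and the run,
// so no two adapter instances ever share a subscription.
func runScopedSubscription(base, adapter, runID string) string {
	return fmt.Sprintf("%s-%s-%s", base, adapter, runID)
}

func main() {
	fmt.Println(runScopedTopic("cluster-events", "1738152345678901234"))
	fmt.Println(runScopedSubscription("cluster-events", "landing-zone", "1738152345678901234"))
}
```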
+ +Common patterns include: + +- Run-scoped Topics +- Run-scoped Subscriptions +- Adapter-owned Subscription lifecycles + +This avoids: + +- Cross-test event delivery +- Subscription reuse race conditions +- Message leakage between runs + +--- + +## 7. Race Condition Prevention + +Race conditions are prevented through **architectural isolation**, not runtime locking. + +### 7.1 Unique Resource Identification + +All externally visible resources include the Test Run ID in: + +- Names +- Labels +- Broker identifiers + +This guarantees uniqueness even under maximum concurrency. + +--- + +### 7.2 No Shared Mutable State + +The strategy explicitly avoids: + +- Shared Namespaces +- Shared Topics or Subscriptions +- Shared databases +- Shared API instances + +Shared mutable state is the primary source of E2E race conditions. + +--- + +### 7.3 Parallel Test Run Execution Model + +Parallel pipelines are safe because: + +- Each Test Run executes within a sealed resource boundary +- No global locks are required +- Failures are contained within a single Namespace + +--- + +## 8. Test Scenario Organization + +### 8.1 Lifecycle Management Model + +Test infrastructure is managed at the **Test Run level**, not per test case. + +- Infrastructure is deployed once per Test Run +- All test suites share the same environment +- Test cases focus on validation, not deployment + +This ensures: +- Stable environment for workflow validation +- Reduced setup overhead +- Clear separation between infrastructure and behavior testing + +--- + +### 8.2 Test Suite Types + +Test suites represent **validation focus**, not environment configurations. + +#### 8.2.1 Core Suite + +Validates cluster and nodepool workflows using only core components. 
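The uniqueness guarantee from Section 7.1 (no collisions under maximum concurrency, with no locking) can be illustrated with a sketch in which concurrent runs derive their namespace names independently:

```go
package main

import (
	"fmt"
	"sync"
)

// n concurrent Test Runs each derive a namespace name from their own unique
// Test Run ID; because the ID is embedded in the name, no coordination is
// needed and no collisions occur. (Run IDs here are simulated.)
func concurrentNamespaces(n int) map[string]bool {
	var mu sync.Mutex
	seen := make(map[string]bool)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("e2e-singlens-run-%d", i)
			mu.Lock()
			seen[name] = true
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(len(concurrentNamespaces(100)))
}
```

The mutex protects only the bookkeeping map in the sketch; the strategy itself requires no shared state between runs.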
+ +**Environment**: +- Core components (API, Sentinel, Broker) +- Fixture Adapter only (no production adapters) + +**Validates**: +- Cluster lifecycle (create, ready, delete) +- NodePool lifecycle +- Event-driven workflow completion +- Core component behavior under baseline conditions + +**Example flow**: Create cluster → Sentinel publishes event → Fixture Adapter consumes → Reports success → Cluster becomes Ready + +--- + +#### 8.2.2 Adapter Execution Suite + +Validates adapter runtime behavior and job execution. + +**Environment**: +- Core components deployed +- Production adapters hot-plugged **at suite level** (beforeSuite/afterSuite) + +**Validates**: +- Adapter job execution (e.g., Kubernetes namespace creation) +- Event handling correctness +- Error handling and retries +- Resource reconciliation + +**Adapter Management**: +- Adapters deployed once in beforeSuite +- Shared across all test cases in the suite +- Removed in afterSuite +- Tests adapter runtime behavior, not deployment process + +--- + +#### 8.2.3 Adapter Deployment Suite + +Validates adapter installation, configuration, and removal correctness. 
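The two adapter-management models (suite-scoped in 8.2.2, per-test-case in 8.2.3) can be sketched side by side. The `deployer` here is an in-memory stand-in for whatever actually installs adapters (e.g. helm install/uninstall); all names are illustrative:

```go
package main

import "fmt"

// In-memory stand-in for adapter deployment tooling.
type deployer struct{ active []string }

func (d *deployer) Deploy(name string) { d.active = append(d.active, name) }
func (d *deployer) Remove(name string) {
	kept := d.active[:0]
	for _, a := range d.active {
		if a != name {
			kept = append(kept, a)
		}
	}
	d.active = kept
}

// Execution Suite pattern: one adapter shared by all cases (Section 8.2.2).
func runExecutionSuite(d *deployer, cases []func()) {
	d.Deploy("production-adapter")       // beforeSuite
	defer d.Remove("production-adapter") // afterSuite
	for _, c := range cases {
		c()
	}
}

// Deployment Suite pattern: each case owns its adapter's full
// deploy/remove lifecycle (Section 8.2.3).
func runDeploymentSuite(d *deployer, cases []func()) {
	for i, c := range cases {
		name := fmt.Sprintf("adapter-under-test-%d", i)
		d.Deploy(name)
		c()
		d.Remove(name)
	}
}

func main() {
	d := &deployer{}
	runExecutionSuite(d, []func(){func() {}})
	runDeploymentSuite(d, []func(){func() {}})
	fmt.Println(len(d.active))
}
```

Either way, the suite (not the framework) owns the adapter lifecycle, and both patterns leave the environment adapter-free when they finish.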
+ +**Environment**: +- Core components deployed +- Production adapters hot-plugged **per test case** + +**Validates**: +- Adapter deployment process +- Configuration correctness +- Subscription registration +- Adapter health and readiness +- Adapter removal and cleanup +- Resource cleanup completeness + +**Adapter Management**: +- Each test case deploys and removes its own adapter instance +- Tests the complete deployment/teardown lifecycle +- Enables testing of various adapter configurations + +--- + +### 8.3 Adapter Lifecycle Management + +Production adapters are **dynamically managed** within a Test Run: + +**Hot-plugging**: +- Adapters can be added or removed between suites or test cases +- Multiple adapters can be deployed in parallel +- Each adapter maintains independent subscriptions + +**Ownership**: +- Test Suite owns adapter lifecycle within its scope +- Adapters are treated as test variables, not infrastructure constants + +**Subscription Management**: +- Each adapter creates unique subscriptions +- Subscription IDs ensure isolation between adapter instances + +**Independence**: +- Core Suite operates independently via Fixture Adapter +- Adapter failures do not impact infrastructure stability + +--- + +### 8.4 Suite Execution Order + +**Recommended Order**: + +Within a Test Run, suites typically execute in this order: + +1. **Core Suite** - Validates baseline functionality (must run first) +2. **Adapter Execution Suite** - Validates adapter runtime behavior +3. 
**Adapter Deployment Suite** - Validates adapter deployment process + +**Rationale**: +- Core Suite must run first to validate infrastructure readiness +- Adapter Execution and Deployment suites have no dependency on each other +- Adapter Execution Suite runs before Deployment Suite to minimize environment pollution: + - Execution Suite manages adapters at suite level (cleaner state isolation) + - Deployment Suite creates/removes adapters per test case (higher churn) + +**Flexibility**: +- Adapter Execution and Deployment suites can run in either order or in parallel (separate Test Runs) +- Any suite can run independently if infrastructure is ready +- Order recommendation optimizes for state cleanliness, not correctness + +--- + +### 8.5 State Management and Suite Independence + +**State Ownership Model**: + +Test Run state is categorized by lifetime and ownership: + +| State Type | Lifetime | Owner | Examples | +|------------|----------|-------|----------| +| Infrastructure State | Test Run | Test Framework | Namespace, API, Sentinel, Fixture Adapter | +| Adapter State | Suite or Test Case | Test Suite | Production adapter pods, subscriptions | +| Test Data | Test Case | Test Case | Clusters, NodePools, test-specific resources | + +**Isolation Principles**: + +1. **Infrastructure persists** - Core components remain active throughout the Test Run +2. **Adapters are ephemeral** - Created and removed by test suites as needed +3. **Test data is scoped** - Each test case manages its own test resources +4. 
**Unique naming prevents collision** - Resources use unique identifiers to avoid cross-test interference

**Suite Independence**:

- Suites can run independently if infrastructure is ready
- Suite failures do not block subsequent suites (collect all failures)
- Each suite validates its prerequisites at startup

**Cleanup Responsibility**:

- Test cases and suites clean their own state (adapters, test data)
- Infrastructure cleanup handled by Test Framework (see Section 9 for retention policy)

---

## 9. Resource Management and Cleanup

### 9.1 Cleanup Ownership Model

Cleanup is a shared responsibility between two actors:

1. **E2E Test Flow**: Responsible for setting retention policy and deleting resources of passed runs
2. **Reconciler Job**: Responsible for enforcing TTL and handling edge cases

No single component owns all cleanup. This separation prevents single points of failure.

---

### 9.2 Retention Policy

#### 9.2.1 Default Retention (Safe Fallback)

All namespaces are annotated with a default retention policy at creation:

- **Default TTL**: 2 hours from creation
- **Purpose**: Safety net if E2E flow is interrupted or fails before updating retention
- Ensures orphaned namespaces are automatically cleaned up

#### 9.2.2 Test Result-Based Retention

E2E flow updates namespace retention annotations based on test outcome:

| Test Result | CI Context | Local Context | Retention |
|-------------|------------|---------------|-----------|
| **Passed** | Any | Any | 10 minutes |
| **Failed** | `ci=yes` | - | 24 hours |
| **Failed** | - | `ci=no` | 6 hours |

**Rationale**:
- Passed tests have minimal debugging value → short retention conserves quota
  - 10-minute window prevents race conditions between E2E flow and reconciler deletion
- Failed tests need retention for post-mortem
  - CI (24h): Global team across time zones
  - Local (6h): Developer actively investigating
- Default 2h retention: Covers interrupted
E2E flows + +#### 9.2.3 Retention Override + +Environment-based configuration allows overriding default retention policy. + +**Use Cases**: +- Extended debugging sessions +- Demonstration environments +- Manual investigation + +Override values are stored in namespace annotations for reconciler consumption. + +--- + +### 9.3 Cleanup Reconciliation + +#### 9.3.1 Reconciler Responsibilities + +A scheduled reconciler job enforces TTL-based cleanup: + +- Runs periodically (frequency configurable, typically 30 minutes) +- Scopes to namespaces labeled as E2E test framework managed +- Deletes namespaces based on retention annotation expiry + +**Simplicity Principle**: Reconciler does not distinguish between: +- Normal vs orphaned namespaces +- CI vs local runs +- Single-namespace vs cross-namespace + +All policy decisions are encoded in namespace annotations. Reconciler is stateless. + +#### 9.3.2 Cross-Namespace Correlation + +For cross-namespace deployments, reconciler must correlate related namespaces: +- Identifies topology from namespace naming convention +- Finds all namespaces sharing the same Test Run ID +- Deletes correlated namespaces together + +**Atomicity**: Deletion may be eventual (one namespace deleted, others follow in next reconciliation cycle). This is acceptable given low-frequency reconciliation. + +--- + +### 9.4 Orphaned Resource Handling + +**Definition**: Orphaned resources occur when E2E flow is interrupted before setting final retention. + +**Handling**: +- No special orphan detection needed +- Default 2-hour retention set at creation covers this case +- Reconciler treats orphans identically to any expired namespace + +**Monitoring**: High orphan rate (inferred from default retention deletions) indicates E2E flow reliability issues. + +--- + +## 10. Testing Infrastructure Considerations + +### 10.1 Image Build and Distribution + +**Image Architecture**: + +Test infrastructure uses two container images with distinct responsibilities: + +1. 
**Cloud Platform Tools** - Target cluster authentication + - Contains cloud provider CLIs (gcloud, aws, etc.) + - Runs as init container to generate cluster credentials + - Low change frequency (rebuilt only when cloud tooling updates) + +2. **E2E Test Framework** - Infrastructure deployment and test execution + - Contains helm CLI, test code, and deployment charts + - Manages entire Test Run lifecycle + - High change frequency (rebuilt on test code or chart changes) + +**Rationale**: +- Adapter hot-plugging requires deployment tooling in test execution context +- Infrastructure deployment is orchestrated by Test Framework (Section 5.4) +- Separation by change frequency optimizes CI/CD build efficiency + +--- + +## 11. Observability and Debugging + +Debuggability is enabled by: + +- One Namespace per Test Run +- Consistent labeling and naming conventions +- Clear lifecycle boundaries +- Namespace retention on failure +- Component version reporting + +**Version Transparency**: + +Test framework outputs component versions at Test Run start: +- Core components (API, Sentinel, Broker) +- Fixture Adapter +- Production adapters deployed during test execution + +This enables correlation between test results and component versions for failure investigation. + +Engineers can: + +- Inspect all failed-test resources in a single Namespace +- Correlate logs, events, and message flows +- Reproduce failures with exact component versions +- Identify version-specific issues + +--- + +## 12. Open Questions and Follow-Ups + +The following topic is intentionally deferred to implementation phase: + +- What is the minimal functional specification for Fixture Adapter? + +--- + +## 13. Action Items and Next Steps + +**Prerequisites**: This test strategy assumes HyperFleet system supports runtime adapter hot-plugging (dynamic adapter deployment without API/Sentinel restart). 
If this capability does not exist, it should be implemented as part of HyperFleet core development (separate from E2E framework work). + +### 13.1 Core Infrastructure + +**HYPERFLEET-XXX: Container Image Architecture** +- [ ] Build Cloud Platform Tools image (gcloud, aws cli, kubeconfig generation) +- [ ] Build E2E Test Framework image (helm cli, test code, deployment charts) +- [ ] Set up image build pipeline + +**HYPERFLEET-XXX: Test Run Lifecycle** +- [ ] Implement Test Run ID generation +- [ ] Implement namespace creation with isolation labels (test-run-id, ci, managed-by) +- [ ] Implement infrastructure deployment via helm (API, Sentinel, Broker) +- [ ] Add infrastructure readiness checks +- [ ] Implement cleanup: helm uninstall + namespace deletion + +**HYPERFLEET-XXX: Fixture Adapter** +- [ ] Design minimal functional specification for Fixture Adapter +- [ ] Implement Fixture Adapter with event consumption capability +- [ ] Add Fixture Adapter to infrastructure helm chart +- [ ] Write Fixture Adapter unit tests + +**HYPERFLEET-XXX: Component Version Reporting** +- [ ] Implement version output at Test Run start +- [ ] Output core component versions (API, Sentinel, Broker, Fixture Adapter) +- [ ] Log production adapter versions during test execution + +### 13.2 Test Suite Implementation + +**HYPERFLEET-XXX: Core Suite** +- [ ] Implement Core Suite test cases (cluster/nodepool lifecycle) +- [ ] Verify Core Suite operates with Fixture Adapter only +- [ ] Add infrastructure health validation + +**HYPERFLEET-XXX: Adapter Execution Suite** +- [ ] Implement suite-level adapter deployment (beforeSuite/afterSuite) +- [ ] Write adapter job execution tests +- [ ] Add event handling and error handling validation + +**HYPERFLEET-XXX: Adapter Deployment Suite** +- [ ] Implement per-test-case adapter deployment helpers (helm install/uninstall) +- [ ] Create adapter configuration testdata directory (helm values for adapter deployment) +- [ ] Write adapter deployment and 
removal validation tests
- [ ] Add cleanup completeness verification

### 13.3 Documentation

**HYPERFLEET-XXX: E2E Test Run Strategy Guide**
- [ ] Document Test Run lifecycle for developers
- [ ] Write suite type selection guide (Core/Execution/Deployment)
- [ ] Create adapter hot-plugging examples
- [ ] Document basic cleanup and troubleshooting

### 13.4 Future Enhancements

The following enhancements are deferred to post-MVP:

**HYPERFLEET-XXX: Cross-Namespace Topology**
- [ ] Implement cross-namespace deployment model (e2e-crossns-{ID}-core, e2e-crossns-{ID}-adapters)
- [ ] Add cross-namespace DNS and NetworkPolicy configuration
- [ ] Update cleanup logic for cross-namespace correlation
- [ ] Write cross-namespace communication validation tests
- [ ] Document production deployment model verification use cases

**HYPERFLEET-XXX: Retention Policy**
- [ ] Implement namespace retention annotation logic
- [ ] Add test result-based retention updates (passed: 10min, failed: 24h/6h)
- [ ] Configure default 2-hour TTL for orphaned namespaces
- [ ] Write retention policy unit tests

**HYPERFLEET-XXX: Cleanup Reconciler Job**
- [ ] Implement TTL-based namespace reconciler
- [ ] Add orphaned resource detection and cleanup
- [ ] Add cross-namespace correlation for multi-namespace topologies
- [ ] Configure reconciler schedule (30-minute default)
- [ ] Add reconciler monitoring and alerts

---

**Document Status**: Draft for Review
**Next Steps**: Team review and approval, then create implementation tickets

From 1ab9c1df0120412846d3c77c7ddfa3ef7e592245 Mon Sep 17 00:00:00 2001
From: yasun
Date: Tue, 3 Feb 2026 10:48:30 +0800
Subject: [PATCH 2/2] refactor Test Scenario Organization and enhance Fixture adapter update

---
 .../e2e-run-strategy-spike-report.md | 518 ++++++++++++------
 1 file changed, 342 insertions(+), 176 deletions(-)

diff --git a/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md
b/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md index 5efa0f8..04d85f7 100644 --- a/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md +++ b/hyperfleet/e2e-testing/e2e-run-strategy-spike-report.md @@ -115,6 +115,8 @@ Each Test Run generates a unique identifier (Test Run ID) derived from: - Unix timestamp with high entropy - Customized with random components +**Example**: Using `time.Now().UnixNano()` generates a 19-digit number: `1738152345678901234`, resulting in namespace: `e2e-1738152345678901234`. + The Test Run ID is consistently applied to: - Kubernetes Namespaces @@ -159,29 +161,130 @@ Cleanup - All infrastructure components are healthy - Fixture Adapter is operational - Test suites can execute independently -- No production adapters are deployed yet +- No functional adapters are deployed yet **Test Suite Execution**: - Suites execute sequentially -- Each suite may deploy/remove production adapters as needed +- Each suite may deploy/remove functional adapters as needed - Environment state persists across suites within the same Test Run +**Cleanup**: +- Delete cloud messaging resources (topics/subscriptions) tagged with test_run_id via cloud CLI +- Uninstall infrastructure components via helm +- Delete namespace +- See Section 9 for detailed cleanup and retention policy + --- ### 4.4 Fixture Adapter -The **Fixture Adapter** is a minimal test infrastructure component deployed as part of the Test Run. +**Problem**: Core Suite needs to validate HyperFleet framework behavior (event flow, status aggregation, error handling). Functional adapters have external dependencies (cloud APIs, GCP projects) and cannot provide the controlled, repeatable scenarios needed for framework testing. What type of adapter should Core Suite use? 
-**Purpose**: -- Enables core workflows (cluster/nodepool lifecycle) to complete independently -- Provides stable baseline for adapter-independent testing -- Acts as minimal event consumer for workflow completion +**Decision**: Fixture Adapter in dedicated repository -**Characteristics**: -- Deployed during Test Run setup -- Lifecycle owned by Test Framework -- Not used for business logic validation -- Remains active throughout the Test Run +Build a test-specific adapter (`adapter-fixture` repository) based on hyperfleet-adapter framework that enables controlled testing of framework behaviors without external dependencies. + +**Rationale**: +- **No cloud resources needed**: Core Suite tests framework data flow, doesn't need GCP projects, AWS accounts, or cloud credentials +- **No auth configuration needed**: Eliminates setup complexity and credential management +- **Error injection support**: Can simulate adapter failures, delays, and error conditions for framework testing +- **Fast execution**: No external dependencies enables fast, stable, reproducible tests +- **Real adapter framework**: Uses actual hyperfleet-adapter framework mechanisms (preconditions, resource management, status reporting) + +--- + +#### 4.4.1 Framework Behaviors to Test + +Core Suite validates framework-level behaviors: + +| Framework Behavior | What Core Suite Validates | Fixture Adapter Role | +|-------------------|---------------------------|---------------------| +| **Event flow** | API → Sentinel → Broker → Adapter → Status update | Subscribe to events, report status back | +| **Status aggregation** | Framework merges adapter conditions into resource status | Report different condition combinations (Applied, Available, Health) | +| **Async processing** | Framework waits for adapter status reporting | Introduce controllable delays before status reporting | +| **Error handling** | Framework handles adapter failures | Report failure conditions (Applied=False, Available=False) | +| 
**Concurrent processing** | Framework handles multiple resources | Process multiple events in parallel | + +--- + +#### 4.4.2 Design Approach + +**Repository**: `openshift-hyperfleet/adapter-fixture` + +**Based on hyperfleet-adapter framework**: +- Uses real adapter mechanisms (preconditions, resources, post-processing, status reporting) +- Subscribes to broker events +- Reports standard status conditions (Applied, Available, Health) +- Deployed once per Test Run as infrastructure component + +**Events consumed**: +- `cluster.created` - New cluster created via API +- `nodepool.created` - New nodepool created + +**Status conditions reported**: +- **Applied**: Resources created successfully? (True/False) +- **Available**: Workload completed successfully? (True/False) +- **Health**: Adapter operating normally? (True/False) + +**Control modes provided**: +- **Immediate success**: Report success immediately (test basic event flow) +- **Delayed success**: Delay N seconds, then report success (test async status aggregation) +- **Failure**: Report failure immediately (test error handling) +- **Transient failure**: Fail N times, then succeed (test retry logic) + +**Test control mechanism**: +- **adapter-fixture provides**: Controllable behavior mechanism per resource (e.g., delay before status report, report failure conditions) + - Implementation approach: resource labels (HyperFleet API supports labels field for metadata) + - Constraint: Must not require adapter reconfiguration or restart between tests +- **e2e tests use**: Create resources with control labels to trigger specific Fixture Adapter behaviors for testing framework responses + +--- + +#### 4.4.3 Complementary Testing Strategy + +**Fixture Adapter + Core Suite**: +- **What**: Framework data flow validation (API → Sentinel → Broker → Adapter → Status reporting) +- **Why Fixture**: No cloud resources needed, no auth configuration needed, supports error injection +- **Focus**: Event flow, status aggregation, error 
handling + +**Functional Adapters + Adapter Suite**: +- **What**: Adapter implementation validation (configuration loading, K8s resource creation, status reporting) +- **Why Functional**: Tests real adapter logic with actual hyperfleet-adapter framework +- **Focus**: Adapter creates correct K8s resources (Job, Configmap, Namespace, Manifest, etc.) + +--- + +#### 4.4.4 Implementation Priority + +**Fixture Adapter is a new component** that requires design and implementation. To avoid blocking e2e testing progress, a phased approach is recommended: + +**Phase 1 (Immediate): Use adapter-landing-zone for happy-path Core Suite** +- **What**: Use existing adapter-landing-zone as temporary substitute for basic framework testing +- **Coverage**: Happy-path event flow (API → Sentinel → Broker → Adapter → Status reporting) +- **Benefits**: + - No new component development required + - Tests real adapter framework mechanisms + - Creates observable K8s resources (Namespace, ServiceAccount) + - Can use RabbitMQ instead of GCP Pub/Sub (eliminates cloud resource dependency) +- **Limitations**: + - Still requires K8s cluster + kubeconfig (auth configuration) + - Cannot test error injection scenarios (failure handling, retry logic) + - Creates real resources in test cluster + +**Phase 2 (Future): Implement Fixture Adapter for comprehensive Core Suite** +- **What**: Build dedicated adapter-fixture repository +- **Coverage**: Complete framework behavior testing (success, error handling, retry logic, async processing) +- **Benefits**: + - No cloud resources needed + - No auth configuration needed + - Full error injection support + - Comprehensive framework validation + +**Rationale for phased approach**: +- adapter-landing-zone provides immediate value for basic framework testing +- Fixture Adapter implementation can proceed in parallel without blocking e2e progress +- Phase 1 establishes Core Suite structure and CI integration +- Phase 2 expands coverage to error scenarios and removes 
remaining dependencies --- @@ -212,50 +315,20 @@ This Namespace serves as the hard isolation boundary for: Namespace names follow a consistent pattern for operational clarity: ``` -e2e-singlens-{TEST_RUN_ID} +e2e-{TEST_RUN_ID} ``` **Components**: - `e2e-`: Prefix indicating E2E test resources -- `singlens-`: Topology indicator (standard single-namespace deployment) - `{TEST_RUN_ID}`: Unique test run identifier **Rationale**: -- Deployment model immediately visible from namespace name - Test Run ID enables correlation of resources across test runs - Operational teams can identify E2E namespaces without inspecting labels --- -### 5.3 Cross-Namespace Topology (Advanced) - -For tests requiring production-realistic deployment validation, an **optional cross-namespace topology** is supported: - -``` -e2e-crossns-{TEST_RUN_ID}-core -e2e-crossns-{TEST_RUN_ID}-adapters -``` - -**Structure**: -- `-core` namespace: API, Sentinel, Broker -- `-adapters` namespace: Adapter components -- Both share same `TEST_RUN_ID` for correlation - -**Use Cases**: -- Validating cross-namespace communication (Service DNS, NetworkPolicy) -- Security boundary testing -- Production deployment model verification - -**Trade-offs**: -- Increased setup complexity -- Requires additional RBAC configuration -- Suitable for integration and security validation tests only - -Most E2E tests should use the standard single-namespace topology. - ---- - -### 5.4 Component Lifecycle Ownership +### 5.3 Component Lifecycle Ownership | Component | Lifecycle Owner | Scope | Notes | |--------------------|------------------|--------------|-------| @@ -263,7 +336,7 @@ Most E2E tests should use the standard single-namespace topology. 
| API | Test Framework | Per Test Run | | | Sentinel | Test Framework | Per Test Run | | | Fixture Adapter | Test Framework | Per Test Run | Infrastructure component | -| Production Adapter | Test Suite | Suite-scoped | Dynamically managed | +| Functional Adapter | Test Suite | Suite-scoped | Dynamically managed | | Broker Resources | Adapter/Sentinel | Per Test Run | | **Rule:** @@ -271,9 +344,9 @@ No component may create resources outside its Test Run Namespace without Test Ru --- -### 5.5 Resource Labeling Strategy +### 5.4 Resource Labeling Strategy -#### 5.5.1 Required Labels +#### 5.4.1 Required Labels All E2E test namespaces must carry exactly three labels: @@ -322,6 +395,15 @@ This avoids: - Subscription reuse race conditions - Message leakage between runs +**Cloud Resource Cleanup**: + +For cloud messaging resources (e.g., GCP Pub/Sub Topics and Subscriptions), the teardown phase explicitly deletes resources tagged with `test_run_id`: + +- Resources are tagged/labeled with Test Run ID during creation +- Teardown script calls cloud CLI to delete tagged resources (e.g., `gcloud pubsub topics/subscriptions delete`) +- Namespace deletion alone does not clean up cloud resources +- This ensures no orphaned cloud resources remain after test completion + --- ## 7. Race Condition Prevention @@ -386,115 +468,175 @@ Test suites represent **validation focus**, not environment configurations. #### 8.2.1 Core Suite -Validates cluster and nodepool workflows using only core components. +Validates HyperFleet framework behavior using Fixture Adapter. + +**Purpose**: Fast, stable testing of framework logic without external dependencies. 
**Environment**:
- Core components (API, Sentinel, Broker)
-- Fixture Adapter only (no production adapters)
+- Fixture Adapter with label-driven behavior (see Section 4.4 for control modes)
+- No functional adapters or real cloud services

-**Validates**:
-- Cluster lifecycle (create, ready, delete)
-- NodePool lifecycle
-- Event-driven workflow completion
-- Core component behavior under baseline conditions
-
-**Example flow**: Create cluster → Sentinel publishes event → Fixture Adapter consumes → Reports success → Cluster becomes Ready
+**Validates** (framework behavior):
+- **Event flow**: API → Sentinel → Broker → Adapter → API
+- **Async status aggregation**: Framework waits for adapter responses (test with Fixture delay modes)
+- **Error handling**: Framework handles adapter failures (test with Fixture failure modes)
+- **Retry logic**: Framework retries failed operations (test with Fixture transient-failure modes)
+- **Status reconciliation**: Framework merges adapter conditions into resource status
+- **Concurrent processing**: Framework handles multiple resources in parallel
+- **Resource lifecycle**: Cluster and NodePool create/update/delete workflows

----
+**Test Approach**:

-#### 8.2.2 Adapter Execution Suite
+Tests control Fixture Adapter behavior via resource labels to validate framework responses:

-Validates adapter runtime behavior and job execution.
+| Framework Behavior | Resource Labels | Validation |
+|-------------------|---------------------|------------|
+| Basic data flow | `fixture.control/mode: immediate-success` | Cluster reaches Ready state |
+| Async aggregation | `fixture.control/mode: delayed-success`<br>`fixture.control/delay-seconds: 30` | Framework waits 30s, then aggregates status |
+| Error handling | `fixture.control/mode: failure`<br>`fixture.control/failure-reason: ValidationFailed` | Cluster enters Failed state with the correct reason |
+| Retry logic | `fixture.control/mode: transient-failure`<br>`fixture.control/failure-count: 3` | Framework retries 3 times, then succeeds |
+| Timeout handling | `fixture.control/mode: timeout` | Framework times out after the configured duration |

-**Environment**:
-- Core components deployed
-- Production adapters hot-plugged **at suite level** (beforeSuite/afterSuite)
+**Example Flow**:

-**Validates**:
-- Adapter job execution (e.g., Kubernetes namespace creation)
-- Event handling correctness
-- Error handling and retries
-- Resource reconciliation
+```
+1. Test creates Cluster via API with labels: fixture.control/mode=delayed-success, fixture.control/delay-seconds=30
+2. API persists Cluster (including labels)
+3. Sentinel polls, detects new Cluster, publishes event
+4. Fixture Adapter consumes event, reads labels, waits 30s
+5. Fixture Adapter reports success status to API
+6. API updates Cluster status
+7. Test validates: Cluster phase = Ready (after ~30s)
+```

-**Adapter Management**:
-- Adapters deployed once in beforeSuite
-- Shared across all test cases in the suite
-- Removed in afterSuite
-- Tests adapter runtime behavior, not deployment process
+**Characteristics**:
+- ✅ Fast execution: no external dependencies
+- ✅ Stable: infrastructure is never reconfigured, so runs are reproducible
+- ✅ Comprehensive: exercises all framework behaviors via the label-driven Fixture Adapter

---

-#### 8.2.3 Adapter Deployment Suite
+#### 8.2.2 Adapter Suite
+
+Validates functional adapter deployment, implementation, and lifecycle management.

-Validates adapter installation, configuration, and removal correctness.
+**Purpose**: Validate real adapter functionality across the complete deployment → function → cleanup lifecycle.
**Environment**:
- Core components deployed
-- Production adapters hot-plugged **per test case**
+- Functional adapters hot-plugged with **flexible deployment granularity** (managed by test groups or individual test cases)

**Validates**:
-- Adapter deployment process
-- Configuration correctness
-- Subscription registration
-- Adapter health and readiness
-- Adapter removal and cleanup
-- Resource cleanup completeness
-
-**Adapter Management**:
-- Each test case deploys and removes its own adapter instance
-- Tests the complete deployment/teardown lifecycle
-- Enables testing of various adapter configurations
-
----
+- Adapter deployment and configuration loading
+- Adapter logic (K8s resource creation: Job, ConfigMap, etc.)
+- Error handling and retry logic
+- Status reporting (conditions, reasons, messages)
+- Adapter removal and cleanup completeness

-### 8.3 Adapter Lifecycle Management
**Adapter Management Decision**:

-Production adapters are **dynamically managed** within a Test Run:
+We evaluated two deployment granularities for managing the adapter lifecycle:

-**Hot-plugging**:
-- Adapters can be added or removed between suites or test cases
-- Multiple adapters can be deployed in parallel
-- Each adapter maintains independent subscriptions
+| Approach | Adapter Scope | When Deployed | When Removed | Trade-offs |
+|----------|---------------|---------------|--------------|------------|
+| **Test Group-level** (Ordered + BeforeAll/AfterAll) | Shared within a test group | Once per test group | After all tests in group | ✅ Faster (deploy once)<br>✅ Good for read-only tests<br>⚠️ Tests in group share adapter state |
+| **Test Case-level** (BeforeEach/AfterEach) | Isolated per individual test | Before each test case | After each test case | ✅ Complete isolation<br>✅ No state pollution<br>❌ Slower (deploy per test) |

-**Ownership**:
-- Test Suite owns adapter lifecycle within its scope
-- Adapters are treated as test variables, not infrastructure constants
+**Decision: Support both granularities within Adapter Suite**

-**Subscription Management**:
-- Each adapter creates unique subscriptions
-- Subscription IDs ensure isolation between adapter instances
+**Rationale**:
+- Different test types have different isolation needs
+- Read-only validations (e.g., checking DNS records) benefit from Test Group-level sharing
+- State-changing tests (e.g., error injection, config changes) require Test Case-level isolation
+- A mixed approach optimizes for both speed and test quality

-**Independence**:
-- Core Suite operates independently via Fixture Adapter
-- Adapter failures do not impact infrastructure stability
+**Test Organization**:
+- Test groups use scoped `Describe` blocks to control adapter lifecycle
+- Test Group-level: `Describe` + `Ordered` + `BeforeAll`/`AfterAll`
+- Test Case-level: `Describe` + `BeforeEach`/`AfterEach`
+- Multiple test groups can coexist with different strategies

---

-### 8.4 Suite Execution Order
+### 8.3 Suite Execution Order

**Recommended Order**:

Within a Test Run, suites typically execute in this order:

-1. **Core Suite** - Validates baseline functionality (must run first)
-2. **Adapter Execution Suite** - Validates adapter runtime behavior
-3. **Adapter Deployment Suite** - Validates adapter deployment process
+1. **Core Suite** - Validates framework data flow
+2. **Adapter Suite** - Validates functional adapter implementation

**Rationale**:
-- Core Suite must run first to validate infrastructure readiness
-- Adapter Execution and Deployment suites have no dependency on each other
-- Adapter Execution Suite runs before Deployment Suite to minimize environment pollution:
-  - Execution Suite manages adapters at suite level (cleaner state isolation)
-  - Deployment Suite creates/removes adapters per test case (higher churn)
+- Core Suite runs first to validate infrastructure readiness
+- Core Suite provides fast feedback on framework-level issues
+- Adapter Suite exercises functional adapter implementations after the infrastructure has been validated

**Flexibility**:
-- Adapter Execution and Deployment suites can run in either order or in parallel (separate Test Runs)
-- Any suite can run independently if infrastructure is ready
-- Order recommendation optimizes for state cleanliness, not correctness
+- Suites can run independently if infrastructure is ready
+- Multiple Test Runs can execute in parallel, each isolated in its own namespace (e2e-{TEST_RUN_ID})

---

-### 8.5 State Management and Suite Independence
+### 8.4 Test Organization Guidelines
+
+**Problem**: When should tests use Test Group-level vs Test Case-level adapter deployment?
+
+**Decision Matrix**:
+
+| Test Characteristics | Recommended Strategy | Rationale |
+|---------------------|---------------------|-----------|
+| Read-only validations (check records, metrics, status) | Test Group-level | Tests don't interfere, share setup cost |
+| Independent functional checks (no state changes) | Test Group-level | Can reuse adapter safely |
+| Error injection scenarios | Test Case-level | State contamination risk, need fresh adapter |
+| Configuration variations | Test Case-level | Different adapter configs required |
+| State-changing operations (update, delete) | Test Case-level | Side effects prevent reuse |
+
+**Conceptual Structure** (Ginkgo-style sketch; `Adapter`, `DeployAdapter`, and `RemoveAdapter` are placeholders):
+
+```go
+var _ = Describe("DNS Adapter Suite", func() {
+
+	// Test Group 1: shared adapter (Test Group-level)
+	Describe("Deployment Validation", Ordered, func() {
+		var adapter *Adapter
+		BeforeAll(func() { adapter = DeployAdapter() }) // deploy once for the group
+		It("validates deployment correctness", func() { /* ... */ })
+		It("validates configuration loading", func() { /* ... */ })
+		It("validates subscription registration", func() { /* ... */ })
+		AfterAll(func() { RemoveAdapter(adapter) })
+	})
+
+	// Test Group 2: shared adapter (Test Group-level)
+	Describe("Functional Tests", Ordered, func() {
+		var adapter *Adapter
+		BeforeAll(func() { adapter = DeployAdapter() /* + create test data */ })
+		It("validates DNS record creation", func() { /* ... */ })
+		It("validates status reporting", func() { /* ... */ })
+		It("validates metrics", func() { /* ... */ })
+		AfterAll(func() { /* cleanup test data */ RemoveAdapter(adapter) })
+	})
+
+	// Test Group 3: isolated adapters (Test Case-level)
+	Describe("Error Scenarios", func() {
+		var adapter *Adapter
+		BeforeEach(func() { adapter = DeployAdapter() }) // fresh adapter per test
+		It("handles errors (invalid domain)", func() { /* ... */ })
+		It("retries transient failures", func() { /* ... */ })
+		AfterEach(func() { RemoveAdapter(adapter) })
+	})
+})
+```
+
+**Key Principles**:
+- **Test Group-level** (`Describe` + `Ordered` + `BeforeAll`/`AfterAll`): Group related read-only tests within a focused `Describe` block
+- **Test Case-level** (`Describe` + `BeforeEach`/`AfterEach`): Isolate state-changing or error tests
+- **Scoped test groups**: Each `Describe` block defines a focused scope for adapter lifecycle management
+
+---
+
+### 8.5 State Management and Suite Independence

**State Ownership Model**:

@@ -503,20 +645,23 @@ Test Run state is categorized by lifetime and ownership:

| State Type | Lifetime | Owner | Examples |
|------------|----------|-------|----------|
| Infrastructure State | Test Run | Test Framework | Namespace, API, Sentinel, Fixture Adapter |
-| Adapter State | Suite or Test Case | Test Suite | Production adapter pods, subscriptions |
+| Adapter State | Test Group or Test Case | Test Group (Describe block) | Functional adapter pods, subscriptions |
| Test Data | Test Case | Test Case | Clusters, NodePools, test-specific resources |

**Isolation Principles**:

1. **Infrastructure persists** - Core components remain active throughout the Test Run
-2. **Adapters are ephemeral** - Created and removed by test suites as needed
+2. **Adapters are ephemeral** - Created and removed by test groups (Describe blocks) or individual test cases
3. **Test data is scoped** - Each test case manages its own test resources
4. **Unique naming prevents collision** - Resources use unique identifiers to avoid cross-test interference

**Suite Independence**:
- Suites can run independently if infrastructure is ready
-- Suite failures do not block subsequent suites (collect all failures)
+- Suite execution strategy:
+  - **Fail-fast**: Core Suite failures (API, Sentinel, Broker) block dependent suites
+  - **Fail-tolerant**: Failures in independent suites are collected without blocking other independent suites
+  - This terminates the run early on infrastructure failures while maximizing test coverage
- Each suite validates its prerequisites at startup

**Cleanup Responsibility**:
@@ -593,19 +738,9 @@ A scheduled reconciler job enforces TTL-based cleanup:

**Simplicity Principle**:

Reconciler does not distinguish between:
- Normal vs orphaned namespaces
- CI vs local runs
-- Single-namespace vs cross-namespace

All policy decisions are encoded in namespace annotations. Reconciler is stateless.
-#### 9.3.2 Cross-Namespace Correlation
-
-For cross-namespace deployments, reconciler must correlate related namespaces:
-- Identifies topology from namespace naming convention
-- Finds all namespaces sharing the same Test Run ID
-- Deletes correlated namespaces together
-
-**Atomicity**: Deletion may be eventual (one namespace deleted, others follow in next reconciliation cycle). This is acceptable given low-frequency reconciliation.
-
---

### 9.4 Orphaned Resource Handling
@@ -621,6 +756,28 @@ For cross-namespace deployments, reconciler must correlate related namespaces:

---

+### 9.5 Cloud Resource Cleanup
+
+**Scope**: Cloud messaging resources (GCP Pub/Sub Topics and Subscriptions) require explicit cleanup beyond namespace deletion.
+
+**Cleanup Process**:
+
+1. **Tagging**: All cloud resources created during a Test Run are labeled with `test_run_id`
+2. **Teardown**: The E2E cleanup script explicitly deletes cloud resources via the cloud CLI:
+   - List resources filtered by the `test_run_id` label
+   - Delete Topics and Subscriptions matching the Test Run ID
+   - Example: `gcloud pubsub topics list --filter="labels.test_run_id=1738152345678901234" --format="value(name)" | xargs -r gcloud pubsub topics delete` (note: `gcloud pubsub topics delete` accepts resource names, not a filter, so listing comes first)
+3. **Reconciler**: Periodically scans for orphaned cloud resources (tagged but older than the retention TTL) and deletes them
+
+**Why Explicit Cleanup**:
+- Kubernetes namespace deletion does not remove cloud resources
+- Cloud resources incur costs and consume quota
+- Orphaned cloud resources can accumulate over time
+
+**Implementation Note**: Cloud resource cleanup happens before namespace deletion in the teardown sequence to ensure the cleanup script still has cluster access.
+
+---
+
## 10. Testing Infrastructure Considerations

### 10.1 Image Build and Distribution
@@ -660,10 +817,10 @@ Debuggability is enabled by:

Test framework outputs component versions at Test Run start:

- Core components (API, Sentinel, Broker)
-- Fixture Adapter
-- Production adapters deployed during test execution
+- Adapter (adapter-landing-zone in Phase 1, Fixture Adapter in Phase 2)
+- Functional adapters deployed during test execution (Phase 2)

-This enables correlation between test results and component versions for failure investigation.
+Version information is logged during the infrastructure deployment phase, enabling correlation between test results and component versions for failure investigation.

Engineers can:

@@ -676,17 +833,19 @@ Engineers can:

## 12. Open Questions and Follow-Ups

-The following topic is intentionally deferred to implementation phase:
-
-- What is the minimal functional specification for Fixture Adapter?
+No open questions at this time. Fixture Adapter design is covered in Section 4.4.

---

## 13. Action Items and Next Steps

-**Prerequisites**: This test strategy assumes HyperFleet system supports runtime adapter hot-plugging (dynamic adapter deployment without API/Sentinel restart). If this capability does not exist, it should be implemented as part of HyperFleet core development (separate from E2E framework work).
+Implementation follows a phased approach to avoid blocking E2E testing progress (see Section 4.4.4 for the Fixture Adapter phasing strategy).

-### 13.1 Core Infrastructure
+---
+
+### 13.1 Phase 1: MVP with adapter-landing-zone (Immediate)
+
+**Goal**: Establish E2E testing infrastructure with happy-path Core Suite validation using the existing adapter-landing-zone.
**HYPERFLEET-XXX: Container Image Architecture** - [ ] Build Cloud Platform Tools image (gcloud, aws cli, kubeconfig generation) @@ -696,58 +855,65 @@ The following topic is intentionally deferred to implementation phase: **HYPERFLEET-XXX: Test Run Lifecycle** - [ ] Implement Test Run ID generation - [ ] Implement namespace creation with isolation labels (test-run-id, ci, managed-by) -- [ ] Implement infrastructure deployment via helm (API, Sentinel, Broker) +- [ ] Implement infrastructure deployment via helm (API, Sentinel, Broker, adapter-landing-zone) +- [ ] Configure adapter-landing-zone with RabbitMQ broker (no GCP dependency) - [ ] Add infrastructure readiness checks -- [ ] Implement cleanup: helm uninstall + namespace deletion +- [ ] Output component versions at Test Run start (API, Sentinel, Broker, adapter-landing-zone) +- [ ] Implement cleanup: cloud resource deletion (topics/subscriptions tagged with test_run_id) + helm uninstall + namespace deletion -**HYPERFLEET-XXX: Fixture Adapter** -- [ ] Design minimal functional specification for Fixture Adapter -- [ ] Implement Fixture Adapter with event consumption capability -- [ ] Add Fixture Adapter to infrastructure helm chart -- [ ] Write Fixture Adapter unit tests +**HYPERFLEET-XXX: Core Suite (Phase 1 - with adapter-landing-zone)** +- [ ] Implement Core Suite test cases (happy-path cluster/nodepool lifecycle) +- [ ] Validate framework data flow: API → Sentinel → Broker → Adapter → API +- [ ] Add infrastructure health validation -**HYPERFLEET-XXX: Component Version Reporting** -- [ ] Implement version output at Test Run start -- [ ] Output core component versions (API, Sentinel, Broker, Fixture Adapter) -- [ ] Log production adapter versions during test execution +**HYPERFLEET-XXX: E2E Test Run Strategy Guide (Phase 1)** +- [ ] Document Test Run lifecycle for developers +- [ ] Document Core Suite basics +- [ ] Document basic cleanup and troubleshooting -### 13.2 Test Suite Implementation +--- 
-**HYPERFLEET-XXX: Core Suite** -- [ ] Implement Core Suite test cases (cluster/nodepool lifecycle) -- [ ] Verify Core Suite operates with Fixture Adapter only -- [ ] Add infrastructure health validation +### 13.2 Phase 2: Fixture Adapter and Adapter Suite (Future) -**HYPERFLEET-XXX: Adapter Execution Suite** -- [ ] Implement suite-level adapter deployment (beforeSuite/afterSuite) -- [ ] Write adapter job execution tests -- [ ] Add event handling and error handling validation +**Goal**: Implement Fixture Adapter for comprehensive framework testing and Adapter Suite for functional adapter validation. -**HYPERFLEET-XXX: Adapter Deployment Suite** -- [ ] Implement per-test-case adapter deployment helpers (helm install/uninstall) -- [ ] Create adapter configuration testdata directory (helm values for adapter deployment) -- [ ] Write adapter deployment and removal validation tests +**Prerequisites**: Requires HyperFleet system to support runtime adapter hot-plugging (dynamic adapter deployment without API/Sentinel restart). 
+ +**HYPERFLEET-XXX: Fixture Adapter** +- [ ] Implement Fixture Adapter in dedicated repository (adapter-fixture) +- [ ] Implement label-driven control modes (immediate success, delayed success, failure, transient failure) +- [ ] Implement event consumption (cluster.created, nodepool.created) +- [ ] Implement status reporting (Applied, Available, Health conditions) +- [ ] Add Fixture Adapter to infrastructure helm chart +- [ ] Write Fixture Adapter unit tests + +**HYPERFLEET-XXX: Core Suite (Phase 2 - with Fixture Adapter)** +- [ ] Replace adapter-landing-zone with Fixture Adapter in Core Suite +- [ ] Implement error injection tests (failure handling, retry logic) +- [ ] Implement async processing tests (delayed status reporting) +- [ ] Implement concurrent processing tests +- [ ] Update Core Suite documentation with Fixture Adapter usage + +**HYPERFLEET-XXX: Adapter Suite** +- [ ] Implement flexible adapter deployment strategies (Ordered + BeforeAll, BeforeEach/AfterEach) +- [ ] Create adapter configuration testdata directory (different adapter configs for testing) +- [ ] Write adapter deployment validation tests (config loading, subscription registration) +- [ ] Write adapter implementation tests (K8s resource creation: Job, ServiceAccount, ConfigMap, etc.) 
+- [ ] Add error handling and retry logic validation - [ ] Add cleanup completeness verification +- [ ] Implement mixed strategy examples (read-only vs state-changing tests) -### 13.3 Documentation +**HYPERFLEET-XXX: E2E Test Run Strategy Guide (Phase 2)** +- [ ] Write suite organization guide (Core Suite vs Adapter Suite) +- [ ] Document test organization strategies (Ordered + BeforeAll, BeforeEach/AfterEach, mixed approach) +- [ ] Create adapter deployment strategy examples -**HYPERFLEET-XXX: E2E Test Run Strategy Guide** -- [ ] Document Test Run lifecycle for developers -- [ ] Write suite type selection guide (Core/Execution/Deployment) -- [ ] Create adapter hot-plugging examples -- [ ] Document basic cleanup and troubleshooting +--- -### 13.4 Future Enhancements +### 13.3 Post-MVP Enhancements The following enhancements are deferred to post-MVP: -**HYPERFLEET-XXX: Cross-Namespace Topology** -- [ ] Implement cross-namespace deployment model (e2e-crossns-{ID}-core, e2e-crossns-{ID}-adapters) -- [ ] Add cross-namespace DNS and NetworkPolicy configuration -- [ ] Update cleanup logic for cross-namespace correlation -- [ ] Write cross-namespace communication validation tests -- [ ] Document production deployment model verification use cases - **HYPERFLEET-XXX: Retention Policy** - [ ] Implement namespace retention annotation logic - [ ] Add test result-based retention updates (passed: 10min, failed: 24h/6h) @@ -757,7 +923,7 @@ The following enhancements are deferred to post-MVP: **HYPERFLEET-XXX: Cleanup Reconciler Job** - [ ] Implement TTL-based namespace reconciler - [ ] Add orphaned resource detection and cleanup -- [ ] Add cross-namespace correlation for multi-namespace topologies +- [ ] Add orphaned cloud resource cleanup (topics/subscriptions filtered by test_run_id tag) - [ ] Configure reconciler schedule (30-minute default) - [ ] Add reconciler monitoring and alerts