OCPEDGE-2084: Add PacemakerStatus CRD for two-node fencing #2544
@@ -0,0 +1,148 @@

# etcd.openshift.io API Group

This API group contains CRDs related to etcd cluster management in Two Node OpenShift with Fencing deployments.

## API Versions

### v1alpha1

Contains the `PacemakerCluster` custom resource for monitoring Pacemaker cluster health in Two Node OpenShift with Fencing deployments.

#### PacemakerCluster
- **Feature Gate**: `DualReplica`
- **Component**: `two-node-fencing`
- **Scope**: Cluster-scoped singleton resource (must be named "cluster")
- **Resource Path**: `pacemakerclusters.etcd.openshift.io`

The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster.
It is periodically updated by the cluster-etcd-operator's status collector.
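Because the resource is a cluster-scoped singleton, consumers always read it by the fixed name `cluster`. As a rough sketch, using a controller-runtime client and an unstructured object purely for illustration (the helper name is hypothetical, and a typed client would work the same way once generated clients exist):

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getPacemakerCluster fetches the cluster-scoped singleton, which must be
// named "cluster". Illustrative helper only.
func getPacemakerCluster(ctx context.Context, c client.Client) (*unstructured.Unstructured, error) {
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "etcd.openshift.io",
		Version: "v1alpha1",
		Kind:    "PacemakerCluster",
	})
	if err := c.Get(ctx, client.ObjectKey{Name: "cluster"}, u); err != nil {
		return nil, err
	}
	return u, nil
}
```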
### Pacemaker Resources

A **Pacemaker resource** is a unit of work managed by Pacemaker. In Pacemaker terminology, resources are services
or applications that Pacemaker monitors, starts, stops, and moves between nodes to maintain high availability.

For Two Node OpenShift with Fencing, we manage three resources:
- **Kubelet**: The Kubernetes node agent and a prerequisite for etcd
- **Etcd**: The distributed key-value store
- **FencingAgent**: Used to isolate failed nodes during a quorum-loss event
### Status Structure

> **Note on optional fields:** Per Kubernetes API conventions, all status fields are marked `+optional` in the Go
> type definitions to allow for partial updates and backward compatibility. However, CEL validation rules enforce
> that certain fields (such as `conditions` and `lastUpdated`) cannot be removed once set. This means these fields are
> "optional on initial creation, but required once present."
> See: [kube-api-linter statusoptional](https://github.com/kubernetes-sigs/kube-api-linter/blob/main/pkg/analysis/statusoptional/doc.go)
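To make that pattern concrete, here is a sketch of an `+optional` field guarded by a CEL transition rule so it cannot be cleared once set. The type name and the exact rule text are illustrative assumptions, not the PR's actual definitions:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ExampleStatus sketches the "optional on create, required once present"
// pattern: the field carries +optional, while a CEL transition rule on the
// parent struct rejects updates that drop a previously set field.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.conditions) || has(self.conditions)",message="conditions may not be removed once set"
type ExampleStatus struct {
	// conditions is optional when the object is first created, but the rule
	// above prevents it from being cleared later.
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```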
The overall shape of the status is:

```yaml
status:
  conditions:                 # Cluster-level conditions (required once set, min 3 items)
  - type: Healthy
  - type: InService
  - type: NodeCountAsExpected
  lastUpdated: <timestamp>    # When status was last updated (required once set, cannot decrease)
  nodes:                      # Per-node status (0-32 nodes, expects 2)
  - name: <hostname>          # RFC 1123 subdomain name
    ipAddresses:              # List of global unicast IPv4 or IPv6 addresses in canonical form
    - <ip>                    # First address used for etcd peer URLs
    conditions:               # Node-level conditions (required once set, min 7 items)
    - type: Healthy
    - type: Online
    - type: InService
    - type: Active
    - type: Ready
    - type: Clean
    - type: Member
    resources:                # Array of Pacemaker resources on this node (required once set, min 3 items)
    - name: Kubelet           # All three resources (Kubelet, Etcd, FencingAgent) must be present
      conditions:             # Resource-level conditions (required once set, min 8 items)
      - type: Healthy
      - type: InService
      - type: Managed
      - type: Enabled
      - type: Operational
      - type: Active
      - type: Started
      - type: Schedulable
    - name: Etcd
      conditions: []
    - name: FencingAgent
      conditions: []
```

**Review thread on `lastUpdated`:**

> **Contributor:** Is there a way to be confident that things are still working without a clock that will cause writes every 30 seconds? Nodes in Kube have similar properties that haven't aged well, mainly for scale reasons, which I guess is less of an issue here. Still, I wonder how else we could be confident things were still good without a ticking clock.
>
> **Author:** I asked Claude to scan through the existing APIs and review our options for this. The best it could come up with was moving the staleness check into the health checker, but this still has the problem that if the status is healthy and stable for more than 5 minutes, we don't record a heartbeat, so the status becomes stale. The core objective of the API is to prevent users from entering a dangerous state without a warning provided in the cluster. I think the only way to achieve this is with some kind of heartbeat, and the lastUpdated mechanism is the simplest I could come up with.
>
> **Contributor:** And the lastUpdated will tick every time you reconcile, not every time there is a particular change, right?

**Review thread on `ipAddresses`:**

> **Contributor:** In Nodes, these are just […]. Also, maybe we want to copy the structure where we can add details to addresses later. That would mean making this a list of structs.
>
> **Author:** Updating to use this approach.
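Building on the `lastUpdated` heartbeat discussed above, a consumer can flag the reported status as stale when the heartbeat stops advancing. A minimal sketch; the helper name and the threshold handling are illustrative, not taken from this PR:

```go
package health

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// statusIsStale treats lastUpdated as a heartbeat: if it has not advanced
// within the threshold, the reported status should no longer be trusted.
// Hypothetical helper, shown only to illustrate the idea.
func statusIsStale(lastUpdated *metav1.Time, now time.Time, threshold time.Duration) bool {
	if lastUpdated == nil || lastUpdated.IsZero() {
		// The status has never been populated; treat it as stale/unknown.
		return true
	}
	return now.Sub(lastUpdated.Time) > threshold
}
```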
### Cluster-Level Conditions

**All three conditions are required once the status is populated.**

| Condition | When `True` | When `False` |
|-----------|-------------|--------------|
| `Healthy` | Cluster is healthy (`ClusterHealthy`) | Cluster has issues (`ClusterUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `NodeCountAsExpected` | Node count is as expected (`AsExpected`) | Wrong count (`InsufficientNodes`, `ExcessiveNodes`) |
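Because these are ordinary `metav1.Condition` lists, the standard apimachinery helpers apply. A small sketch of reading the aggregate cluster condition (the function name and the missing-condition fallback are illustrative):

```go
package health

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// clusterIsHealthy checks the aggregate cluster-level condition. The same
// helpers work for node- and resource-level conditions, since they all use
// metav1.Condition.
func clusterIsHealthy(conditions []metav1.Condition) (healthy bool, reason string) {
	if c := meta.FindStatusCondition(conditions, "Healthy"); c != nil {
		return c.Status == metav1.ConditionTrue, c.Reason
	}
	// Condition not reported yet; treat as unhealthy/unknown.
	return false, "ConditionMissing"
}
```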
### Node-Level Conditions

**All seven conditions are required for each node once the status is populated.**

| Condition | When `True` | When `False` |
|-----------|-------------|--------------|
| `Healthy` | Node is healthy (`NodeHealthy`) | Node has issues (`NodeUnhealthy`) |
| `Online` | Node is online (`Online`) | Node is offline (`Offline`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Active` | Node is active (`Active`) | Node is in standby (`Standby`) |
| `Ready` | Node is ready (`Ready`) | Node is pending (`Pending`) |
| `Clean` | Node is clean (`Clean`) | Node is unclean (`Unclean`) |
| `Member` | Node is a member (`Member`) | Not a member (`NotMember`) |
### Resource-Level Conditions

Each resource in the `resources` array has its own conditions. **All eight conditions are required for each resource once the status is populated.**

| Condition | When `True` | When `False` |
|-----------|-------------|--------------|
| `Healthy` | Resource is healthy (`ResourceHealthy`) | Resource has issues (`ResourceUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Managed` | Managed by Pacemaker (`Managed`) | Not managed (`Unmanaged`) |
| `Enabled` | Resource is enabled (`Enabled`) | Resource is disabled (`Disabled`) |
| `Operational` | Resource is operational (`Operational`) | Resource has failed (`Failed`) |
| `Active` | Resource is active (`Active`) | Resource is not active (`Inactive`) |
| `Started` | Resource is started (`Started`) | Resource is stopped (`Stopped`) |
| `Schedulable` | Resource is schedulable (`Schedulable`) | Resource is not schedulable (`Unschedulable`) |
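Combining the levels, a health checker can drill down from the aggregate `Healthy` conditions to the specific resource that is failing. The sketch below uses stand-in types that mirror the documented layout; the real Go types in this API package will differ in detail:

```go
package health

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Stand-in types mirroring the documented status layout (illustrative only).
type nodeStatus struct {
	Name       string
	Conditions []metav1.Condition
	Resources  []resourceStatus
}

type resourceStatus struct {
	Name       string
	Conditions []metav1.Condition
}

// unhealthyResources lists "node/resource" pairs whose Healthy condition is
// not True, giving a quick drill-down from the aggregate conditions.
func unhealthyResources(nodes []nodeStatus) []string {
	var out []string
	for _, n := range nodes {
		for _, r := range n.Resources {
			if !meta.IsStatusConditionTrue(r.Conditions, "Healthy") {
				out = append(out, fmt.Sprintf("%s/%s", n.Name, r.Name))
			}
		}
	}
	return out
}
```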
### Validation Rules

**Resource naming:**
- The `PacemakerCluster` object must be named "cluster" (singleton)

**Node name validation:**
- Must be a lowercase RFC 1123 subdomain name
- Consists of lowercase alphanumeric characters, '-', or '.'
- Must start and end with an alphanumeric character
- Maximum 253 characters

**IP address validation:**
- Pacemaker allows multiple IP addresses for Corosync communication between nodes (1-8 addresses)
- The first address in the list is used for IP-based peer URLs for etcd membership
- Each address must be in canonical form (e.g., `192.168.1.1`, not `192.168.001.001`; `2001:db8::1`, not `2001:0db8::1`)
- Each address must be a valid global unicast IPv4 or IPv6 address (including private/RFC 1918 addresses)
- Excludes loopback, link-local, and multicast addresses
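These rules are enforced server-side by the schema and CEL validation; for intuition, an equivalent check written in plain Go might look like the following (illustrative only, not code from this PR):

```go
package validation

import (
	"fmt"
	"net/netip"
)

// validateIPAddress mirrors the documented rules: the address must parse, be
// written in canonical form, and be a global unicast address (private ranges
// are allowed; loopback, link-local, and multicast are rejected by
// IsGlobalUnicast). The API server enforces this via validation rules, not
// via this helper.
func validateIPAddress(s string) error {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return fmt.Errorf("not a valid IP address: %w", err)
	}
	if addr.String() != s {
		return fmt.Errorf("address %q is not in canonical form (expected %q)", s, addr.String())
	}
	if !addr.IsGlobalUnicast() {
		return fmt.Errorf("address %q is not a global unicast address", s)
	}
	return nil
}
```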
**Timestamp validation:**
- `lastUpdated` is optional on initial creation, but once set it cannot be removed and must always increase (prevents stale writes from overwriting newer status)
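As with the other status fields, this can be expressed with CEL transition rules. The sketch below shows one possible phrasing; the field here is a stand-in, the exact expressions in this PR may differ, and it assumes `metav1.Time` surfaces to CEL as a comparable timestamp:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ExampleHeartbeatStatus sketches transition rules expressing "cannot be
// removed and must never decrease" for lastUpdated.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.lastUpdated) || has(self.lastUpdated)",message="lastUpdated may not be removed once set"
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.lastUpdated) || !has(self.lastUpdated) || self.lastUpdated >= oldSelf.lastUpdated",message="lastUpdated may not decrease"
type ExampleHeartbeatStatus struct {
	// lastUpdated is optional on create; the rules above apply on update.
	// +optional
	LastUpdated *metav1.Time `json:"lastUpdated,omitempty"`
}
```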
**Required conditions (once status is populated):**
- All cluster-level conditions must be present (MinItems=3)
- All node-level conditions must be present for each node (MinItems=7)
- All resource-level conditions must be present for each resource in the `resources` array (MinItems=8)

**Resource names:**
- Valid values are: `Kubelet`, `Etcd`, `FencingAgent`
- All three resources must be present in each node's `resources` array
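A natural way to express this constraint in the Go types is an enumerated string type. A sketch follows; the type and constant names are assumptions, and only the three values are taken from this document:

```go
package v1alpha1

// PacemakerResourceName models the allowed Pacemaker resource names as an
// enumerated string type (illustrative).
// +kubebuilder:validation:Enum=Kubelet;Etcd;FencingAgent
type PacemakerResourceName string

const (
	ResourceNameKubelet      PacemakerResourceName = "Kubelet"
	ResourceNameEtcd         PacemakerResourceName = "Etcd"
	ResourceNameFencingAgent PacemakerResourceName = "FencingAgent"
)
```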
### Usage

The cluster-etcd-operator healthcheck controller watches this resource and updates operator conditions based on
the cluster state. The aggregate `Healthy` conditions at each level (cluster, node, resource) provide a quick
way to determine overall health.
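Any controller can consume the resource in the usual way. The sketch below wires up a watch with controller-runtime purely as an illustration; the actual healthcheck controller in cluster-etcd-operator is not necessarily built this way, and the type names here are assumptions:

```go
package health

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

// PacemakerClusterReconciler is an illustrative consumer of the singleton.
type PacemakerClusterReconciler struct{}

func (r *PacemakerClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the "cluster" singleton and translate its aggregate Healthy
	// conditions into operator conditions (details omitted in this sketch).
	return ctrl.Result{}, nil
}

func (r *PacemakerClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// The manager's scheme must have etcd.openshift.io/v1alpha1 installed.
	return ctrl.NewControllerManagedBy(mgr).
		For(&v1alpha1.PacemakerCluster{}).
		Complete(r)
}
```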
@@ -0,0 +1,26 @@

```go
package etcd

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

const (
	GroupName = "etcd.openshift.io"
)

var (
	schemeBuilder = runtime.NewSchemeBuilder(v1alpha1.Install)
	// Install is a function which adds every version of this group to a scheme
	Install = schemeBuilder.AddToScheme
)

func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func Kind(kind string) schema.GroupKind {
	return schema.GroupKind{Group: GroupName, Kind: kind}
}
```
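For reference, a consumer wires this group into a scheme via `Install`. A short usage sketch, assuming the group package path is `github.com/openshift/api/etcd` (matching the `v1alpha1` import above):

```go
package main

import (
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"

	"github.com/openshift/api/etcd"
)

func main() {
	// Register every served version of the etcd.openshift.io group into a
	// scheme before building typed clients or informers against it.
	scheme := runtime.NewScheme()
	utilruntime.Must(etcd.Install(scheme))
}
```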
@@ -0,0 +1,3 @@

```make
.PHONY: test
test:
	make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1"
```
@@ -0,0 +1,6 @@

```go
// +k8s:deepcopy-gen=package,register
// +k8s:defaulter-gen=TypeMeta
// +k8s:openapi-gen=true
// +openshift:featuregated-schema-gen=true
// +groupName=etcd.openshift.io
package v1alpha1
```
@@ -0,0 +1,39 @@

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

var (
	GroupName     = "etcd.openshift.io"
	GroupVersion  = schema.GroupVersion{Group: GroupName, Version: "v1alpha1"}
	schemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
	// Install is a function which adds this version to a scheme
	Install = schemeBuilder.AddToScheme

	// SchemeGroupVersion generated code relies on this name
	// Deprecated
	SchemeGroupVersion = GroupVersion
	// AddToScheme exists solely to keep the old generators creating valid code
	// DEPRECATED
	AddToScheme = schemeBuilder.AddToScheme
)

// Resource generated code relies on this being here, but it logically belongs to the group
// DEPRECATED
func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func addKnownTypes(scheme *runtime.Scheme) error {
	metav1.AddToGroupVersion(scheme, GroupVersion)

	scheme.AddKnownTypes(GroupVersion,
		&PacemakerCluster{},
		&PacemakerClusterList{},
	)

	return nil
}
```
Review discussion:

> We are dropping this requirement; it's better to be able to actually make the fields required.
>
> Does this mean I should bump the linter in the go.mod?
>
> We should make that bump, yes. If you have time to do it, go for it; otherwise I'll try to get to it in the next few days.