The PersistentVolume Sync Operator externalizes PV metadata to an S3-compatible backend, decoupling storage identity from the cluster. This creates a centralized source of truth for reconstructing volumes across any cluster sharing the underlying storage (e.g., NAS, GFS, SDS).
The operator mirrors PersistentVolume definitions and lifecycle states by capturing exact specifications and volume handles from the Protected cluster. These manifests are then used to reconstruct identical PV objects within a Recovery cluster. This mechanism ensures that during a disaster recovery event, the recovery site has immediate, pre-configured access to the underlying storage, significantly reducing Recovery Time Objectives (RTO) by eliminating manual storage re-provisioning.
The PersistentVolume Sync Operator facilitates cross-cluster disaster recovery by synchronizing the state and specifications of Kubernetes storage resources. Its behavior is divided into two distinct logical paths:
The operator performs continuous, point-in-time captures of Kubernetes resource specifications for all targeted PersistentVolumes (PV) and PersistentVolumeClaims (PVC).
- Intelligent Scoping: It monitors resources across designated StorageClasses, maintaining a comprehensive map of the source cluster's storage landscape.
- State Capture Mechanism: The operator serializes metadata, labels, and specific volume requirements (capacity, access modes, and volume handles) into a portable, cluster-agnostic format.
- Preservation of Intent: By capturing the full lifecycle state, the operator ensures that the original storage configuration is preserved and ready for immediate reconstitution on the recovery cluster.
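As an illustration of the captured format, a labeled NFS-backed PV might serialize to something like the following. This is a sketch only: the field selection shown is an assumption, and the operator's actual export schema may differ.

```yaml
# Illustrative capture of a labeled NFS-backed PV; the exact export
# schema used by the operator may differ from this sketch
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-shared-data
  labels:
    volumesyncs.storage.cndev.nl/sync: "enabled"   # opt-in label used by the operator
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nas.example.internal    # volume handle: endpoint + path stay valid cross-cluster
    path: /exports/shared-data
```

Everything in this manifest is cluster-agnostic: nothing binds it to the Protected cluster, which is what allows identical reconstruction on the Recovery site.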
The operator explicitly decouples the PersistentVolumeClaims from the StorageClass definitions. While it captures the metadata for PVCs tied to various classes, it does not recreate the StorageClass objects on the recovery cluster.
- Architectural Intent: This is a deliberate design choice to support Heterogeneous Storage Environments. In many disaster recovery scenarios, the infrastructure at the recovery site differs from the primary site. For example, the recovery cluster may utilize a localized file cache or a different storage endpoint to optimize performance or adhere to site-specific infrastructure constraints.
- Flexible Binding Logic: By synchronizing only the PV/PVC metadata and omitting the StorageClass, the operator enables the recovery cluster's local storage controller to handle the binding process. This allows the restored claims to be dynamically mapped to the appropriate local backend while maintaining the data's integrity and volume handles.
- Site-Specific Optimization: This decoupling ensures that storage policies (such as replication factors or IOPS limits) can be tailored to the recovery site's specific capabilities without needing to mirror the primary site's configuration exactly.
- Metadata Extraction: The operator scans the source cluster for labeled storage resources and captures their specifications.
- Transformation: Using the clean_metadata logic, the operator strips environment-specific internal annotations while preserving the core volume requirements.
- Local Re-Binding: On the recovery cluster, the operator recreates the PVCs. These claims then automatically target the pre-existing local StorageClasses defined on the recovery site, ensuring the data is served via the correct local file cache or storage provider.
Key Benefit: This approach provides a "Clean Slate" recovery where storage logic is kept local to the cluster, preventing the migration of invalid or incompatible storage provider configurations from the primary site.
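As a sketch of the re-binding step, a recovered claim might look like the following, where `local-file-cache` is a hypothetical StorageClass that exists only on the recovery cluster:

```yaml
# Hypothetical recovered PVC: the StorageClass itself is never synced, so
# "local-file-cache" must already be defined on the recovery cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: production
spec:
  storageClassName: local-file-cache   # recovery-site class, defined locally
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
```

Because only the claim's core requirements travel between sites, the recovery cluster's local storage controller is free to satisfy them with its own backend.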
The PersistentVolume Sync Operator is designed for storage backends that provide cross-cluster data accessibility. For the operator to successfully reconstruct volumes on a recovery site, the underlying data must be reachable by the nodes in the Recovery cluster using the same volume handles captured from the Protected cluster.
NAS is the most common use case for the operator. Because NAS exports are addressed by a stable endpoint and path, the PV metadata (server IP/DNS and export path) remains valid across different clusters.

- Protocols: NFS, SMB/CIFS.
- Featured Support: Ctera GFS. Since Ctera provides a global file system, it is uniquely suited for this operator, allowing volumes to be mounted as RWX (ReadWriteMany) across geographic boundaries.
- Recovery Requirement: The Recovery cluster must have network routability to the same storage endpoints.
Using the operator with SAN backends (like IBM SVC) requires an additional layer to ensure data is present at both sites.
- Clustered File Systems: To achieve RWX on SAN storage, a clustered filesystem such as IBM Spectrum Scale (GPFS) should be used. The operator can sync the metadata for these volumes, provided the CSI driver handles are consistent.
- Replication Requirement: The SAN LUNs must be replicated at the hardware level (e.g., via IBM HyperSwap or Global Mirror).
- Volume Handles: The operator captures the LUN UID. The Recovery cluster must be able to "see" the same LUN UID via its local Fibre Channel or iSCSI fabric.
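For example, a CSI-backed PV for a replicated LUN could carry the LUN UID in its volume handle. The driver name and handle format below are assumptions for illustration and depend on the actual CSI driver in use:

```yaml
# Hypothetical SAN-backed PV; the captured volumeHandle must resolve to the
# same LUN UID (WWN) on the Recovery cluster's FC/iSCSI fabric
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-san-lun
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: block.csi.ibm.com                          # assumed driver name
    volumeHandle: "600507680c8084d1f80000000000001a"   # example LUN UID (WWN)
    fsType: ext4
```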
| Storage Category | Technology Example | Supported | Metadata Sync Key | Requirements for Success |
|---|---|---|---|---|
| SAN (Enterprise) | IBM SVC + Spectrum Scale | Yes | LUN UID (WWN) | Hardware-level replication (HyperSwap/Global Mirror). |
| Standard NAS | NFS / SMB | Yes | Export Path & IP/DNS | Network routability to the same storage endpoint. |
| Standard NAS | NetApp ONTAP | Yes | Volume UUID / Junction Path | SnapMirror or MetroCluster configured between sites. |
| Standard NAS | Dell PowerScale (Isilon) | Yes | File System ID / SmartConnect | SyncIQ replication and DNS-based SmartConnect availability. |
| Global File System | Ctera GFS | Yes | File Path / Share ID | Active global namespace across both sites. |
| Global File System | NetApp Global File Cache | Yes | Cache ID / Backend Volume | Backend ONTAP storage reachable with consistent cache coherency. |
| Global File System | Microsoft Azure Files (Premium) | Yes | Share Name / Storage Account | Cross-region replication (GZRS) or paired-region failover. |
| Distributed SDS | Ceph (CephFS) | Yes | Monitor IPs & FS Path | Recovery cluster must have access to the Ceph Monitor/OSD network. |
| Distributed SDS | Red Hat OpenShift Data Foundation | Yes | StorageClass / FSID | Stretch or mirrored clusters with quorum maintained. |
| Cloud Native | Longhorn | Yes | Engine Name / Frontend | Longhorn “Disaster Recovery Volumes” or cross-cluster backend. |
| Cloud Native | Portworx (Pure Storage PX) | Yes | Volume ID / Cluster UUID | PX-DR or Stork-based replication and scheduler awareness. |
While the operator can technically sync RWO (ReadWriteOnce) volumes, it is most effective for RWX (ReadWriteMany) workloads. For RWO block storage, ensure that the source cluster has fully released the volume (including any SCSI reservations) before the Recovery cluster attempts to reconstruct and mount it; otherwise, the mount operation will fail at the infrastructure level.
- 🔄 Cluster-wide PV discovery (no namespace restrictions)
- ☁️ Backend-agnostic object storage support (Azure Blob, S3, MinIO, Cloudian)
- 📤 Export storage definitions from the Protected cluster to object storage
- 📥 Recreate PV objects on the Recovery cluster pointing to the same shared storage
- 🏷 Cluster identity detection via configurable value
- 🧹 Automatic retention-based cleanup of historical exports
- 📡 Event-driven + periodic sync using Kubernetes watches and optional scheduling
- 🌐 Multi-cluster DR for shared RWX storage
- 💾 PV metadata backup and restore
- 🔁 Migration of PVs between clusters
- 🧭 Stateless failover for NFS-backed workloads
Features currently in development for the upcoming release:
- Validating admission webhook enforcing a maximum of one PersistentVolumeSync custom resource per cluster
- Advanced Helm chart for production deployments
- Update the CR status with more information:
  - `pub error_message: Option<String>`
  - `pub last_run: Option<chrono::DateTime<chrono::Utc>>`
  - `pub managed_volumes: Vec<String>`
- Optimize current logging implementation (via tracing + tracing-subscriber + EnvFilter)
- Implement traces (via tracing + tracing-subscriber + opentelemetry)
- Implement metrics (via tikv/prometheus exposed via axum)
- Instead of an external S3 watcher based on polling/listing comparison, investigate an ETag-based alternative
- Watcher optimizations (namespace exclusions, field pruning, debouncing of repeated events)
source ../00-ENV/env.sh
CVERSION="v0.6.2"
docker login ghcr.io -u bartvanbenthem -p $CR_PAT
docker build -t pvsync:$CVERSION .
docker tag pvsync:$CVERSION ghcr.io/bartvanbenthem/pvsync:$CVERSION
docker push ghcr.io/bartvanbenthem/pvsync:$CVERSION
# test image
docker run --rm -it --entrypoint /bin/sh pvsync:$CVERSION
/# ls -l /usr/local/bin/pvsync
/# /usr/local/bin/pvsync
kubectl apply -f ./config/crd/pvsync.storage.cndev.nl.yaml
# kubectl delete -f ./config/crd/pvsync.storage.cndev.nl.yaml
# secret containing object storage
source ../00-ENV/env.sh
kubectl -n kube-system create secret generic pvsync \
--from-literal=OBJECT_STORAGE_ACCOUNT=$OBJECT_STORAGE_ACCOUNT \
--from-literal=OBJECT_STORAGE_SECRET=$OBJECT_STORAGE_SECRET \
--from-literal=OBJECT_STORAGE_BUCKET=$OBJECT_STORAGE_BUCKET \
--from-literal=S3_ENDPOINT_URL=""
helm install pvsync ./config/operator/chart --create-namespace --namespace kube-system
kubectl -n kube-system get pods
# helm -n kube-system uninstall pvsync
# use label: volumesyncs.storage.cndev.nl/sync: "enabled"
# to enable a sync on a persistent volume
kubectl apply -f ./config/samples/pvsync-protected-example.yaml
kubectl describe persistentvolumesyncs.storage.cndev.nl example-protected-cluster
# kubectl delete -f ./config/samples/pvsync-protected-example.yaml
kubectl apply -f ./config/samples/pvsync-recovery-example.yaml
kubectl describe persistentvolumesyncs.storage.cndev.nl example-recovery-cluster
# kubectl delete -f ./config/samples/pvsync-recovery-example.yaml
kubectl apply -f ./config/samples/test-pv-nolabel.yaml
kubectl apply -f ./config/samples/test-pv.yaml
# kubectl delete -f ./config/samples/test-pv.yaml
# kubectl delete -f ./config/samples/test-pv-nolabel.yaml
apiVersion: storage.cndev.nl/v1alpha1
kind: PersistentVolumeSync
metadata:
name: protected-cluster
labels:
volumesyncs.storage.cndev.nl/name: protected-cluster
volumesyncs.storage.cndev.nl/part-of: pvsync-operator
annotations:
description: "Disaster Recovery PVSYNC Module"
spec:
protectedCluster: mycluster
mode: Protected
cloudProvider: azure
retention: 15
---
apiVersion: storage.cndev.nl/v1alpha1
kind: PersistentVolumeSync
metadata:
name: recovery-cluster
labels:
volumesyncs.storage.cndev.nl/name: recovery-cluster
volumesyncs.storage.cndev.nl/part-of: pvsync-operator
annotations:
description: "Disaster Recovery PVSYNC Module"
spec:
protectedCluster: mycluster
mode: Recovery
cloudProvider: azure
pollingInterval: 25 