Rsync-TLS-based replication

Rsync-based replication supports 1:1 asynchronous replication of volumes for use cases such as:

Disaster recovery
Mirroring to a test environment
Sending data to a remote site for processing

With this method, VolSync synchronizes data from a ReplicationSource to a ReplicationDestination using Rsync across a TLS-protected tunnel, provided by stunnel. By using Rsync, the amount of data transferred during each synchronization is kept to a minimum, and the TLS connection ensures that the data transfer is both authenticated and secure.

Rsync-over-TLS uses a “push” model for the data replication. A schedule or other trigger is used on the source side of the relationship to trigger each replication iteration. The destination continuously waits for incoming data.

During each iteration, a point-in-time (PiT) copy of the source volume is optionally created and used as the source data. The VolSync Rsync data mover then connects to the destination through stunnel (exposed via a Service) and sends any updates. At the conclusion of the transfer, the destination optionally creates a VolumeSnapshot to preserve the updated data.

VolSync is configured via two CustomResources (CRs), one on the source side and one on the destination side of the replication relationship.

Destination configuration 

Start by configuring the destination; an example is shown below:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-dest
  namespace: myns
spec:
  rsyncTLS:
    copyMethod: Snapshot
    capacity: 10Gi
    accessModes: ["ReadWriteOnce"]
    storageClassName: my-sc
    volumeSnapshotClassName: my-vsc

In the above example, a 10 GiB RWO volume will be provisioned using the StorageClass my-sc to serve as the destination for replicated data. This volume is used by the rsync data mover to receive the incoming data transfers.

Since the copyMethod specified above is Snapshot, a VolumeSnapshot will be created at the end of each synchronization iteration, using the VolumeSnapshotClass named my-vsc. This snapshot is what would be used to gain access to the replicated data. The name of the current VolumeSnapshot holding the latest synced data will be placed in the ReplicationDestination’s .status.latestImage.
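One way to access the replicated data is to restore the snapshot named in .status.latestImage into a new PVC using the standard Kubernetes dataSource field. The sketch below assumes the snapshot lives in the myns namespace. The PVC name restored-data and the snapshot name my-dest-snapshot are hypothetical placeholders. The requested size is assumed to be at least the snapshot's size (10Gi in the example above).

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data            # hypothetical name for the restored volume
  namespace: myns                # must be the same namespace as the VolumeSnapshot
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: my-sc        # assumed compatible with the snapshot's CSI driver
  resources:
    requests:
      storage: 10Gi              # at least the size of the snapshotted volume
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-dest-snapshot       # hypothetical; copy the name from .status.latestImage.name
```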

Destination status 

VolSync provides status information on the state of the replication via the .status field in the ReplicationDestination object:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-dest
  namespace: myns
spec:
  rsyncTLS:
  # ... omitted ...
status:
status:
  conditions:
    - lastTransitionTime: "2022-11-29T13:27:54Z"
      message: Synchronization in-progress
      reason: SyncInProgress
      status: "True"
      type: Synchronizing
  lastSyncStartTime: "2022-11-29T13:27:54Z"
  rsyncTLS:
    address: 10.96.231.114
    keySecret: volsync-rsync-tls-my-dest

In the above example,

The destination is waiting for data (the Synchronizing condition is True).
The destination TLS endpoint is available at the IP specified in .status.rsyncTLS.address. This should be used when configuring the corresponding ReplicationSource.
The TLS key is available in the Secret .status.rsyncTLS.keySecret. This Secret will need to be copied to the source so that it can authenticate.

After at least one synchronization has taken place, the following will also be available:

lastSyncTime contains the time of the last successful data synchronization.
latestImage references the object with the most recent copy of the data. If the copyMethod is Snapshot, this will be a VolumeSnapshot object. If the copyMethod is Direct, this will be the PVC that is used as the destination by VolSync.

Additional destination options 

There are a number of more advanced configuration parameters that are supported for configuring the destination. All of the following options would be placed within the .spec.rsyncTLS portion of the ReplicationDestination CustomResource.

accessModes

When VolSync creates the destination volume, this specifies the accessModes for the PVC. The value should be ReadWriteOnce or ReadWriteMany.

capacity

When VolSync creates the destination volume, this value is used to determine its size. This need not match the size of the source volume, but it must be large enough to hold the incoming data.

copyMethod

This specifies how the data should be preserved at the end of each synchronization iteration. Valid values are:

Direct - Do not create a point-in-time copy of the data.
Snapshot - Create a VolumeSnapshot at the end of each iteration.

destinationPVC

Instead of having VolSync automatically provision the destination volume (using capacity, accessModes, etc.), the name of a pre-existing PVC may be specified here.
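Here is a minimal sketch of a destination that writes into a pre-existing PVC. It assumes a PVC named my-existing-pvc (a hypothetical name) already exists in myns. VolSync does not provision the volume in this case, so capacity, accessModes, and storageClassName are left out. With copyMethod: Direct, .status.latestImage will reference this PVC rather than a VolumeSnapshot.

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-dest
  namespace: myns
spec:
  rsyncTLS:
    # Receive data into an existing PVC instead of one provisioned by VolSync
    destinationPVC: my-existing-pvc   # hypothetical PVC name
    # Write directly into the PVC; no VolumeSnapshot is taken after each iteration
    copyMethod: Direct
```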

storageClassName

When VolSync creates the destination volume, this specifies the name of the StorageClass to use. If omitted, the system default StorageClass will be used.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this value specifies the name of the VolumeSnapshotClass to use when creating a snapshot. If omitted, the system default VolumeSnapshotClass will be used.

keySecret: This is the name of a Secret that contains the TLS-PSK key for authenticating the connection with the source. If not provided, the key will be automatically generated and placed in .status.rsyncTLS.keySecret.
moverSecurityContext: This field allows specifying the PodSecurityContext that will be used by the data mover. It can be used to customize the user, fsGroup, etc.
serviceType: VolSync creates a Service to allow the source to connect to the destination. This field determines the type of that Service. Allowed values are ClusterIP or LoadBalancer. The default is ClusterIP.

Source configuration 

An example source configuration is shown here:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/5 * * * *"
  rsyncTLS:
    keySecret: volsync-rsync-dest-src-database-destination
    address: my.host.com
    copyMethod: Clone

In the above example, the PVC named mysql-pv-claim will be replicated every 5 minutes using the rsync-TLS replication method. At the start of each iteration, a clone of the source PVC will be created to generate a point-in-time copy for the iteration. The source will then use the TLS key in the named Secret (.spec.rsyncTLS.keySecret) to authenticate to the destination. The connection will be made to the address specified in .spec.rsyncTLS.address.

The synchronization schedule, .spec.trigger.schedule, is defined by a cronspec, making the schedule very flexible. Both intervals (shown above) as well as specific times and/or days can be specified.
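To illustrate the cronspec syntax, the fragment below schedules replication at specific times rather than at a fixed interval. The fields are minute, hour, day of month, month, and day of week. The particular times are arbitrary examples.

```yaml
spec:
  trigger:
    # minute hour day-of-month month day-of-week
    # Run at 02:30 every Monday through Friday
    schedule: "30 2 * * 1-5"
    # Other possible values (only one schedule can be set at a time):
    #   "0 * * * *"   at the top of every hour
    #   "0 0 * * 0"   at midnight every Sunday
```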

When configuring the source, the user must manually create the Secret referenced in .spec.rsyncTLS.keySecret by copying the contents from the Secret generated previously on the destination.

Additionally, this ReplicationSource specifies a copyMethod of Clone, which directly generates a point-in-time copy of the source volume. However, not all CSI drivers support volume cloning (most notably the ebs-csi driver). In such cases, copyMethod: Snapshot can be used to indirectly create a copy of the volume by first taking a snapshot, then restoring it. In this case, the user should also provide the volumeSnapshotClassName: <vsc-name> option to indicate which VolumeSnapshotClass VolSync should use when creating the temporary snapshot.
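For a driver without clone support, the source example above could be adapted as shown in this sketch. Only copyMethod changes, and volumeSnapshotClassName is added. The VolumeSnapshotClass name my-vsc is a placeholder for whichever class exists on the source cluster.

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/5 * * * *"
  rsyncTLS:
    keySecret: volsync-rsync-dest-src-database-destination
    address: my.host.com
    # Snapshot the source PVC, then restore the snapshot into a temporary PiT volume
    copyMethod: Snapshot
    # VolumeSnapshotClass for the temporary snapshot (placeholder name)
    volumeSnapshotClassName: my-vsc
```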

Source status 

The state of the replication from the source’s point of view is available in the .status field of the ReplicationSource:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/5 * * * *"
  rsyncTLS:
    # ... omitted ...
status:
status:
  conditions:
    - lastTransitionTime: "2022-11-29T17:25:13Z"
      message: Waiting for next scheduled synchronization
      reason: WaitingForSchedule
      status: "False"
      type: Synchronizing
  lastSyncDuration: 28.818695981s
  lastSyncTime: "2022-11-29T17:25:28Z"
  nextSyncTime: "2022-11-29T17:30:00Z"
  rsyncTLS: {}

In the above example,

The last synchronization completed at .status.lastSyncTime and took the amount of time reported in .status.lastSyncDuration (about 28.8 seconds here).
The next scheduled synchronization is at .status.nextSyncTime.

Note

The length of time required to synchronize the data is determined by the rate of change for data in the volume and the bandwidth between the source and destination. In order to avoid missed intervals, ensure there is sufficient bandwidth between the source and destination such that lastSyncDuration remains safely below the synchronization interval (.spec.trigger.schedule).

Additional source options 

There are a number of more advanced configuration parameters that are supported for configuring the source. All of the following options would be placed within the .spec.rsyncTLS portion of the ReplicationSource CustomResource.

accessModes

When using a copyMethod of Clone or Snapshot, this field allows overriding the access modes for the point-in-time (PiT) volume. The default is to use the access modes from the source PVC.

capacity

When using a copyMethod of Clone or Snapshot, this allows overriding the capacity of the PiT volume. The default is to use the capacity of the source volume.
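In the sketch below, the source requests a ReadWriteOnce PiT clone regardless of the access mode of the source PVC, and it sets the clone's capacity explicitly. The 10Gi value is an assumption. It must be at least the size of mysql-pv-claim, because a cloned volume cannot be smaller than its source.

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/5 * * * *"
  rsyncTLS:
    keySecret: volsync-rsync-dest-src-database-destination
    address: my.host.com
    copyMethod: Clone
    # These overrides apply only to the PiT volume; the source PVC is unchanged
    accessModes: ["ReadWriteOnce"]
    capacity: 10Gi    # assumed >= the capacity of mysql-pv-claim
```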

copyMethod

This specifies the method used to create a PiT copy of the source volume. Valid values are:

Clone - Create a new volume by cloning the source PVC (i.e., use the source PVC as the dataSource for the new volume).
Direct - Do not create a PiT copy. The VolSync data mover will directly use the source PVC.
Snapshot - Create a VolumeSnapshot of the source PVC, then use that snapshot to create the new volume. This option should be used for CSI drivers that support snapshots but not cloning.

storageClassName

This specifies the name of the StorageClass to use when creating the PiT volume. The default is to use the same StorageClass as the source volume.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this specifies the name of the VolumeSnapshotClass to use. If not specified, the cluster default will be used.

address: This specifies the address of the replication destination’s TLS (stunnel) endpoint. It can be taken directly from the ReplicationDestination’s .status.rsyncTLS.address field.
keySecret: This is the name of a Secret that contains the TLS-PSK key for authenticating the connection with the destination. The key must match the one used by the destination. If not provided, the key will be automatically generated and placed in .status.rsyncTLS.keySecret.
moverSecurityContext: This field allows specifying the PodSecurityContext that will be used by the data mover. It can be used to customize the user, fsGroup, etc.

Rsync-specific considerations 

This section explains some additional considerations when setting up rsync-TLS-based replication.

TLS authentication 

The TLS connection provided by stunnel is secured using TLS-PSK. This means that the ReplicationSource and ReplicationDestination need to have access to a shared key. The keySecret field in each CustomResource determines the location of the key. If the name of a Secret is not provided in .spec.rsyncTLS.keySecret, a key will be automatically generated and the name of its Secret placed into .status.rsyncTLS.keySecret.

Because generation is optional, the key can either be generated automatically and then copied to the other side, or it can be pre-generated and supplied to both sides when the replication is configured. The pre-generation approach is more suitable for GitOps-style workflows.
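The sketch below shows the pre-generation approach. Both CustomResources reference a Secret that was created ahead of time. The name tls-key-secret matches the example Secret later in this section. A Secret with identical psk.txt contents is assumed to exist in the dest namespace on the destination cluster and in the source namespace on the source cluster.

```yaml
---
# Destination side: uses the pre-created key instead of generating one
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  rsyncTLS:
    keySecret: tls-key-secret
    accessModes: ["ReadWriteOnce"]
    capacity: 2Gi
    copyMethod: Snapshot
---
# Source side: references a Secret with the same name and contents
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/5 * * * *"
  rsyncTLS:
    keySecret: tls-key-secret
    address: my.host.com
    copyMethod: Clone
```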

The Secret itself contains a single field, named psk.txt. This field follows the format expected by stunnel:

<id>:<at least 32 hex digits>

For example:

1:23b7395fafc3e842bd8ac0fe142e6ad1

The corresponding Secret would be:

Example secret.yaml

apiVersion: v1
data:
  # echo -n 1:23b7395fafc3e842bd8ac0fe142e6ad1 | base64
  psk.txt: MToyM2I3Mzk1ZmFmYzNlODQyYmQ4YWMwZmUxNDJlNmFkMQ==
kind: Secret
metadata:
  name: tls-key-secret
type: Opaque
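Kubernetes Secrets also accept a stringData field. It takes plain-text values, so the base64 encoding step is not needed. An equivalent Secret carrying the same example key could be written as shown below. The value is quoted so that YAML treats it as a string.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: tls-key-secret
type: Opaque
stringData:
  # Plain-text value; the API server stores it base64-encoded under data
  psk.txt: "1:23b7395fafc3e842bd8ac0fe142e6ad1"
```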

Rsync-TLS mover permissions 

Due to limitations of rsync, when run in the normal, unprivileged mode, the data mover Pod must run with a non-zero UID. This may require specifying a Pod Security Context in the ReplicationSource and ReplicationDestination objects to explicitly set the UID for the mover. Please see the documentation on the mover permission model for more details.
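Here is a minimal sketch of setting the mover's UID through the moverSecurityContext field, which both ReplicationSource and ReplicationDestination accept under .spec.rsyncTLS. The IDs shown (1000) are hypothetical. In practice, they should correspond to the ownership of the data so that the mover can read it on the source and write it on the destination.

```yaml
spec:
  rsyncTLS:
    # ... other options omitted ...
    moverSecurityContext:
      runAsUser: 1000     # hypothetical non-zero UID for the mover Pod
      runAsGroup: 1000    # hypothetical GID
      fsGroup: 1000       # group ownership applied to the mounted volume
```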

Choosing between Service types (ClusterIP vs LoadBalancer)

When using Rsync-TLS-based replication, the ReplicationSource needs to be able to make a network connection to the ReplicationDestination. This requires network connectivity from the source to the destination cluster.

When a ReplicationDestination object is created, VolSync creates a corresponding Service object to serve as the network endpoint. The type of Service (LoadBalancer or ClusterIP) can be specified in the ReplicationDestination’s .spec.rsyncTLS.serviceType field. If it is omitted, ClusterIP is used.

ReplicationDestination specifying the Service type

apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  rsyncTLS:
    accessModes:
    - ReadWriteOnce
    capacity: 2Gi
    copyMethod: Snapshot
    serviceType: ClusterIP

The networking configuration between the two clusters determines which Service type is appropriate.

If ClusterIP is specified, the Service will receive an IP address allocated from the “cluster network” address pool. By default, these addresses are not accessible from outside the cluster, making ClusterIP a poor choice for cross-cluster replication. However, networking add-ons such as Submariner bridge the cluster networks, and with such an add-on in place ClusterIP becomes a good option.

If LoadBalancer is specified, an externally accessible IP address will be allocated. This requires cluster support for load balancers such as those provided by the various cloud providers or MetalLB in the case of physical clusters. While this is the easiest method for allocating an accessible address in cloud environments, load balancers tend to incur additional costs and be limited in number.
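For comparison, this sketch shows the earlier destination example switched to a LoadBalancer Service. Once the cluster's load balancer assigns an external address, it is expected to be reported in the destination's .status.rsyncTLS.address. The source's .spec.rsyncTLS.address would then point at that address.

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  rsyncTLS:
    accessModes:
    - ReadWriteOnce
    capacity: 2Gi
    copyMethod: Snapshot
    # Request an externally reachable address from the cluster's load balancer
    serviceType: LoadBalancer
```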

To summarize the above trade-offs, when running on one of the public clouds, using a LoadBalancer is a quick way to get started and will work for replicating small numbers of volumes. If replicating a large number of volumes, an overlay network solution such as Submariner in combination with ClusterIP addresses will likely be more scalable.