Rclone-based replication

Rclone-based replication supports 1:many asynchronous replication of volumes for use cases such as:

  • High fan-out data replication from a central site to many (edge) sites

With this method, VolSync synchronizes data from a ReplicationSource to a ReplicationDestination using Rclone via an intermediary object storage location like AWS S3.


The Rclone method uses a “push” and “pull” model for data replication: the source pushes data to the intermediary storage, and each destination pulls from it. This requires a schedule or other trigger on both the source and destination sides to start each replication iteration.

During each synchronization iteration:

  • A point-in-time (PiT) copy of the source volume is created using CSI drivers. This copy will be used as the source data.

  • The copy is attached to an Rclone data mover job pod which uses the contents of the rclone-secret to connect to the intermediary object storage target (e.g., AWS S3).

  • The source pod uses rclone sync to copy the data to S3.

  • On the destination side, a corresponding Rclone mover pod syncs the data from the intermediate object storage into a volume on the destination.

  • At the conclusion of the transfer, the destination creates a snapshot copy to preserve a point-in-time copy of the incoming source data.

VolSync is configured via two CustomResources (CRs), one on the source side and one on the destination side of the replication relationship. While there should only be one ReplicationSource pushing data to the intermediate storage, there may be an arbitrary number of ReplicationDestination instances syncing data from the intermediate storage to destination clusters. This enables the model of high fan-out data distribution.

Source configuration

An example source configuration is shown below:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: database-source
  namespace: source
spec:
  # The PVC to sync
  sourcePVC: mysql-pv-claim
  trigger:
    # Synchronize every 6 minutes
    schedule: "*/6 * * * *"
  rclone:
    # The configuration section of the rclone config file to use
    rcloneConfigSection: "aws-s3-bucket"
    # The path to the object bucket
    rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
    # Secret holding the rclone configuration
    rcloneConfig: "rclone-secret"
    # Method used to generate the PiT copy
    copyMethod: Snapshot
    # The StorageClass to use when creating the PiT copy (same as source PVC if omitted)
    storageClassName: my-sc-name
    # The VSC to use if the copy method is Snapshot (default if omitted)
    volumeSnapshotClassName: my-vsc-name

Since the copyMethod specified above is Snapshot, the Rclone data mover creates a VolumeSnapshot of the source PVC mysql-pv-claim, then converts this snapshot back into a PVC. If copyMethod: Clone were used, the temporary, point-in-time copy would be created by cloning the source PVC to a new PVC directly. This is more efficient, but it is not supported by all CSI drivers.

The synchronization schedule, .spec.trigger.schedule, is defined by a cronspec, making the schedule very flexible. Both intervals (as shown above) and specific times and/or days can be specified.
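
As a rough sketch of that flexibility, the fragments below show one interval-based schedule and two time-based ones. They assume standard five-field cron syntax; the specific times and days are illustrative only.

```yaml
# Fragments of .spec.trigger; a ReplicationSource uses only one schedule.
# The values below are illustrative assumptions, not recommendations.

# Interval: every 6 minutes (as in the example above)
trigger:
  schedule: "*/6 * * * *"
---
# Specific time: every day at 02:30
trigger:
  schedule: "30 2 * * *"
---
# Specific day and time: Sundays at 01:00
trigger:
  schedule: "0 1 * * 0"
```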

Source status

Once the ReplicationSource is deployed, VolSync updates the nextSyncTime in the ReplicationSource object.

---
apiVersion:  volsync.backube/v1alpha1
kind:         ReplicationSource
#  ... omitted ...
spec:
  rclone:
    copyMethod:               Snapshot
    rcloneConfig:             rclone-secret
    rcloneConfigSection:      aws-s3-bucket
    rcloneDestPath:           volsync-test-bucket/mysql-pv-claim
    storageClassName:         my-sc-name
    volumeSnapshotClassName:  my-vsc-name
  sourcePVC:              mysql-pv-claim
  trigger:
    schedule:  "*/6 * * * *"
status:
  conditions:
    - lastTransitionTime:  2021-01-18T21:50:59Z
      message:             Reconcile complete
      reason:              ReconcileComplete
      status:              "True"
      type:                Reconciled
  nextSyncTime:          2021-01-18T22:00:00Z

Additional source options

There are a number of more advanced configuration parameters that are supported for configuring the source. All of the following options would be placed within the .spec.rclone portion of the ReplicationSource CustomResource.

accessModes

When using a copyMethod of Clone or Snapshot, this field allows overriding the access modes for the point-in-time (PiT) volume. The default is to use the access modes from the source PVC.

capacity

When using a copyMethod of Clone or Snapshot, this allows overriding the capacity of the PiT volume. The default is to use the capacity of the source volume.

copyMethod

This specifies the method used to create a PiT copy of the source volume. Valid values are:

  • Clone - Create a new volume by cloning the source PVC (i.e., use the source PVC as the volumeSource for the new volume).

  • Direct - Do not create a PiT copy. The VolSync data mover will directly use the source PVC.

  • Snapshot - Create a VolumeSnapshot of the source PVC, then use that snapshot to create the new volume. This option should be used for CSI drivers that support snapshots but not cloning.
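
For comparison with the Snapshot example shown earlier, a minimal sketch of a source that uses Clone might look like the following. It reuses the names from the earlier example and assumes the CSI driver backing mysql-pv-claim supports volume cloning.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: database-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/6 * * * *"
  rclone:
    rcloneConfigSection: "aws-s3-bucket"
    rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
    rcloneConfig: "rclone-secret"
    # Clone the source PVC directly (assumes the CSI driver supports cloning)
    copyMethod: Clone
    # volumeSnapshotClassName is left out: it only applies to copyMethod: Snapshot
```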

storageClassName

This specifies the name of the StorageClass to use when creating the PiT volume. The default is to use the same StorageClass as the source volume.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this specifies the name of the VolumeSnapshotClass to use. If not specified, the cluster default will be used.

rcloneConfigSection

This is used to identify the configuration section within rclone.conf to use.

rcloneDestPath

This is the remote storage location to which the persistent data will be uploaded.

Normally the root of this path is the storage bucket name. Any sub paths would be created as folders in the storage bucket.

In the example above, using volsync-test-bucket/mysql-pv-claim means that the source PVC will be replicated to the folder called mysql-pv-claim in the bucket called volsync-test-bucket.

If a unique bucket is used for each PVC to be replicated, then a path with simply the bucket name (such as volsync-test-bucket) is sufficient. However, if the same bucket will be used for multiple different PVCs (and therefore multiple ReplicationSources), a unique path should be used for each PVC/ReplicationSource.
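
A sketch of the shared-bucket case is shown below. Only the relevant .spec fields are included, and the second PVC name (postgres-pv-claim) is a hypothetical name chosen for illustration.

```yaml
# Two ReplicationSources sharing one bucket, each with its own sub-path.
# Only the relevant fields are shown; "postgres-pv-claim" is hypothetical.
spec:
  sourcePVC: mysql-pv-claim
  rclone:
    rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
---
spec:
  sourcePVC: postgres-pv-claim
  rclone:
    rcloneDestPath: "volsync-test-bucket/postgres-pv-claim"
```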

rcloneConfig

This specifies the name of a secret to be used to retrieve the Rclone configuration. The content of the Secret is an rclone.conf file.
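
A minimal sketch of such a Secret for an AWS S3 target is shown below. It assumes the data mover reads the configuration from a key named rclone.conf and that static access keys are used; the bracketed values are placeholders to be replaced, and the option names are those of Rclone's S3 backend.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rclone-secret
  namespace: source
type: Opaque
stringData:
  # Assumption: the configuration is stored under the key "rclone.conf"
  rclone.conf: |
    # Section name must match rcloneConfigSection
    [aws-s3-bucket]
    type = s3
    provider = AWS
    env_auth = false
    access_key_id = <ACCESS_KEY_ID>
    secret_access_key = <SECRET_ACCESS_KEY>
    region = <REGION>
```

The same Secret would also need to exist in each destination namespace that references it.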

customCA

This option allows a custom certificate authority to be used when making TLS (https) connections to the remote repository.


Destination configuration

An example destination configuration is shown here:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  trigger:
    # Every 6 minutes, offset by 3 minutes
    schedule: "3,9,15,21,27,33,39,45,51,57 * * * *"
  rclone:
    rcloneConfigSection: "aws-s3-bucket"
    rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
    rcloneConfig: "rclone-secret"
    copyMethod: Snapshot
    accessModes: [ReadWriteOnce]
    capacity: 10Gi
    storageClassName: my-sc
    volumeSnapshotClassName: my-vsc

Similar to the replication source, a synchronization schedule is defined in .spec.trigger.schedule. This indicates when persistent data should be pulled from the remote storage location. It is important that the schedule for each destination is offset from that of the source, so that the source can finish pushing updates for an iteration before the destination attempts to pull them.

In the above example, a 10 GiB RWO volume will be provisioned using the my-sc StorageClass to serve as the destination for replicated data. This volume is used by the Rclone data mover to receive the incoming data transfers.

Since the copyMethod specified above is Snapshot, a VolumeSnapshot of the incoming data will be created at the end of each synchronization interval. It is this snapshot that would be used to gain access to the replicated data. The name of the current VolumeSnapshot holding the latest synced data will be placed in .status.latestImage.
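
One way to gain access to the replicated data is to create a new PVC from that snapshot using the standard Kubernetes dataSource mechanism. The sketch below assumes the snapshot name has been read from .status.latestImage.name; the PVC name restored-mysql-data is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical name for the PVC that will hold the replicated data
  name: restored-mysql-data
  namespace: dest
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: my-sc
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    # Placeholder: use the value of .status.latestImage.name
    name: <latest-image-name>
```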

Destination status

VolSync provides status information on the state of the replication via the .status field in the ReplicationDestination object:

---
API Version:  volsync.backube/v1alpha1
Kind:         ReplicationDestination
#  ... omitted ...
Spec:
  Rclone:
    Access Modes:
      ReadWriteOnce
    Capacity:                    10Gi
    Copy Method:                 Snapshot
    Rclone Config:               rclone-secret
    Rclone Config Section:       aws-s3-bucket
    Rclone Dest Path:            volsync-test-bucket
    Storage Class Name:          my-sc
    Volume Snapshot Class Name:  my-vsc
Status:
  Conditions:
    Last Transition Time:  2021-01-19T22:16:02Z
    Message:               Reconcile complete
    Reason:                ReconcileComplete
    Status:                True
    Type:                  Reconciled
  Last Sync Duration:      7.066022293s
  Last Sync Time:          2021-01-19T22:16:02Z
  Latest Image:
    API Group:  snapshot.storage.k8s.io
    Kind:       VolumeSnapshot
    Name:       volsync-dest-database-destination-20210119221601

In the above example,

  • Rclone Dest Path indicates the intermediary storage system from which data will be transferred to the destination site. In the above example, the intermediary storage system is an S3 bucket.

  • No errors were detected (the Reconciled condition is True).

After at least one synchronization has taken place, the following will also be available:

  • Last Sync Time contains the time of the last successful data synchronization.

  • Latest Image references the object with the most recent copy of the data. If the copyMethod is Snapshot, this will be a VolumeSnapshot object. If the copyMethod is Direct, this will be the PVC that is used as the destination by VolSync.

Additional destination options

There are a number of more advanced configuration parameters that are supported for configuring the destination. All of the following options would be placed within the .spec.rclone portion of the ReplicationDestination CustomResource.

accessModes

When VolSync creates the destination volume, this specifies the accessModes for the PVC. The value should be ReadWriteOnce or ReadWriteMany.

capacity

When VolSync creates the destination volume, this value is used to determine its size. This need not match the size of the source volume, but it must be large enough to hold the incoming data.

copyMethod

This specifies how the data should be preserved at the end of each synchronization iteration. Valid values are:

  • Direct - Do not create a point-in-time copy of the data.

  • Snapshot - Create a VolumeSnapshot at the end of each iteration.

destinationPVC

Instead of having VolSync automatically provision the destination volume (using capacity, accessModes, etc.), the name of a pre-existing PVC may be specified here.
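
A sketch of a destination that syncs into a pre-existing PVC is shown below. The PVC name existing-mysql-data is hypothetical, and the provisioning fields (capacity, accessModes, storageClassName) are left out on the assumption that they are not needed when VolSync does not create the volume.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  trigger:
    schedule: "3,9,15,21,27,33,39,45,51,57 * * * *"
  rclone:
    rcloneConfigSection: "aws-s3-bucket"
    rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
    rcloneConfig: "rclone-secret"
    copyMethod: Snapshot
    # Hypothetical PVC that already exists in the "dest" namespace
    destinationPVC: existing-mysql-data
    volumeSnapshotClassName: my-vsc
```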

storageClassName

When VolSync creates the destination volume, this specifies the name of the StorageClass to use. If omitted, the system default StorageClass will be used.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this value specifies the name of the VolumeSnapshotClass to use when creating a snapshot. If omitted, the system default VolumeSnapshotClass will be used.

rcloneConfigSection

This is used to identify the configuration section within rclone.conf to use.

rcloneDestPath

This is the remote storage location from which the persistent data will be downloaded. This should match the rcloneDestPath used on the ReplicationSource.

rcloneConfig

This specifies the secret to be used. The secret contains an rclone.conf file with the configuration and credentials for the object target.

customCA

This option allows a custom certificate authority to be used when making TLS (https) connections to the remote repository.

For a concrete example, see the database synchronization example.

Using a custom certificate authority

Normally, Rclone will use a default set of certificates to verify the validity of remote repositories when making https connections. However, users who deploy with a self-signed certificate will need to provide their CA’s certificate via the customCA option.

The custom CA certificate needs to be provided in a Secret or ConfigMap to VolSync. For example, if the CA certificate is a file in the current directory named ca.crt, it can be loaded as a Secret or a ConfigMap.

Example using a customCA loaded as a secret:

$ kubectl create secret generic tls-secret --from-file=ca.crt=./ca.crt
secret/tls-secret created

$ kubectl describe secret/tls-secret
Name:         tls-secret
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
ca.crt:  1127 bytes

This Secret would then be used in the ReplicationSource and/or ReplicationDestination objects:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup-with-customca
spec:
  # ... fields omitted ...
  rclone:
    # ... other fields omitted ...
    customCA:
      secretName: tls-secret
      key: ca.crt

To use a customCA in a ConfigMap, specify configMapName in the spec instead of secretName, for example:

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup-with-customca
spec:
  # ... fields omitted ...
  rclone:
    # ... other fields omitted ...
    customCA:
      configMapName: tls-configmap-name
      key: ca.crt
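
The ConfigMap itself could be defined with a manifest along the following lines. This is a sketch: the certificate body is a placeholder, and it assumes the ConfigMap is created in the same namespace as the ReplicationSource or ReplicationDestination that references it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tls-configmap-name
data:
  # The key name must match the "key" field under customCA
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate>
    -----END CERTIFICATE-----
```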