Question

Will Kasten use Shallow Read-Only CephFS snapshots?


Userlevel 2

Hi all,

With release 3.7.0, Ceph-CSI has finally introduced shallow read-only CephFS snapshots, which are pretty much zero-cost compared to the "old" snapshot approach, which required a full copy of all data on the volume in order to mount the snapshot for reading. https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/cephfs-snapshot-shallow-ro-vol.md

CephFS backups to S3 with Kasten are almost impossible for me currently, as creating the volume from the snapshot takes much longer than the configured timeout limits even for a medium amount of data (50 GB) and uses a lot of resources.

I would expect shallow read-only snapshots to solve this, but it seems that Kasten still uses the old way. From what I gathered from the documentation, as long as the snapshot volume is mounted read-only, ceph-csi should use shallow snapshots.

Is my configuration incorrect or has Kasten not yet implemented this?

 

Thank you very much,

Pascal


20 comments

Userlevel 6
Badge +2

@pascalzero Thank you for posting this question.

As you mentioned, CephFS takes a lot of time to restore from a snapshot, and we have seen this happen a lot, causing export failures.

We can help you tweak the timeout for the wait period in this case.

That said, I will go through the ceph-csi docs for the 3.7 release and the shallow read-only volumes feature.

Do you know if this feature requires any changes to the accessModes of the PVC that is created with the VolumeSnapshot as a dataSource?

Currently K10 uses the accessMode of the original PVC for the temporary PVCs created during exports.
If a change in the spec is required to utilise the shallow read-only clone, then we will have to file a feature request.

Userlevel 2

Hi @jaiganeshjk , thanks for the quick response.

I believe this can be a real game changer for CephFS backups then. We would not want to change timeouts for now as this also puts quite a bit of load on the system, so a proper solution is definitely required.

We have reverted to other software/scripts for the time being, but it would be ideal of course if Kasten handled this nicely.

As I understand the spec, if you just use read-only access for the temporary PVC, that should already result in the shallow clone being used, as per this item from the design doc (a minimal PVC example follows below):

  • Volume source is a snapshot, volume access mode is *_READER_ONLY.
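For illustration, here is a minimal sketch of a PVC that should take that shallow path under the rule above - all names are made up, the VolumeSnapshot must already exist, and the driver/StorageClass are assumed to be configured to allow shallow volumes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shallow-ro-clone              # hypothetical name
spec:
  accessModes:
    - ReadOnlyMany                    # a *_READER_ONLY access mode, which triggers the shallow path
  storageClassName: ceph-filesystem   # assumed CephFS CSI StorageClass
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-volumesnapshot           # hypothetical existing snapshot
  resources:
    requests:
      storage: 1Gi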

I'd be really happy to see this happen any time soon - if there is a way for me to support (e.g. through testing), please let me know.

 

Cheers,

Pascal

Userlevel 6
Badge +2

Thanks Pascal.
Reading about this feature leads me to believe that we can get it working with K10 out of the box.

I will do further reading and some tests to get this working and keep you posted on the same.

Userlevel 2

You got me excited :-) Good luck!

Userlevel 6
Badge +2

@pascalzero I went through the testing and found that it cannot work with K10 out of the box.

Unfortunately, this feature needs the PVCs to be created with the accessMode set to `ROX`(which is the only supported accessMode for snapshot-backed volumes).

However, K10 takes the accessMode for the temporary PVC from the original PVC's manifest.

We don’t have a way to override this as of today.

I will be opening a feature request to support this and keep you informed once we have this supported in the product.

Userlevel 2

Hi @jaiganeshjk ,

Thanks a lot for the investigation and getting back on this.

That's roughly what I expected, but I'm crossing my fingers now that it might land soon, as it's surely a vital feature for a lot of users once Ceph-CSI 3.7 adoption has spread a bit further.

Again, if there's anything I can do to help with testing, let me know.

Userlevel 2

Hi @jaiganeshjk ,

 

Just wanted to check in to see whether you have any insight into the release planning, and whether this can be placed on the timeline yet?

We are still struggling with the issue every night when backups are running, as the storage load from the CephFS copying goes up so much that it impacts overall system stability.

 

Thanks a lot!

Userlevel 2

Quick update: with the release of ceph-csi 3.8 today, the new shallow snapshots will be used by default as long as the access mode is ROX. Kasten seems to use RWO, however.

I have checked the Kanister source code, but the snapshot mounting for the S3 upload seems to happen in the closed Kasten code base?

Userlevel 3

Would be great to see some work put into this.

Userlevel 6
Badge +2

Thank you for your interest. We are tracking this internally.

However, we don’t have any timelines yet.

@pascalzero You are right. As I mentioned earlier, we reuse the accessMode from the original PVC that is being exported.

We will update you once we have this ability in the product.

Userlevel 3

Thank you for your interest. We are tracking this internally.

However, we don’t have any timelines yet.

@pascalzero You are right. As I mentioned earlier, we reuse the accessMode from the original PVC that is being exported.

We will update you once we have this ability in the product.

Was hoping after 4 months I’d hear something else about this issue.

 

Just to give you an idea of how this affects me: I cannot back up CephFS volumes with K10, because I have hundreds and hundreds of GBs that must be copied every time K10 makes a backup from a snapshot. Not only is it crazy IO-expensive, but it times out and the jobs fail after the 3 retries (which I can't increase).

For the most part rsync works (for my other workloads that use CephFS), but I am deploying Bitbucket and my rsync setup doesn't preserve file permissions, which breaks my backup. So I must find another way: I can't use rsync for backing up this application, and I can't use K10… :(

Hi,

 

We are facing the same issue: when backup jobs start, it takes too long to create the clone and it fills the CephFS pool.

 

So, is there any solution for this issue?

 

Or is there a roadmap for solving this problem?

Userlevel 6
Badge +2

We are already working on supporting shallow clones for CephFS. I don't have a definite timeline for this.

But you can expect it soon.

@voarsh @Laksoy The only workaround that I have for now is to increase the timeout that K10 waits for the Pod to be ready, so that it doesn't time out while the clone is being created.

Currently, the timeout is set to 15 minutes; you can pick a higher value based on how long it takes for your largest volume to get cloned.

This way, you can ensure that the backups complete and that there are no stale clones left in the Ceph filesystem.

You will need to upgrade your K10 release with the Helm value --set kanister.podReadyWaitTimeout=<timeout_value_in_minutes>.
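For example (a sketch only - assuming K10 was installed as the release k10 from the kasten/k10 chart in the kasten-io namespace, and using 45 minutes purely as an illustrative value):

$ helm upgrade k10 kasten/k10 --namespace kasten-io \
    --reuse-values \
    --set kanister.podReadyWaitTimeout=45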

 

I know that this is just a temporary workaround to make it work with CephFS. I will update this thread once we have support for shallow clones in CephFS.

Userlevel 3

Glad that you’re at least looking into the issue. I will investigate your workaround for now.

 

There was another question I had about CephFS clones as-is. K10 creates a linked clone within the CephFS filesystem; when it is deleted, either by retention policy or manually, does it actually delete the snapshot within CephFS? Clearing up old CephFS snapshots for volumes by hand in the Ceph admin UI is atrociously painful and slow, and the Ceph tooling around managing CephFS snapshots is sorely lacking - that's another reason I don't like playing with CephFS snapshots…

Ceph RBD snapshots are a little more transparent about this, and I know that Kasten K10 definitely deletes those snapshots (although I've mentioned numerous times that orphaned images etc. don't get deleted and hang around).

Userlevel 6
Badge +2

Hi all,

Just to keep you all informed: the much-awaited support for shallow read-only volumes from CephFS snapshots during export operations is available in K10 from version 6.5.2.

Here's the documentation for the same.

https://docs.kasten.io/latest/install/storage.html#snapshots-as-shallow-read-only-volumes-cephfs-only

Userlevel 3

Hi all,

Just to keep you all informed: the much-awaited support for shallow read-only volumes from CephFS snapshots during export operations is available in K10 from version 6.5.2.

Here's the documentation for the same.

https://docs.kasten.io/latest/install/storage.html#snapshots-as-shallow-read-only-volumes-cephfs-only

Finally - I saw this just today by chance - hadn’t updated K10 since August 2023.

 

The documentation isn’t really clear enough for me on setting up CephFS shallow volume clones.

I am using external-snapshotter v4; I am looking to upgrade to v6 shortly, not sure if that's a prerequisite…

 

Here’s my VolumeSnapshotClass that K10 created:

 

 

apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Retain
driver: rook-ceph.cephfs.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  annotations:
    meta.helm.sh/release-name: rook-ceph-cluster
    meta.helm.sh/release-namespace: rook-ceph
  labels:
    kanister-cloned-from: ceph-filesystem
  name: k10-clone-ceph-filesystem
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph

 

Here’s the CephFS storageclass:

 

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: rook-ceph-cluster
    meta.helm.sh/release-namespace: rook-ceph
    storageclass.kubernetes.io/is-default-class: "false"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: ceph-filesystem
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  fsName: ceph-filesystem
  pool: ceph-filesystem-data0
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

 

The docs at https://docs.kasten.io/latest/install/storage.html#snapshots-as-shallow-read-only-volumes-cephfs-only don't really explain whether I should add backingSnapshot: "true" as an annotation to the StorageClass or to the VolumeSnapshotClass… or what this actually is?

exportData:
  enabled: true
  overrides:
    - storageClassName: regular-cephfs-csi-storage-class
      enabled: true
      exporterStorageClassName: shallow-cephfs-csi-storage-class

 

Can someone explain whether I need to edit the StorageClass/VolumeSnapshotClass with backingSnapshot: "true", and what I'm supposed to do with exporterStorageClassName?

 

----- EDIT:

After a further quick look, it seems I need to edit the backup policy to include these overrides:

exportData:
  enabled: true
  overrides:
    - storageClassName: regular-cephfs-csi-storage-class
      enabled: true
      exporterStorageClassName: shallow-cephfs-csi-storage-class

 

Clone my CephFS storageclass and give it the parameter backingSnapshot: "true".

 

….

On the right track?

 

With the cloned storageclass, the extra backingSnapshot: "true" parameter, and the policy edit:

 

The PVC is never cloned/provisioned…. :/

 

 

Userlevel 6
Badge +2

@voarsh Thanks for your comment.

Using this feature requires a special StorageClass, which is usually a copy of the regular StorageClass of the CephFS CSI driver, but with the backingSnapshot: "true" option in the parameters section.

You will need to create a new StorageClass with the parameter backingSnapshot set to true. Here's the example from the ceph-csi GitHub repository that shows how to add the `backingSnapshot` parameter.

https://github.com/ceph/ceph-csi/blob/devel/examples/cephfs/storageclass.yaml

 

In order to use the shallow copy, the PVCs that you create with the VolumeSnapshot as a dataSource need to use this StorageClass.

 

In the case of K10 exports, there is a way to override which StorageClass to use when doing a clone; that's where exporterStorageClassName comes into the picture. This override resides in the K10 policy CR.

You will have to specify the name of the StorageClass that has backingSnapshot set to true as the exporterStorageClassName.

 

Let me know if it makes sense.

Userlevel 6
Badge +2

Basically, you don't change the existing StorageClass that you use (as restores from a local VolumeSnapshot will fail if you do that).

Instead, create a new StorageClass with the backingSnapshot parameter set to true and use the override in the policy.

 

The note below is important as well, as you will need to preserve SELinux options while using the CephFS shallow volume copy for export. This annotation should be added to the original StorageClass.
 

Additionally, in the case of SELinux usage, it may be necessary to preserve SELinuxOptions of the original Pod into the Kanister Pod during the Export phase.

$ kubectl annotate storageclass regular-cephfs-csi-storage-class \
    k10.kasten.io/sc-preserve-selinux-options="true"

 

Userlevel 3

Basically, you don't change the existing StorageClass that you use (as restores from a local VolumeSnapshot will fail if you do that).

Instead, create a new StorageClass with the backingSnapshot parameter set to true and use the override in the policy.

The note below is important as well, as you will need to preserve SELinux options while using the CephFS shallow volume copy for export. This annotation should be added to the original StorageClass.

Additionally, in the case of SELinux usage, it may be necessary to preserve SELinuxOptions of the original Pod into the Kanister Pod during the Export phase.

$ kubectl annotate storageclass regular-cephfs-csi-storage-class \
    k10.kasten.io/sc-preserve-selinux-options="true"

 

 

Thanks for the quick reply.

I’ve annotated the original storageclass:

kubectl annotate storageclass ceph-filesystem k10.kasten.io/sc-preserve-selinux-options="true"

Cloned it, with the new parameter backingSnapshot: "true".

Edited a backup policy to include:

        exportData:
          enabled: true
          overrides:
            - storageClassName: ceph-filesystem
              enabled: true
              exporterStorageClassName: shallow-cephfs-csi-storage-class

 

 

Getting all sorts of errors from the CSI CephFS provisioner pod :/

 

I0207 08:53:00.449476 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-pcbc7", UID:"7e8cbcdf-4f4c-40a9-bde7-c19dd6b98a5b", APIVersion:"v1", ResourceVersion:"427874421", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-pcbc7"

W0207 08:53:01.656594 1 controller.go:1165] requested volume size 214748364800 is greater than the size 0 for the source snapshot snapshot-copy-24qlrtcw. Volume plugin needs to handle volume expansion.

W0207 08:53:01.656699 1 controller.go:1165] requested volume size 536870912000 is greater than the size 0 for the source snapshot snapshot-copy-pdz9wv9p. Volume plugin needs to handle volume expansion.

W0207 08:53:01.658867 1 controller.go:1165] requested volume size 107374182400 is greater than the size 0 for the source snapshot snapshot-copy-4mdkmbsv. Volume plugin needs to handle volume expansion.

W0207 08:53:04.023600 1 controller.go:934] Retrying syncing claim "7e8cbcdf-4f4c-40a9-bde7-c19dd6b98a5b", failure 14

E0207 08:53:04.023844 1 controller.go:957] error syncing claim "7e8cbcdf-4f4c-40a9-bde7-c19dd6b98a5b": failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

I0207 08:53:04.024141 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-pcbc7", UID:"7e8cbcdf-4f4c-40a9-bde7-c19dd6b98a5b", APIVersion:"v1", ResourceVersion:"427874421", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

W0207 08:53:04.203382 1 controller.go:934] Retrying syncing claim "1e8a503b-61d1-4909-9c9a-f62465d5ab9a", failure 14

E0207 08:53:04.203464 1 controller.go:957] error syncing claim "1e8a503b-61d1-4909-9c9a-f62465d5ab9a": failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

I0207 08:53:04.203587 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-fjq6t", UID:"1e8a503b-61d1-4909-9c9a-f62465d5ab9a", APIVersion:"v1", ResourceVersion:"427874441", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

W0207 08:53:04.239886 1 controller.go:934] Retrying syncing claim "2c095b47-e83a-451d-8be1-4466f05f6f18", failure 14

E0207 08:53:04.239957 1 controller.go:957] error syncing claim "2c095b47-e83a-451d-8be1-4466f05f6f18": failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

I0207 08:53:04.239998 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-ltmbn", UID:"2c095b47-e83a-451d-8be1-4466f05f6f18", APIVersion:"v1", ResourceVersion:"427874431", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-filesystem": rpc error: code = Aborted desc = clone from snapshot is pending

I0207 08:53:31.899871 1 controller.go:1359] provision "kasten-io/kanister-pvc-tsxbr" class "ceph-filesystem": started

I0207 08:53:31.901090 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-tsxbr", UID:"fa273554-fdc9-4cc8-9aaf-2f8ab960cf64", APIVersion:"v1", ResourceVersion:"427807277", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-tsxbr"

W0207 08:53:31.996143 1 controller.go:934] Retrying syncing claim "fa273554-fdc9-4cc8-9aaf-2f8ab960cf64", failure 29

E0207 08:53:31.996223 1 controller.go:957] error syncing claim "fa273554-fdc9-4cc8-9aaf-2f8ab960cf64": failed to provision volume with StorageClass "ceph-filesystem": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-4hznvvz6: error getting snapshot snapshot-copy-4hznvvz6 from api server: volumesnapshots.snapshot.storage.k8s.io "snapshot-copy-4hznvvz6" not found

I0207 08:53:31.996265 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kasten-io", Name:"kanister-pvc-tsxbr", UID:"fa273554-fdc9-4cc8-9aaf-2f8ab960cf64", APIVersion:"v1", ResourceVersion:"427807277", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-filesystem": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-4hznvvz6: error getting snapshot snapshot-copy-4hznvvz6 from api server: volumesnapshots.snapshot.storage.k8s.io "snapshot-copy-4hznvvz6" not found

I0207 08:53:52.691074 1 controller.go:1359] provision "kasten-io/kanister-pvc-jr9vs" class "ceph-filesystem": started

 

When the export finally appeared to be using the shallow-clone storage class, I got:

failed to provision volume with StorageClass "shallow-cephfs-csi-storage-class": rpc error: code = InvalidArgument desc = cannot set pool for snapshot-backed volume

So I need to clone the storageclass without specifying a pool? O.o

(Ref for anyone else: https://github.com/ceph/ceph-csi/issues/3820)

There are no docs on this.

  • Will try and recreate without mentioning a pool………...

 

--- Edit:

After cloning the storageclass without a pool as per https://github.com/ceph/ceph-csi/issues/3820, annotating the original storageclass with kubectl annotate storageclass ceph-filesystem k10.kasten.io/sc-preserve-selinux-options="true", and giving the new cloned storage class the parameter backingSnapshot: "true"...

Backup policy to include:

        exportData:
          enabled: true
          overrides:
            - storageClassName: ceph-filesystem
              enabled: true
              exporterStorageClassName: shallow-cephfs-csi-storage-class

 

Now the export of the snapshot appears to have created the PVC in the kasten-io namespace. I will update when/if it copies the data to the external storage location.
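For anyone else following along, here is a rough sketch of what the cloned shallow StorageClass could look like, derived from the ceph-filesystem class posted above - the name is arbitrary, the pool parameter is left out because it cannot be set for snapshot-backed volumes, and backingSnapshot: "true" is added:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shallow-cephfs-csi-storage-class
parameters:
  backingSnapshot: "true"   # PVCs created from a VolumeSnapshot with this class become shallow, snapshot-backed volumes
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  fsName: ceph-filesystem
  # pool: intentionally omitted - "cannot set pool for snapshot-backed volume"
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate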

Userlevel 6
Badge +2

Seems to have been fixed in version 3.10 of Ceph-CSI:

https://github.com/ceph/ceph-csi/releases/tag/v3.10.0
