Solved

VolumeSnapshot and VolumeSnapshotContent not deleted using rook-ceph


Hi,

 

I’m currently using K10 with a “no local, all S3” policy, meaning that every local snapshot taken for a backup should be deleted immediately once it has been exported to S3.

 

I have no problems with backups whatsoever, but I do have a pile (1700+) of VolumeSnapshots and VolumeSnapshotContents that never get deleted.

 

Every stuck VolumeSnapshotContent looks like this:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes"
  creationTimestamp: "2024-03-09T00:03:40Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-03-09T00:04:25Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  generation: 3
  name: snapshot-copy-2b4pthg4-content-f019e565-697e-4649-b797-d461f4c18544
  resourceVersion: "175949429"
  uid: c0234817-0c1f-4e7d-8969-44fb7a83d5d5
spec:
  deletionPolicy: Delete
  driver: rook-ceph.rbd.csi.ceph.com
  source:
    snapshotHandle: 0001-0009-rook-ceph-0000000000000002-8eda267e-e1a2-4242-8836-92df0dbeb325
  volumeSnapshotClassName: k10-clone-csi-rbdplugin-snapclass
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: snapshot-copy-2b4pthg4
    namespace: kasten-io
    uid: ecfa59d1-7353-4a39-af09-ca304808e9a4
status:
  creationTime: 1709942621836570796
  readyToUse: true
  restoreSize: 0
  snapshotHandle: 0001-0009-rook-ceph-0000000000000002-8eda267e-e1a2-4242-8836-92df0dbeb325

 

On the Ceph side, the only relevant logs come from the csi-snapshotter container, which reports:

...
E0419 11:21:58.273585 1 snapshot_controller_base.go:359] could not sync content "snapshot-copy-9jhzn7fz-content-f7e30f08-5d76-4643-9d3f-4604441fb283": failed to delete snapshot "snapshot-copy-9jhzn7fz-content-f7e30f08-5d76-4643-9d3f-4604441fb283", err: failed to delete snapshot content snapshot-copy-9jhzn7fz-content-f7e30f08-5d76-4643-9d3f-4604441fb283: "rpc error: code = InvalidArgument desc = provided secret is empty"
I0419 11:21:58.273685 1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"snapshot-copy-9jhzn7fz-content-f7e30f08-5d76-4643-9d3f-4604441fb283", UID:"a66cd49d-68ed-4bb3-aa26-efc56e6668ea", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"204560594", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to delete snapshot
...

 

The StorageClass and VolumeSnapshotClass associated with rook-ceph both have the secret location annotations.
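For reference, the RBD VolumeSnapshotClass is set up roughly like this (a sketch; the parameter keys are the standard external-snapshotter ones, and the secret name/namespace are what a default Rook-Ceph install creates, so yours may differ):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: k10-clone-csi-rbdplugin-snapclass
driver: rook-ceph.rbd.csi.ceph.com
deletionPolicy: Delete
parameters:
  clusterID: rook-ceph
  # The csi-snapshotter reads these to find the Ceph credentials
  # it needs when creating or deleting a snapshot.
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph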

 

I have backups for both RBD and CephFS, and both plugin pods show the same kind of logs.

I can provide logs from the k10_debug.sh tool if needed.

 

Any help appreciated, thanks a lot!


Best answer by lgromb 22 April 2024, 11:41


2 comments


Hi @lgromb 
 

It seems the issue originates on the Ceph side: managing snapshots is the responsibility of the Ceph csi-snapshotter, and K10 merely requests deletion from it. I recommend reaching out to the Ceph or storage team to investigate why snapshot deletion fails.

You can also try to manually create and delete a snapshot without involving K10 and check whether you hit the same issue.
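For example, a minimal manual test could look like this (a sketch; the PVC name is hypothetical, and the snapshot class should be whatever your Rook-Ceph RBD class is called):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: manual-test-snap
  namespace: default
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass  # your RBD snapshot class
  source:
    persistentVolumeClaimName: my-test-pvc  # hypothetical PVC backed by rook-ceph RBD

Apply it, wait for status.readyToUse to become true, then delete the VolumeSnapshot and check whether its VolumeSnapshotContent actually disappears or gets stuck on the same secret error.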
 

Additionally, I came across a bug report describing a similar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1951399


BR,
Ahmed Hagag

Hi @Hagag

 

You’re absolutely right. I did manage to delete one VSC by adding these annotations to it:

 

snapshot.storage.kubernetes.io/deletion-secret-name: rook-csi-rbd-provisioner
snapshot.storage.kubernetes.io/deletion-secret-namespace: rook-ceph

 

Now I need to find a quick and dirty way to apply those annotations to all the leftover snapshot contents; something like the loop below should do it…
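A rough, untested sketch, assuming all the leftovers are RBD-backed (CephFS contents would need the rook-csi-cephfs-provisioner secret instead):

# Stamp the deletion secret onto every VolumeSnapshotContent so the
# csi-snapshotter can finish the pending deletions.
kubectl get volumesnapshotcontent -o name | while read -r vsc; do
  kubectl annotate "$vsc" --overwrite \
    snapshot.storage.kubernetes.io/deletion-secret-name=rook-csi-rbd-provisioner \
    snapshot.storage.kubernetes.io/deletion-secret-namespace=rook-ceph
done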

 

Thanks for your help!
