
Hi.

So, I have an issue with my K8s cluster: Kasten K10 made snapshots, and I then did an etcd restore to a point in time before they were created, so the VolumeSnapshot/VolumeSnapshotContent records are no longer present for the Ceph RBD snapshots that still exist in my RBD pool.

This leaves orphaned snapshots in my Ceph cluster that I can’t easily remove.

 

Does anyone have any ideas how I can find and remove these orphaned images?

I was thinking:

Within Kubernetes, under Storage → Persistent Volumes, output all Persistent Volumes and grep for this section:

 

csi:
  controllerExpandSecretRef:
    name: rook-csi-rbd-provisioner
    namespace: rook-ceph
  driver: rook-ceph.rbd.csi.ceph.com
  fsType: ext4
  nodeStageSecretRef:
    name: rook-csi-rbd-node
    namespace: rook-ceph
  volumeAttributes:
    clusterID: rook-ceph
    imageFeatures: layering
    imageFormat: "2"
    imageName: csi-vol-cdb2ac07-6a0b-11ed-87d9-2e16f5b6210e
    journalPool: replicapool
    pool: replicapool
    storage.kubernetes.io/csiProvisionerIdentity: 1669022460388-8081-rook-ceph.rbd.csi.ceph.com
  volumeHandle: 0001-0009-rook-ceph-0000000000000001-cdb2ac07-6a0b-11ed-87d9-2e16f5b6210e

Specifically, I’m interested in the imageName field, e.g. csi-vol-cdb2ac07-6a0b-11ed-87d9-2e16f5b6210e.

That would be an image in Ceph → Block → Images

 

The end result: remove all current Kasten K10 snapshots from the Kubernetes VolumeSnapshot/VolumeSnapshotContent resources, get a list of all Ceph RBD images in the pool, extract just the imageName line from every Persistent Volume, then remove the images that don’t match any Persistent Volume - those would be the orphaned images?
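
Roughly what I had in mind, as an untested sketch (this assumes every volume is provisioned by the RBD CSI driver, the pool is called replicapool, and the rbd command is run from the rook-ceph toolbox):

# image names referenced by Persistent Volumes
kubectl get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeAttributes.imageName}{"\n"}{end}' | sort -u > pv-images.txt

# image names that actually exist in the pool
rbd ls --pool replicapool | sort -u > rbd-images.txt

# images present in Ceph but not referenced by any PV - the orphan candidates
comm -13 pv-images.txt rbd-images.txt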

Am I correct? Is there an easier way to do this?

@jaiganeshjk 


@voarsh Thanks for posting the question.

 

You will have to get into a toolbox pod (https://rook.io/docs/rook/v1.10/Troubleshooting/ceph-toolbox/) and run rbd CLI commands.

Please refer to this documentation https://docs.ceph.com/en/mimic/rbd/rados-rbd-cmds/ for the RBD commands.

 

I don’t think you would need to remove the RBD images themselves.
You will have to loop over the RBD images, list the snapshots for each one, and remove those snapshots.

Below is a list of commands that might come in handy for you.

#list all the images for a pool

rbd ls {pool-name}

#list all the snapshots for an image

rbd snap ls {pool-name}/{image-name}

#remove a snapshot with the image and pool name as input
rbd snap rm {pool-name}/{image-name}@{snap-name}



#To delete all snapshots for an image with rbd, specify the snap purge option and the image name.

rbd snap purge {pool-name}/{image-name}
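
For example, a quick way to see which images still have snapshots before removing anything could be a loop like this (just a sketch, assuming your pool is named replicapool and you run it inside the toolbox pod):

# print the snapshot list for every image in the pool
for img in $(rbd ls --pool replicapool); do
  echo "=== replicapool/$img ==="
  rbd snap ls replicapool/$img
done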

 


Thanks for your reply.

 

I put together this example:

for x in $(rbd list --pool replicapool); do
  rbd snap purge replicapool/$x
done

I haven’t run it yet, but I believe it should delete all the snaps under the images.


Yes @voarsh, that should purge all the snapshots.


I had a quick question around purging images.

See, over time I’ve backed up, restored, backed up and restored again, creating images from snapshot after snapshot - there hasn’t been any image flattening, since that would depend on whether Kasten K10 does flattening.

 

I’m wondering whether this means I still have old images from restored volumes (ones that came from a different Ceph pool, for example).
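
If it helps, I was thinking of checking for unflattened clones with something like this (untested; as far as I can tell, rbd info prints a parent: line for cloned images, and replicapool is again just my pool name):

# list images in the pool that still have a parent snapshot, i.e. unflattened clones
for img in $(rbd ls --pool replicapool); do
  parent=$(rbd info replicapool/$img | grep 'parent:')
  [ -n "$parent" ] && echo "replicapool/$img  $parent"
done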


So, I finally ran:
for x in $(rbd list --pool replicapool); do rbd snap purge replicapool/$x; done

 

And upon checking the images in the Ceph Dashboard, I still have around 800 images when I only have something like 80 PVCs - so it’s not deleting all the images? Could this be because the volumes have snaps and aren’t flattened?
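
For reference, this is roughly how I’m comparing the two counts (assuming everything in replicapool was provisioned by the RBD CSI driver, so the images use the csi-vol- prefix):

# number of images in the pool
rbd ls --pool replicapool | wc -l

# number of images actually referenced by a Persistent Volume
kubectl get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeAttributes.imageName}{"\n"}{end}' | grep -c 'csi-vol-'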

 

Keep in mind, for all my Kasten K10 backups I have retention set so that 0 snapshots are kept in Ceph.


@voarsh Do you mean you still have a lot of stale RBD images that were created by K10?

 

K10 integrates with the CSI driver to create/remove snapshots and images when the corresponding VolumeSnapshot and PVC resources are created or deleted.

If there are RBD images left over in your cluster, I suspect there is an issue with the cleanup of the RBD images when the PVCs are deleted.

Also, verify the deletion policy for your StorageClass and VolumeSnapshotClass and see if it is set to `Delete`.

Setting it to Retain may not delete the images when K10 deletes the corresponding PVC/PV pairs.
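
Something like this should show both policies at a glance (a quick sketch; the class names will be whatever you have in your cluster):

# reclaim policy of each StorageClass (Delete vs Retain)
kubectl get storageclass -o custom-columns=NAME:.metadata.name,RECLAIMPOLICY:.reclaimPolicy

# deletion policy of each VolumeSnapshotClass (Delete vs Retain)
kubectl get volumesnapshotclass -o custom-columns=NAME:.metadata.name,DELETIONPOLICY:.deletionPolicy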

