Question

Snapshots lying around in Ceph trash

  • 26 February 2023

Userlevel 3

I recently purged everything in my Ceph cluster and started fresh.

I have backups running, but in the two weeks since this new install the number of “trash” snapshots keeps going up. It was 6 at first, now 20, and I can’t delete them.

 

I haven’t done a single restore via Kasten, so I don’t understand why these images aren’t being deleted (there should be no volume/image dependencies from a restore).

Nevertheless, purging them gives: “RBD image has snapshots (error deleting image from trash)”.
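
For context, this is roughly how it looks from the Ceph side (a sketch only; the pool name replicapool is a placeholder for whichever pool the CSI driver actually uses):

# list the images sitting in the RBD trash for the pool
rbd trash ls --pool replicapool

# trying to empty the trash is what produces the error above
rbd trash purge replicapool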

My retention is basically to keep 1 hourly backup; the rest are sent to external storage outside of Ceph.

 

Does Kasten have bugs with snapshots? Before I started fresh, restoring PVCs (creating the clones and restoring them) would also leave loads of unclearable snapshots inside Ceph.


3 comments

Userlevel 5

Hello @voarsh, try having a look at the status of the RetireActions that get created and check whether any of them are failing:

kubectl get retireactions.actions.kio.kasten.io -o custom-columns=NAME:.metadata.name,"CREATED AT":.metadata.creationTimestamp,STATUS:.status.state


Also, what is the deletion policy specified in the VolumeSnapshotClass that the VolumeSnapshot uses?
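
A quick way to check both, as a sketch (the column paths assume the v1 snapshot.storage.k8s.io API):

# deletion policy of each VolumeSnapshotClass
kubectl get volumesnapshotclass -o custom-columns=NAME:.metadata.name,DRIVER:.driver,DELETIONPOLICY:.deletionPolicy

# which class each VolumeSnapshot actually references
kubectl get volumesnapshot -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.volumeSnapshotClassName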
 

Ahmed Hagag

Userlevel 3

Argh. My reply was over the char limit….

An example of a stuck retire job:

kubectl get --raw /apis/actions.kio.kasten.io/v1alpha1/retireactions/retire-r6qdh8jdps/details

{"kind":"RetireAction","apiVersion":"actions.kio.kasten.io/v1alpha1","metadata":{"name":"retire-r6qdh8jdps","uid":"3a637be2-b801-11ed-874e-36f9cb9e1653","resourceVersion":"687634","creationTimestamp":"2023-03-01T07:18:13Z","labels":{"k10.kasten.io/policyName":"chatwoot-backup","k10.kasten.io/policyNamespace":"kasten-io"}},"status":{"state":"Running","startTime":"2023-03-01T07:18:13Z","endTime":null,"restorePoint":{"name":""},"result":{"name":""},"actionDetails":{"phases":[{"endTime":null,"name":"Retiring RestorePoint","startTime":"2023-03-01T07:18:13Z","state":"waiting","updatedTime":"2023-03-01T11:25:13Z"}]},"progress":14},"spec":{"subject":{"apiVersion":"internals.kio.kasten.io/v1alpha1","kind":"Manifest","name":"59c2ae22-b7f6-11ed-874e-36f9cb9e1653"},"scheduledTime":"2023-03-01T07:00:00Z","retireIndex":614}}

I have hundreds of backups in the UI that won’t retire.

I had manually deleted all of the VolumeSnapshots, since Kasten won’t retire them…

Userlevel 3

Also, what is the deletion policy specified in the VolumeSnapshotClass that the VolumeSnapshot uses?

ceph-block (delete) and k10-clone-ceph-block (retain)… though, looking at new snapshots in VolumeSnapshots, the class being used seems to be ceph-block. I can’t speak for the 200+ that I had deleted earlier, but I assume it was the same.

 

As it currently stands, I have 27 snapshots in Ceph trash that I can’t delete… I would need to manually dig into Ceph to see what the parent image is and whether it is safe to delete all the child snapshots, because I think Kasten has definitely messed something up.
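
If I do end up digging in manually, my understanding is it would go roughly like this (a sketch only; replicapool, the image ID, image name and snapshot name are placeholders, and I’d want to confirm nothing depends on the snapshots before purging anything):

# find the trashed image and its ID
rbd trash ls --pool replicapool

# restore it from trash so the normal commands work on it again
rbd trash restore --pool replicapool <image-id>

# list its snapshots and check whether any clones depend on them
rbd snap ls replicapool/<image-name>
rbd children replicapool/<image-name>@<snap-name>

# only if nothing depends on them: unprotect, purge the snapshots, then remove the image
rbd snap unprotect replicapool/<image-name>@<snap-name>
rbd snap purge replicapool/<image-name>
rbd rm replicapool/<image-name>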

 

kubectl get retireactions.actions.kio.kasten.io -o custom-columns=NAME:.metadata.name,"CREATED AT":.metadata.creationTimestamp,STATUS:.status.state

And this does show some as failed, but without much reason why; the output is far too large to post here, and it doesn’t offer much detail.
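
Something like this narrows it down, as a sketch (I’m assuming the failed state is reported literally as “Failed”; adjust if it differs):

# names of the retire actions that report a failed state
kubectl get retireactions.actions.kio.kasten.io -o json | jq -r '.items[] | select(.status.state == "Failed") | .metadata.name'

# full status of one of them, to look for an error message
kubectl get retireactions.actions.kio.kasten.io <name> -o yaml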
