Question

K10 Disaster Recovery fills NFS share


Userlevel 1

Hi! I’m using an NFS share for K10 DR. The retention policy in the DR policy was changed to keep only one hourly version, so only one version should remain. It looks like the old versions get retired, but the space on the NFS volume isn’t freed up. The usage report also shows that the storage is still in use by kasten-io. What can I do to free up the space? Is there a manual disk space reclaim job that needs to be run?
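For reference, this is roughly how I am checking the usage directly on the NFS export (the path is just an example from my setup, adjust it to your export):

# run on the NFS server; /srv/nfs/k10-dr is a placeholder for the exported path
df -h /srv/nfs/k10-dr
du -sh /srv/nfs/k10-dr/*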


13 comments

Userlevel 2

Hello @ph1l1pp 

Thank you for using our community and K10.

Could you please raise a ticket with K10 support at http://my.veeam.com/? To speed up the troubleshooting, please attach the debug logs to your case:

https://docs.kasten.io/latest/operating/support.html#gathering-debugging-information
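As a first look, the individual pod logs from the kasten-io namespace can also be pulled with kubectl (pod names will differ per installation):

kubectl get pods -n kasten-io
kubectl logs -n kasten-io <pod-name> --all-containers > k10-pod.log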

Best Regards

Fernando R.

Userlevel 1

@Geoff Burke Thanks for going through all the steps on your S3 environment. That’s how I expect it to behave in my environment as well. I have now opened a support case.

After freeing up some space manually, the kopia error disappeared. It looks like it was caused by the full disk. But the root cause still persists: the volume starts filling up again and old data isn’t removed.

Hello Philipp,

Please check the reply to your case.


I have checked the logs sent from your side. Currently, retire actions cannot reclaim space from external repositories (the NFS volume).
Retire actions, or manually deleting the restore points either from the backend or the Kasten dashboard, will free the local space in the cluster but not outside the cluster.

I checked the screenshot and found that all the snapshots and data were removed; the only data that remains is the Kasten services data (resources data) used by the logging, catalog and jobs objects or resources in the kasten-io namespace.
For more details about this data you can check the "more charts and alerts" option under the "usage and reports" page in the Kasten dashboard.

Back to your first inquiry, whether there is any way to remove the data from the NFS volume that the retire action triggered: unfortunately we don’t have such an option at the moment, but it is a good feature that could be possible in coming versions.

I would like to emphasize that the retire action, or manually deleting a restore point, will clean the local space in the cluster (e.g. removing snapshots).
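As an illustration, assuming the CSI snapshot CRDs are installed in your cluster, you can verify that the local snapshots behind the retired restore points are gone like this:

kubectl get volumesnapshots -A
kubectl get volumesnapshotcontents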

Please let us know if you still have inquiries or need more clarification from our side.
Best Regards
Ahmed Hagag

Userlevel 7
Badge +8

Hi ph1l1pp,

 

If you don’t need to keep the old backups, it might be better to delete the policy and create a new one. I am not using NFS, but with S3, if I delete a policy and then run my report again, it claims the data is gone:
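If you prefer the CLI over the dashboard, something along these lines should list and delete the policy (the policy name is a placeholder):

kubectl get policies.config.kio.kasten.io -n kasten-io
kubectl delete policies.config.kio.kasten.io <policy-name> -n kasten-io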

 

Userlevel 7
Badge +8

Wait, wrong screenshot ;)

Userlevel 7
Badge +8

Actually I took my S3 offline last night in the basement… bear with me :)

Userlevel 7
Badge +8

OK, while waiting I did find this. There could be issues with permissions, but try looking around the CLI too.

Take a look at this:

 

https://docs.kasten.io/latest/api/restorepoints.html#api-delete-rpc
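Roughly, as I read that page, the flow looks like this (names and namespace are placeholders; check the doc for the exact semantics of RestorePoint vs. RestorePointContent):

kubectl get restorepoints.apps.kio.kasten.io -n <app-namespace>
kubectl delete restorepoints.apps.kio.kasten.io <restorepoint-name> -n <app-namespace>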

Userlevel 7
Badge +8

OK, so it looks like this is the case after all. I deleted the policy and recreated it, but the old restore points are still there, so next I will try to remove them manually and explore the API to see if there is a purge of some sort.

 

Userlevel 1

I deleted the policy and reran the report. It still shows me the same amount of data:

After that I deleted all restore points, which triggered a retire action for all remaining restore points:

kubectl delete restorepointcontents.apps.kio.kasten.io --selector=k10.kasten.io/appName=kasten-io
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-nm28p" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-8bb5w" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-lw55l" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-pnnjx" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-vf5sw" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-mhvvs" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-bjnzh" deleted

After a while the retire actions stop in the Failed state:

status:
  actionDetails: {}
  endTime: "2022-01-13T12:14:23Z"
  error:
    cause: '{"cause":{"cause":{"cause":{"cause":{"Code":1,"Err":{}},"function":"kasten.io/k10/kio/kanister/function.deleteDataPodExecFunc.func1","linenumber":156,"message":"Error
      executing kopia GC"},"function":"kasten.io/k10/kio/kanister/function.DeleteData","linenumber":92,"message":"Failed
      to execute delete data pod function"},"function":"kasten.io/k10/kio/exec/phases/phase.GenericVolumeSnapshotDelete","linenumber":676,"message":"Failed
      to delete Generic Volume Snapshot data"},"function":"kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots","linenumber":435,"message":"Failed
      to retire some of the generic volume snapshots"}'
    message: Job failed to be executed
  plan: {}
  restorePoint:
    name: ""
  result:
    name: ""
  startTime: "2022-01-13T11:40:39Z"
  state: Failed

 

The problem still persists. Any idea what causes this kopia error?
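For completeness, this is roughly how I am pulling the failed action details (resource names are from my cluster, yours will differ):

kubectl get retireactions.actions.kio.kasten.io -A
kubectl get retireactions.actions.kio.kasten.io <action-name> -n <namespace> -o yaml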

Userlevel 7
Badge +8

Hi ph1l1pp,

 

You might need someone from Kasten to check that, since it is at the kopia level. You can open a support case even if you are using the community edition.

cheers

Userlevel 7
Badge +8

Hi again, so I went through all the steps (manually deleting, etc.) and I saw a result right away in the report.

So I did manual deletions:

 

And ran the report on demand before and after:

So immediately the report reflects the change.

 

That kopia error you found is obviously the cause. Please get back to us if Kasten support figures it out, so we are aware of it if it pops up again.

cheers

Geoff

Userlevel 7
Badge +8

@Geoff Burke Thanks for going through all the steps on your S3 environment. That’s how I expect it to behave in my environment as well. I have now opened a support case.

After freeing up some space manually, the kopia error disappeared. It looks like it was caused by the full disk. But the root cause still persists: the volume starts filling up again and old data isn’t removed.

Ah, OK, that is good to know. Probably the logs were not clear. I have seen that before, when logs lead people on a wild goose chase and then it turns out to be something like space or permissions :). I am noting this down in case I see it too, thanks.

Userlevel 7
Badge +8

Hi Ahmed,

 

Is this the same for Object Storage as well, i.e. will the data not retire automatically?

 

Thanks 
