Hi! I’m using an NFS share for K10 DR. The retention policy in the DR policy was changed to keep only 1 hourly version, so only one version should be kept. It looks like the old versions get retired, but the space on the NFS volume isn’t freed up. The usage report also shows that the storage is still in use by kasten-io. What can I do to free up the space? Is there any manual disk space reclaim job which needs to be run?
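For reference, this is how I checked what the DR policy retains now (a minimal sketch; the policy name and namespace are the K10 defaults in my setup, adjust if yours differ):
```
# Show the retention block of the DR policy (name/namespace are assumptions:
# the K10 defaults in my setup).
kubectl get policies.config.kio.kasten.io k10-disaster-recovery-policy \
  -n kasten-io -o jsonpath='{.spec.retention}{"\n"}'
# In my case this prints something like: {"hourly":1}
```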
Hi ph1l1pp,
If you don’t need to keep the old backups it might be better to delete the policy and create a new one. I am not using NFS but with S3 if I delete a policy and then run my report again it claims that it is gone:
wait wrong screen shot ;)
Actually I took my S3 offline last night in the basement.. bear with me :)
Ok while waiting I did find this. There could be issues with permissions but try looking around the CLI too.
Take a look at this:
https://docs.kasten.io/latest/api/restorepoints.html#api-delete-rpc
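Before removing anything you can also just list what K10 is still holding on to, roughly like this (a sketch; the selector is based on the k10.kasten.io/appName label K10 sets on these objects):
```
# List the cluster-scoped restore point contents for the kasten-io app
kubectl get restorepointcontents.apps.kio.kasten.io \
  --selector=k10.kasten.io/appName=kasten-io
```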
Ok so it looks like this is the case after all. I deleted the policy and recreated it, but the old restore points are still there, so next I will try to remove them manually and explore the API to see if there is a purge of some sort.
I deleted the policy and re-ran the report. It still shows me the same amount of data:
After that I deleted all restore points, which triggered a retire action for all remaining restore points:
```
kubectl delete restorepointcontents.apps.kio.kasten.io --selector=k10.kasten.io/appName=kasten-io
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-nm28p" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-8bb5w" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-lw55l" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-pnnjx" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-vf5sw" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-mhvvs" deleted
restorepointcontent.apps.kio.kasten.io "kasten-io-scheduled-bjnzh" deleted
```
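To keep an eye on the retire actions this kicked off, I listed them like this (a rough sketch; I’m assuming the retireactions resource name under K10’s actions.kio.kasten.io group):
```
# List the retire actions and dump one as YAML to see its state
# (resource name is an assumption based on K10's actions.kio.kasten.io API group)
kubectl get retireactions.actions.kio.kasten.io
kubectl get retireactions.actions.kio.kasten.io <action-name> -o yaml
```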
After a while the retire actions stop with state Failed:
```
status:
  actionDetails: {}
  endTime: "2022-01-13T12:14:23Z"
  error:
    cause: '{"cause":{"cause":{"cause":{"cause":{"Code":1,"Err":{}},"function":"kasten.io/k10/kio/kanister/function.deleteDataPodExecFunc.func1","linenumber":156,"message":"Error executing kopia GC"},"function":"kasten.io/k10/kio/kanister/function.DeleteData","linenumber":92,"message":"Failed to execute delete data pod function"},"function":"kasten.io/k10/kio/exec/phases/phase.GenericVolumeSnapshotDelete","linenumber":676,"message":"Failed to delete Generic Volume Snapshot data"},"function":"kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots","linenumber":435,"message":"Failed to retire some of the generic volume snapshots"}'
    message: Job failed to be executed
  plan: {}
  restorePoint:
    name: ""
  result:
    name: ""
  startTime: "2022-01-13T11:40:39Z"
  state: Failed
```
Problem still persists. Any idea what causes this kopia error?
Hi ph1l1pp,
You might need someone from Kasten to check that since it is at the kopia level. You can put in a support case even if you are using the community edition.
cheers
Hi again, so I went through all the steps (manually deleting etc) and I saw a result right away in the report.
So did manual deletions:
And ran the report on demand before and after:
So immediately the report reflects the change.
That kopia error you found is obviously the cause. Please get back to us if Kasten support figures it out so we can be aware of it if it pops up.
cheers
Geoff
Hello
Thank you for using our community and K10.
Could you please raise a ticket with K10 support at http://my.veeam.com/? To speed up the troubleshooting, please add the debug logs to your case:
https://docs.kasten.io/latest/operating/support.html#gathering-debugging-information
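If you want a quick look yourself before the case is picked up, the executor service is usually where the kopia errors show up (a rough sketch only; the deployment name is an assumption, check the deployments in kasten-io if it differs):
```
# Tail the executor logs, which normally contain the kopia GC/delete errors
# (deployment name is an assumption; verify with: kubectl get deploy -n kasten-io)
kubectl logs -n kasten-io deployment/executor-svc --all-containers --tail=500
```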
Best Regards
Fernando R.
After freeing up some space manually, the kopia error disappeared. It looks like that one was caused by the full disk. But the root cause still persists: the volume starts filling up again and old data isn’t removed.
Ah ok, that is good to know. Probably the logs were not clear. I have seen that before, when logs lead people on a wild goose chase and then it turns out to be something like space or permissions. I am noting this down in case I see it too, thanks.
Hello Philipp,
Please check the reply to your case.
I have checked the logs sent from your side. Currently, retire actions cannot reclaim space from external repositories (the NFS volume).
Retire actions, or manually deleting the restore points either from the backend or the Kasten dashboard, will free the local space in the cluster but not outside the cluster.
I checked the screenshot and found that all the snapshots and data were removed; the only data that remains is the Kasten services data (resources data), which is used by the logging, catalog and jobs objects/resources in the kasten-io namespace.
For more details about this data you can check the "more charts and alerts" option under the "Usage and Reports" page in the Kasten dashboard.
Back to your first inquiry about whether there is any way to remove the data from the NFS volume that the retire action triggered: unfortunately we don't have such an option at the moment, but it is a good feature that could be added in a coming version.
I would like to emphasize that retire actions or manually deleting restore points will clean up the local space in the cluster (e.g. removing snapshots).
Please let us know if you still have any questions or need more clarification from our side.
Best Regards
Ahmed Hagag
Hi Ahmed,
Is this the same for Object Storage as well? i.e. the data won’t retire automatically?
Thanks
We are facing the same behavior on our NFS location profile.
Hello
Sorry for the confusion. Retire actions should remove the data from the external repository (regardless of whether it is object or NFS storage), so it seems something is wrong on our side.
We are checking with the engineering team how it can be fixed and will get back to you soon.
Thanks
Ahmed Hagag
Thanks Ahmed. It will be interesting to see. I will try again later with the latest release as well.
cheers
Hi,
We are still checking the case, but let me explain a bit about Kopia, the tool we use for fast and secure backups, and why such an issue can exist.
Kopia marks the blobs of a snapshot as deleted but does not actually delete them until garbage collection runs. Garbage collection seems to run after 1 or 2 days, and if a backup taken before then references the same blobs that were marked deleted, those blobs are unmarked and the rest are removed.
It is a kind of fail-safe mechanism used by the Kopia tool, but we are still testing this.
Could you clarify the status of the policy you are using for backup? If you paused it, the Kopia mechanism I mentioned won't work.
And as I mentioned above, we are in the testing phase to be able to confirm the root cause of this issue.
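On a plain Kopia repository, this mark-then-collect cycle is driven by the maintenance commands; purely as an illustration (this assumes a repository you are already connected to with the Kopia CLI, which K10 normally handles internally):
```
# Illustration only: maintenance is what actually reclaims space for blobs
# that were marked as deleted when snapshots were removed.
kopia maintenance info        # shows when quick/full maintenance last ran and is next due
kopia maintenance run --full  # full maintenance, including blob garbage collection
```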
We tried with active policies, paused policies, and also with manual exports, and the result is the same each time: the data is not deleted from the NFS storage.
Hello, I have a similar issue with the NFS mount FileStore policy.
I wasn’t able to find any documentation relating to this matter, however I discovered that the permissions of the create-repo-pod didn’t seem to match up with my NFS export permissions:
```
cause:
  cause:
    cause:
      cause:
        Code: 1
        Err: {}
      file: kasten.io/k10/kio/kopia/repository.go:528
      function: kasten.io/k10/kio/kopia.ConnectToKopiaRepository
      linenumber: 528
      message: Failed to connect to the backup repository
    fields:
      - name: appNamespace
        value: backup
    file: kasten.io/k10/kio/exec/phases/phase/export.go:210
    function: kasten.io/k10/kio/exec/phases/phase.prepareKopiaRepoIfExportingData
    linenumber: 210
    message: Failed to create Kopia repository for data export
  file: kasten.io/k10/kio/exec/phases/phase/export.go:132
  function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
  linenumber: 132
  message: Failed to copy artifacts
message: Job failed to be executed
fields: []
```
Additionally, changing my NFS directory to chmod 777 allowed the repository to be created and written to, however the files are written as nobody/nogroup. When I then attempt to delete snapshots/exports, it fails to delete the exported files on the NFS share (due to permissions); only after chmod-ing the directory to 777 again did it allow deletion.
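For reference, the nobody/nogroup ownership is what squashing on the NFS export does; something like the export options below is what I mean (only a sketch, and the anonuid/anongid values are assumptions, use whatever owner fits your setup):
```
# /etc/exports (sketch): map squashed writes to a fixed uid/gid instead of nobody/nogroup.
# The 1000/1000 owner is an assumption; pick the owner you want the repo files to have.
/srv/nfs/k10  *(rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=1000)
# Alternatively, no_root_squash keeps root-owned writes as root (less safe):
# /srv/nfs/k10  *(rw,sync,no_subtree_check,no_root_squash)
```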
Is there any way to configure the `kopia` command to provide a uid/gid (https://kopia.io/docs/reference/command-line/common/repository-connect-filesystem/)?
```
c s0] tcp.0: e1646973928.325036348, {"Command"=>"kopia --log-level=error --config-file=/tmp/kopia-repository --log-dir=/tmp/kopia-log --password=<****> repository connect --no-check-for-updates --cache-directory=/tmp/kopia-cache --content-cache-size-mb=0 --metadata-cache-size-mb=500 --override-hostname=create-repo-pod --override-username=k10-admin filesystem --path=/mnt/data/default/repo/662b98ec-f136-4957-a100-2075a642f128/", "File"=>"kasten.io/k10/kio/kopia/kopia.go", "Function"=>"kasten.io/k10/kio/kopia.stringSliceCommand", "Level"=>"info", "Line"=>132, "Message"=>"kopia command", "Time"=>"2022-03-11T04:45:28.109692023Z", "cluster_name"=>"625a2a72-f41b-4697-8b55-7ab69805021f", "hostname"=>"executor-svc-b549cbdfc-jkvpd", "version"=>"4.5.10"}]
```
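If the filesystem provider flags on that kopia docs page apply to the command above, passing an owner would look roughly like this (the --owner-uid/--owner-gid flags and values are my assumption from the linked page; K10 does not expose them as far as I can tell):
```
# Sketch only: connect to the filesystem repository with an explicit owner,
# using the filesystem provider flags from the kopia docs page linked above.
kopia repository connect filesystem \
  --path=/mnt/data/default/repo/<repo-id>/ \
  --owner-uid=1000 --owner-gid=1000
```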
Workaround to run maintenance on the NFS share to clear external storage space.
Problem Description:
When the restore points for DR expire and are deleted (manually or via retire actions), the space is not reclaimed from the external storage (S3, NFS, ...), filling it up.
Workaround/Resolution:
Open a support case
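If support walks you through it, running maintenance by hand boils down to connecting to the repository K10 created and running full maintenance. This is only a sketch: it assumes you can mount the NFS export somewhere with the kopia CLI available and that you have the repository password, which K10 manages (support can help you locate it).
```
# Sketch: connect to the K10-created repository on the mounted NFS export
# (path pattern as in the executor log earlier in the thread), then run full
# maintenance to garbage-collect blobs left behind by retired restore points.
kopia repository connect filesystem --path=/mnt/data/default/repo/<repo-id>/
# Note: kopia only lets the repository's maintenance owner run full maintenance;
# connecting with the same --override-username/--override-hostname K10 used
# (k10-admin / create-repo-pod in the log above) or changing the owner may be needed.
kopia maintenance run --full
```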
Hello folks,
We have a similar problem using S3 storage.
k10: 5.5.6
Kanister: 0.89.0
Kopia: 0.12.1
Minio: 2023-03-13T19:46:17Z
Kubernetes: v1.24.9+rke2r2
Do you have any news?
Thank you.
We are still improving the maintenance process.
However, if you need help running maintenance for your S3, please open a case with us through my.veeam.com