
Hello 
Kasten has been successfully creating backups for volumes and configs, but recently the export jobs have been failing for all policies with the error message "client rate limiter Wait returned an error: context deadline exceeded":

 

cause:
message: "client rate limiter Wait returned an error: rate: Wait(n=1) would
exceed context deadline"
file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/pod.go
function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
linenumber: 412
message: Pod did not transition into running state.
Timeout:15m0s Namespace:kasten-io,
Name:copy-vol-data-7d5hj
file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/pod_controller.go
function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
linenumber: 174
message: Pod failed to become ready in time
fields:
- name: pod
value: copy-vol-data-7d5hj
- name: namespace
value: kasten-io
file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
linenumber: 304
message: failed while waiting for Pod to be ready
file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
function: kasten.io/k10/kio/kanister/function.CopyVolumeData
linenumber: 161
message: Failed to execute copy volume data pod function

I tried running the job again, but got the same results.
Also, I can see a lot of snapshot data, around 1.2 TB. It should clear itself, but it looks like the snapshots are not being deleted. I'm not sure whether the export jobs are failing because of this or something else.

 



I can see some snapshots being deleted, as shown in the graph, but the overall snapshot usage keeps growing.

 

Hi @msaeed,

Thanks for reaching out to the Veeam Kasten Community.

 

Based on the error message, the copy-vol-data-x pod is stuck pending (not in a Ready state) and we are hitting our 15-minute pod-ready wait timeout. There could be multiple reasons the pod is not ready: it may be waiting on your CSI driver for volume creation, having trouble mounting the volume, or having trouble being scheduled.

Error message:
"client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline". Pod did not transition into running state.

Next steps: please provide the following (see the example after this list for narrowing the events down to the worker pod):
- kubectl describe po <copy-vol-data pod> -n kasten-io (when the export action kicks in, you will see the copy-vol-data-x pod in the kasten-io namespace)
- kubectl get pvc -n kasten-io (check for any PVC that is in a Pending state or was recently created; if there is one, provide its describe output)
- kubectl get ev -n kasten-io
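
If it helps, here is one way to narrow things down while the export is running. A quick sketch; the pod name below is taken from your error output, so adjust it to the pod of the current run:

kubectl get po -n kasten-io -w | grep copy-vol-data
kubectl get events -n kasten-io --field-selector involvedObject.name=copy-vol-data-7d5hj --sort-by=.lastTimestamp

The events usually tell you whether the pod is stuck on scheduling, volume attach/mount, or image pull.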

 

Regarding the snapshots not being cleared:
- We wait for a successful run of the policy before the retire action takes place. Since your exports are failing, the local restore points are not cleared. Once an export action succeeds, this will clean up automatically, or you can manually retire the restore points yourself (see the sketch below).
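
If you want to retire them manually, you can start by listing the local restore points. A rough sketch, assuming the default CRD names that Kasten installs (adjust the application namespace to yours):

kubectl get restorepoints.apps.kio.kasten.io --all-namespaces
kubectl get restorepointcontents.apps.kio.kasten.io

From there, the simplest path is to retire the individual restore points from the K10 dashboard.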

 

Regards
Satish


Hi Satish 

Thanks for responding. 

I have updated the CSI driver, and since then the export jobs have been completing.

However, we cannot restore any of the applications.

I tried restoring a namespace with all of its resources after deleting it completely from the cluster.

But it failed with these errors:

 

- cause:
cause:
cause:
cause:
cause:
message: 'Operation cannot be fulfilled on services "goco-devops-mongodb-svc":
the object has been modified; please apply your changes to the
latest version and try again'
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:1876
function: kasten.io/k10/kio/exec/phases/phase.restoreStatefulSet
linenumber: 1876
message: Failed to restore service
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:918
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreWorkloads
linenumber: 918
message: Failed to restore some of the workloads
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:398
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
linenumber: 398
message: Failed to restore workloads
file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
linenumber: 144
message: Failure in planned phase
message: Job failed to be executed
- cause:
cause:
cause:
cause:
cause:
errors:
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-2" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-1" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-0" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-1" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-0" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-2" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
message: 6 errors have occurred
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2260
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).transformToAppPVCs
linenumber: 2260
message: Failed to move some PVCs into application namespace
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:648
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).createPVCsFromPVCSpecs
linenumber: 648
message: Failed to move restored PVCs into application namespace
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:378
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
linenumber: 378
message: Failed to create PVCs from PVC specs
file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
linenumber: 144
message: Failure in planned phase
message: Job failed to be executed
- cause:
cause:
cause:
cause:
cause:
errors:
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-2" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-0" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-1" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "data-volume-goco-devops-mongodb-2" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-0" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
- cause:
cause:
message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-1" already
exists
file: kasten.io/k10/kio/pvccloner/clone.go:92
function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
linenumber: 92
message: Failed to create cloned PVC in target namespace
fields:
- name: AppNS
value: goco-devops-mongodb
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
linenumber: 2812
message: Failed moving PVC from K10 namespace to app namespace
message: 6 errors have occurred
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2260
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).transformToAppPVCs
linenumber: 2260
message: Failed to move some PVCs into application namespace
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:648
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).createPVCsFromPVCSpecs
linenumber: 648
message: Failed to move restored PVCs into application namespace
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:378
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
linenumber: 378
message: Failed to create PVCs from PVC specs
file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
linenumber: 144
message: Failure in planned phase
message: Job failed to be executed


I double-checked that all PVCs in that namespace were deleted when the namespace itself was deleted, but the restore still complains that the PVCs already exist (roughly the checks I ran are below).
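
For reference, these are the kinds of checks I ran before starting the restore (namespace name taken from the error above):

kubectl get pvc -n goco-devops-mongodb
kubectl get statefulset,deployment -n goco-devops-mongodb

Nothing was left in the namespace at that point, so I don't see what could have recreated those PVCs.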



I tried restoring a different app, but got the same results.

 

          cause:
message: 'Operation cannot be fulfilled on services "goco-redis-headless": the
object has been modified; please apply your changes to the latest
version and try again'
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:1876
function: kasten.io/k10/kio/exec/phases/phase.restoreStatefulSet
linenumber: 1876
message: Failed to restore service
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:918
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreWorkloads
linenumber: 918
message: Failed to restore some of the workloads
file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:398
function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
linenumber: 398
message: Failed to restore workloads
file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
linenumber: 144
message: Failure in planned phase
message: Job failed to be executed

 


Hi @msaeed,

We usually see this error in operator-based apps. I would check whether your operator is overwriting something during the restore; that conflict would eventually cause the failure.

Another possibility is a CI or GitOps tool that is modifying the restored objects.

Remove the associated instance from your operator and then try the restore again. You can also check which controller keeps modifying the conflicting object; see the example below.
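
One way to see what keeps modifying the object is to check its managed fields right after a failed restore attempt. A quick sketch using the service name from your error output (adjust the namespace if needed):

kubectl get svc goco-devops-mongodb-svc -n goco-devops-mongodb -o yaml --show-managed-fields

The managedFields section lists every manager (controller, operator, CI tool) that has written to the object, which usually points at the process causing the conflict.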

Regards
Satish

 


Hi Satish 

Currently we don't have an operator for this job. It looks like as soon as the restore starts, a resource or service gets created, and as soon as it is amended by any process, the restore fails at that point. This restore was done after deleting the namespace completely.

Also, I am again running into the issue of not being able to perform export operations during a backup job for PVCs.

 

The export job fails after the data mover pod gives this error:

The node was low on resource: ephemeral-storage. Threshold quantity: 2361707759, available: 2246072Ki. Container container was using 4690720Ki, request is 0, has larger consumption of ephemeral-storage. Container runtime did not kill the pod within specified grace period.


On the node where the data mover pod gets created, usage of the /var volume swings up and down significantly: it went up to around 97%, back down to 60%, kept doing the same, and eventually the pod returned that error.

On the node, under /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io, I can see Kasten creating and deleting some Kopia indexes, which are what take up the disk space.

I have increased /var to 100 GB, but it still goes over 90% during the export job for this app, which has 16 PVCs. The object data on these PVCs is only around 30 GB in total, which is not so much that the Kasten data mover should need 100 GB of ephemeral storage, unless that is simply how it works. The total provisioned size of the PVCs, however, is around 340 GB.

Also, I can see Kopia processes on the node using around 260% CPU.
The node currently has 20 CPUs.
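
For what it's worth, this is roughly how I have been watching it while the export runs (the containerd path is from my k3s node, so treat it as an example):

watch -n 5 df -h /var
du -sh /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io
kubectl describe node <node-name> | grep -i -A 3 ephemeral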


Error message for the export job in the dashboard:

message: "command terminated with exit code 137.

stdout:\

stderr: / 4 hashing, 376198 hashed (523.9 MB),
425989 cached (690.9 MB), uploaded 772.2 MB,
estimated 2.1 GB (57.8%) 23m45s left\r

- 1 hashing, 377946 hashed (525.5 MB), 425989 cached
(690.9 MB), uploaded 775.2 MB, estimated 2.1 GB
(57.9%) 23m44s left\r

\\ 5 hashing, 380549 hashed (528.2 MB), 425989
cached (690.9 MB), uploaded 779.8 MB, estimated 2.1
GB (58.1%) 23m41s left\r

| 0 hashing, 384697 hashed (533.2 MB), 425989 cached
(690.9 MB), uploaded 787.9 MB, estimated 2.1 GB
(58.3%) 23m30s left\r

/ 3 hashing, 388686 hashed (537.9 MB), 425989 cached
(690.9 MB), uploaded 795.7 MB, estimated 2.1 GB
(58.5%) 23m20s left\r

- 4 hashing, 392117 hashed (541.9 MB), 425989 cached
(690.9 MB), uploaded 802.3 MB, estimated 2.1 GB
(58.7%) 23m13s left\r

\\ 0 hashing, 394129 hashed (544.3 MB), 425989
cached (690.9 MB), uploaded 806.2 MB, estimated 2.1
GB (58.8%) 23m10s left\r

| 5 hashing, 398164 hashed (549.1 MB), 425989 cached
(690.9 MB), uploaded 814 MB, estimated 2.1 GB
(59.0%) 23m0s left\r

/ 4 hashing, 400912 hashed (552.3 MB), 425989 cached
(690.9 MB), uploaded 819.3 MB, estimated 2.1 GB
(59.2%) 22m55s left\r

- 2 hashing, 404228 hashed (556.7 MB), 425989 cached
(690.9 MB), uploaded 826.2 MB, estimated 2.1 GB
(59.4%) 22m47s left"
file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/exec.go
function: github.com/kanisterio/kanister/pkg/kube.ExecWithOptions
linenumber: 156
message: Failed to exec command in pod
file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:379
function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
linenumber: 379
message: "Failed to create and upload backup: kanister-tools container ran out
of memory"

 


@Satish 


 

This issue is now being addressed in a support case opened by the customer. We have received the logs and are reviewing them.

Regards
Satish

