Solved

creating volume backup failed


  • Comes here often
  • 8 comments

Hello 
Kasten has been successfully creating backups for volumes and configs, but recently the export jobs have been failing for all policies with the error message “client rate limiter Wait returned an error: context deadline exceeded”:

 

cause:
                            message: "client rate limiter Wait returned an error: rate: Wait(n=1) would
                              exceed context deadline"
                          file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/pod.go
                          function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
                          linenumber: 412
                          message: Pod did not transition into running state.
                            Timeout:15m0s  Namespace:kasten-io,
                            Name:copy-vol-data-7d5hj
                        file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/pod_controller.go
                        function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                        linenumber: 174
                        message: Pod failed to become ready in time
                      fields:
                        - name: pod
                          value: copy-vol-data-7d5hj
                        - name: namespace
                          value: kasten-io
                      file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                      function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                      linenumber: 304
                      message: failed while waiting for Pod to be ready
                    file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                    function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                    linenumber: 161
                    message: Failed to execute copy volume data pod function

Tried running the job again, but got the same results.
I can also see a lot of snapshot data, around 1.2 TB. It should clear itself, but it looks like the snapshots are not being deleted. I'm not sure whether the export jobs are failing because of this or because of something else.

 



I can see some snapshots being deleted, as shown in the graph, but the snapshot data keeps growing.

 

Best answer by Satish

 

The following case is being addressed in a support case opened by the customer. We have received the logs and are reviewing them.

Regards
Satish


7 comments

  • Experienced User
  • 49 comments
  • September 18, 2024

Hi @msaeed ,

Thanks for reaching out to Veeam Kasten Community.

 

Based on the error message, the copy-vol-data-x pod is pending (not in a ready state) and we are hitting our 15-minute pod-ready wait timeout. There could be multiple reasons the pod is not ready: it may be waiting on your CSI driver for the volume creation, having trouble mounting, or having trouble scheduling.

Error message:
"client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline". Pod did not transition into running state.

Next steps: please provide the following:
- kubectl describe po <copy-vol-data pod> -n kasten-io (when the export action kicks in, you will see the copy-vol-data-x pod in the kasten-io namespace)
- kubectl get pvc -n kasten-io (check for any PVC in a pending state or created recently; if there is one, provide its describe output)
- kubectl get ev -n kasten-io
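The three commands above can be collected into a small diagnostic helper. This is just a convenience sketch around standard kubectl commands; the pod name is a placeholder you would fill in while an export action is running:

```shell
#!/bin/sh
# Diagnostic helpers for a stuck copy-vol-data pod, based on the steps above.
# Run these while an export action is in progress.

NAMESPACE="kasten-io"

list_copy_vol_pods() {
    # Show any copy-vol-data-* pods spawned by the export action.
    kubectl get pods -n "$NAMESPACE" | grep copy-vol-data
}

describe_copy_vol_pod() {
    # Usage: describe_copy_vol_pod <pod-name>
    # The Events section at the bottom usually names the scheduling,
    # attach, or mount problem directly.
    kubectl describe pod "$1" -n "$NAMESPACE"
}

check_pvcs() {
    # Look for PVCs stuck in Pending or created recently.
    kubectl get pvc -n "$NAMESPACE"
}

recent_events() {
    # Namespace events, oldest first, often show CSI or scheduler errors.
    kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}
```

Call `list_copy_vol_pods` first to find the current pod name, then pass it to `describe_copy_vol_pod`.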

 

Regarding the snapshots not being cleared:
- We wait for a successful run of the policy before the retire action takes place. Since your exports are failing, the local restore points are not cleared. A successful export action will clean these up automatically, or, if you prefer, you can retire them manually.
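If you want to inspect the local restore points from the CLI before retiring anything, one sketch is below. It assumes the Kasten CRDs `restorepoints.apps.kio.kasten.io` (namespaced) and `restorepointcontents.apps.kio.kasten.io` (cluster-scoped) as documented for K10; verify the names against your K10 version, and prefer retiring through the dashboard over deleting CRs by hand:

```shell
#!/bin/sh
# Sketch: inspect local restore points before manually retiring them.
# ASSUMPTION: CRD names restorepoints.apps.kio.kasten.io and
# restorepointcontents.apps.kio.kasten.io, per the Kasten K10 docs.

list_restorepoints() {
    # Usage: list_restorepoints <app-namespace>
    # Per-namespace restore point references.
    kubectl get restorepoints.apps.kio.kasten.io -n "$1"
}

list_restorepointcontents() {
    # Cluster-scoped objects backing the per-namespace restore points.
    kubectl get restorepointcontents.apps.kio.kasten.io
}
```

Retiring via the dashboard keeps K10's bookkeeping consistent, so treat direct CR deletion as a last resort.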

 

Regards
Satish


  • Author
  • Comes here often
  • 8 comments
  • September 18, 2024

Hi Satish 

Thanks for responding. 

I have updated the CSI driver, and afterwards I can see that the export jobs have completed.

However, we cannot restore any of the applications.

I tried restoring a namespace with all its resources after deleting it completely from the cluster, but it failed with these errors:

 

- cause:
    cause:
      cause:
        cause:
          cause:
            message: 'Operation cannot be fulfilled on services "goco-devops-mongodb-svc":
              the object has been modified; please apply your changes to the
              latest version and try again'
          file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:1876
          function: kasten.io/k10/kio/exec/phases/phase.restoreStatefulSet
          linenumber: 1876
          message: Failed to restore service
        file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:918
        function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreWorkloads
        linenumber: 918
        message: Failed to restore some of the workloads
      file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:398
      function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
      linenumber: 398
      message: Failed to restore workloads
    file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
    function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
    linenumber: 144
    message: Failure in planned phase
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            errors:
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-2" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-1" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-0" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-1" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-0" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-2" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
            message: 6 errors have occurred
          file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2260
          function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).transformToAppPVCs
          linenumber: 2260
          message: Failed to move some PVCs into application namespace
        file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:648
        function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).createPVCsFromPVCSpecs
        linenumber: 648
        message: Failed to move restored PVCs into application namespace
      file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:378
      function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
      linenumber: 378
      message: Failed to create PVCs from PVC specs
    file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
    function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
    linenumber: 144
    message: Failure in planned phase
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            errors:
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-2" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-0" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-1" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "data-volume-goco-devops-mongodb-2" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-0" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
              - cause:
                  cause:
                    message: persistentvolumeclaims "logs-volume-goco-devops-mongodb-1" already
                      exists
                  file: kasten.io/k10/kio/pvccloner/clone.go:92
                  function: kasten.io/k10/kio/pvccloner.kubeCloner.Clone
                  linenumber: 92
                  message: Failed to create cloned PVC in target namespace
                fields:
                  - name: AppNS
                    value: goco-devops-mongodb
                file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2812
                function: kasten.io/k10/kio/exec/phases/phase.transformToAppPVC
                linenumber: 2812
                message: Failed moving PVC from K10 namespace to app namespace
            message: 6 errors have occurred
          file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:2260
          function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).transformToAppPVCs
          linenumber: 2260
          message: Failed to move some PVCs into application namespace
        file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:648
        function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).createPVCsFromPVCSpecs
        linenumber: 648
        message: Failed to move restored PVCs into application namespace
      file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:378
      function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
      linenumber: 378
      message: Failed to create PVCs from PVC specs
    file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
    function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
    linenumber: 144
    message: Failure in planned phase
  message: Job failed to be executed


I double-checked that all PVCs of that namespace were deleted when the namespace was deleted, but it still complains that the PVCs already exist.
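A quick way to confirm nothing was left behind is to check both the application namespace and kasten-io for PVCs before retrying (the namespace name is taken from the error log above; adjust as needed):

```shell
#!/bin/sh
# Check for leftover PVCs that could collide with the restore.
# The restore log shows K10 cloning PVCs from its own namespace into the
# app namespace, so leftovers in either place can cause "already exists".

APP_NS="goco-devops-mongodb"   # namespace from the error log

check_leftover_pvcs() {
    echo "--- PVCs in $APP_NS ---"
    kubectl get pvc -n "$APP_NS" 2>/dev/null || echo "(namespace absent)"
    echo "--- PVCs in kasten-io ---"
    kubectl get pvc -n kasten-io
}
```

If a PVC from an earlier failed restore attempt is still sitting in either namespace, deleting it before retrying should clear the "already exists" errors.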

​​​​​​​


  • Author
  • Comes here often
  • 8 comments
  • September 18, 2024

Tried restoring a different app, but got the same results:

 

          cause:
            message: 'Operation cannot be fulfilled on services "goco-redis-headless": the
              object has been modified; please apply your changes to the latest
              version and try again'
          file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:1876
          function: kasten.io/k10/kio/exec/phases/phase.restoreStatefulSet
          linenumber: 1876
          message: Failed to restore service
        file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:918
        function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreWorkloads
        linenumber: 918
        message: Failed to restore some of the workloads
      file: kasten.io/k10/kio/exec/phases/phase/restore_app.go:398
      function: kasten.io/k10/kio/exec/phases/phase.(*restoreApplicationPhase).restoreApp
      linenumber: 398
      message: Failed to restore workloads
    file: kasten.io/k10/kio/exec/internal/runner/phase_runner.go:144
    function: kasten.io/k10/kio/exec/internal/runner.(*phaseRunner).execPlannedPhase
    linenumber: 144
    message: Failure in planned phase
  message: Job failed to be executed

 


  • Experienced User
  • 49 comments
  • September 18, 2024

Hi @msaeed ,

We usually see this error in operator-based apps. I would check whether your operator is overwriting something during the restore, creating a conflict that eventually fails the restore.

Alternatively, check whether you have any CI tool that is conflicting.

Remove the associated instance from your operator and then try a restore.

Regards
Satish

 


  • Author
  • Comes here often
  • 8 comments
  • September 19, 2024

Hi Satish 

Currently, for this job we don't have an operator. It looks like as soon as the restore starts, a resource or service gets created, and as soon as it gets amended by any process, the restore fails at that point. This restore was done after deleting the namespace completely.

Also, I am again facing the issue of not being able to perform export operations during a backup job for PVCs.

 

The export job fails after the data mover pod gives this error:

The node was low on resource: ephemeral-storage. Threshold quantity: 2361707759, available: 2246072Ki. Container container was using 4690720Ki, request is 0, has larger consumption of ephemeral-storage.Container runtime did not kill the pod within specified grace period.
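The numbers in that message already explain the eviction: the kubelet reports the threshold in bytes but the available figure in Ki, and converting them to the same unit shows that available space had dropped below the threshold. A quick check:

```shell
#!/bin/sh
# Convert the kubelet's figures from the eviction message to the same unit.
threshold_bytes=2361707759          # "Threshold quantity" from the message
available_ki=2246072                # "available" from the message, in Ki
available_bytes=$(( available_ki * 1024 ))

echo "threshold: $threshold_bytes bytes"
echo "available: $available_bytes bytes"   # prints 2299977728
if [ "$available_bytes" -lt "$threshold_bytes" ]; then
    echo "available < threshold: the kubelet evicts the pod"
fi
```

So roughly 2.30 GB was free against a 2.36 GB eviction threshold, which is why the kubelet evicted the data mover pod.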


On the node where the data mover pod gets created, usage of the /var volume fluctuates significantly: it went up to about 97%, back down to 60%, kept repeating that pattern, and then eventually returned that error.

On the node, at /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io, I can see Kasten creating and deleting some Kopia indexes, which are taking up the disk space.

I have increased /var to 100G, but it still goes over 90% during the export job for this app with 16 PVCs. In total, the object data on these PVCs would be around 30G collectively, which is not so much that the Kasten data mover should need 100G of ephemeral storage, or maybe that's just how it works. The total size of the PVCs, however, is around 340G.

I can also see Kopia processes on the node using around 260% of the CPU; the node currently has 20 CPUs.
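For reference, ephemeral-storage requests and limits are a standard Kubernetes container resource; the fragment below shows the generic mechanism on a plain pod spec. Whether and how K10 lets you apply this to its copy-vol-data pods depends on your K10 version, so check the Kasten documentation or raise it in the support case. All names and values here are illustrative only.

```yaml
# Generic Kubernetes mechanism (not K10-specific): ephemeral-storage
# requests/limits on a container. Names and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: example-data-mover          # hypothetical name
spec:
  containers:
    - name: worker
      image: example/image:latest   # placeholder
      resources:
        requests:
          ephemeral-storage: "8Gi"  # scheduler only places the pod on a node with room
        limits:
          ephemeral-storage: "20Gi" # kubelet evicts if the container exceeds this
```

With a request set, the scheduler avoids nodes that cannot supply that much scratch space, instead of letting the pod land anywhere and get evicted mid-transfer.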


Error message for the export job in the dashboard:

message: "command terminated with exit code 137.

                            stdout:\ 

                            stderr: / 4 hashing, 376198 hashed (523.9 MB),
                            425989 cached (690.9 MB), uploaded 772.2 MB,
                            estimated 2.1 GB (57.8%) 23m45s left\r

                            - 1 hashing, 377946 hashed (525.5 MB), 425989 cached
                            (690.9 MB), uploaded 775.2 MB, estimated 2.1 GB
                            (57.9%) 23m44s left\r

                            \\ 5 hashing, 380549 hashed (528.2 MB), 425989
                            cached (690.9 MB), uploaded 779.8 MB, estimated 2.1
                            GB (58.1%) 23m41s left\r

                            | 0 hashing, 384697 hashed (533.2 MB), 425989 cached
                            (690.9 MB), uploaded 787.9 MB, estimated 2.1 GB
                            (58.3%) 23m30s left\r

                            / 3 hashing, 388686 hashed (537.9 MB), 425989 cached
                            (690.9 MB), uploaded 795.7 MB, estimated 2.1 GB
                            (58.5%) 23m20s left\r

                            - 4 hashing, 392117 hashed (541.9 MB), 425989 cached
                            (690.9 MB), uploaded 802.3 MB, estimated 2.1 GB
                            (58.7%) 23m13s left\r

                            \\ 0 hashing, 394129 hashed (544.3 MB), 425989
                            cached (690.9 MB), uploaded 806.2 MB, estimated 2.1
                            GB (58.8%) 23m10s left\r

                            | 5 hashing, 398164 hashed (549.1 MB), 425989 cached
                            (690.9 MB), uploaded 814 MB, estimated 2.1 GB
                            (59.0%) 23m0s left\r

                            / 4 hashing, 400912 hashed (552.3 MB), 425989 cached
                            (690.9 MB), uploaded 819.3 MB, estimated 2.1 GB
                            (59.2%) 22m55s left\r

                            - 2 hashing, 404228 hashed (556.7 MB), 425989 cached
                            (690.9 MB), uploaded 826.2 MB, estimated 2.1 GB
                            (59.4%) 22m47s left"
                        file: github.com/kanisterio/kanister@v0.0.0-20240828182737-b6d930f12c93/pkg/kube/exec.go
                        function: github.com/kanisterio/kanister/pkg/kube.ExecWithOptions
                        linenumber: 156
                        message: Failed to exec command in pod
                      file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:379
                      function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                      linenumber: 379
                      message: "Failed to create and upload backup: kanister-tools container ran out
                        of memory"
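Exit code 137 is consistent with that last message: by shell convention, exit codes above 128 mean the process was killed by signal (code minus 128), and 137 - 128 = 9, i.e. SIGKILL, which is what the kernel OOM killer or the kubelet sends. A one-line check:

```shell
#!/bin/sh
# Exit codes above 128 encode "killed by signal (code - 128)".
exit_code=137
signal=$(( exit_code - 128 ))
echo "killed by signal $signal (9 = SIGKILL)"   # prints: killed by signal 9 (9 = SIGKILL)
```

So the kopia upload was killed from outside (memory or ephemeral-storage pressure), not a failure inside the tool itself.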

 


Madi.Cristil
  • Community Manager
  • 617 comments
  • September 26, 2024

  • Experienced User
  • 49 comments
  • Answer
  • September 27, 2024

 

The following case is being addressed in a support case opened by the customer. We have received the logs and are reviewing them.

Regards
Satish

