
Greetings,

I have over 3,000 retire actions pending for an old Kasten DR policy.

All of them fail and retry every hour.

I had to clean up the NFS share manually because it had almost run out of space.

How can I clean up these RetireActions?

I get the following message:

 

- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: context deadline exceeded
            file: kasten.io/k10/kio/poll/poll.go:116
            function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
            linenumber: 116
            message: Context done while polling
          fields:
            - name: duration
              value: 15m0.00028077s
          file: kasten.io/k10/kio/poll/poll.go:86
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 86
          message: Timeout while polling
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "client rate limiter Wait returned an error: rate: Wait(n=1) would
              exceed context deadline"
          file: kasten.io/k10/kio/kopiaapiserver/utils.go:658
          function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown.func1
          linenumber: 658
          message: Failed to list Kopia API server pods
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: "client rate limiter Wait returned an error: context deadline exceeded"
            file: kasten.io/k10/kio/kopiaapiserver/utils.go:658
            function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown.func1
            linenumber: 658
            message: Failed to list Kopia API server pods
          fields:
            - name: duration
              value: 15m0.000628814s
          file: kasten.io/k10/kio/poll/poll.go:86
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 86
          message: Timeout while polling
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
 

 

Is anyone else facing this issue? Is there a workaround?
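
For reference, the pending retire actions can presumably also be listed from the CLI. I am assuming here that K10's aggregated actions API (actions.kio.kasten.io) is reachable and that K10 runs in the default kasten-io namespace; names may differ in other setups:

# List the pending RetireActions (resource and namespace names are assumptions for my environment)
kubectl get retireactions.actions.kio.kasten.io -n kasten-io

# Count how many are queued
kubectl get retireactions.actions.kio.kasten.io -n kasten-io --no-headers | wc -l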


Hello @dkats,

 

Thanks for posting on the Kasten community. From what you have shared above, this looks like it may be related to either a DataMover or CopyVol pod getting stuck. Kubernetes will only keep a pod in a starting or deleting state for so long before it is considered timed out. I would recommend looking over the events in the namespace during that time window to see what may be occurring, for example with the commands sketched below.
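
A minimal sketch of what I mean, assuming K10 is installed in the default kasten-io namespace (adjust names for your setup):

# Recent events in the K10 namespace, newest last
kubectl get events -n kasten-io --sort-by=.lastTimestamp

# Look for leftover or stuck data mover / copy-vol pods (pod name prefixes may vary by K10 version)
kubectl get pods -n kasten-io | grep -Ei 'data-mover|copy-vol'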

 

Thanks

Emmanuel


Hi,

Nothing shows up in the namespace's events.

Before this, the retire actions for the Kasten DR policy snapshots were not working, and the NFS share ran out of space because of those snapshots.

After the manual cleanup of the NFS share, the DR policy kept working, but the snapshots started filling up space again because the retire actions are not completing.

Other retire actions, for my application policies, run and complete successfully against the same NFS share.

Is there any way to clean the Kasten database so that these DR retire actions stop appearing?

Can I somehow export my backup policies and restore them in a fresh K10 installation?
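
To spell out what I mean by exporting: I am assuming the policies and profiles are plain custom resources in the config.kio.kasten.io group, so something like the following could dump them; whether re-importing them into a fresh K10 install is supported is exactly what I am asking:

# Dump all K10 policies and location profiles as YAML (assumes the config.kio.kasten.io resources and the kasten-io namespace)
kubectl get policies.config.kio.kasten.io -n kasten-io -o yaml > k10-policies.yaml
kubectl get profiles.config.kio.kasten.io -n kasten-io -o yaml > k10-profiles.yaml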

