
Greetings,

I have over 3,000 retire actions pending for an old Kasten DR policy.

All of them fail and retry every hour.

I had to clean up the NFS share manually because it had almost run out of space.

How can I clean up these RetireActions?

I get the following message:

 

- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: context deadline exceeded
            file: kasten.io/k10/kio/poll/poll.go:116
            function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
            linenumber: 116
            message: Context done while polling
          fields:
            - name: duration
              value: 15m0.00028077s
          file: kasten.io/k10/kio/poll/poll.go:86
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 86
          message: Timeout while polling
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "client rate limiter Wait returned an error: rate: Wait(n=1) would
              exceed context deadline"
          file: kasten.io/k10/kio/kopiaapiserver/utils.go:658
          function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown.func1
          linenumber: 658
          message: Failed to list Kopia API server pods
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: "client rate limiter Wait returned an error: context deadline exceeded"
            file: kasten.io/k10/kio/kopiaapiserver/utils.go:658
            function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown.func1
            linenumber: 658
            message: Failed to list Kopia API server pods
          fields:
            - name: duration
              value: 15m0.000628814s
          file: kasten.io/k10/kio/poll/poll.go:86
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 86
          message: Timeout while polling
        file: kasten.io/k10/kio/kopiaapiserver/utils.go:666
        function: kasten.io/k10/kio/kopiaapiserver.waitForPodShutdown
        linenumber: 666
        message: Failed while waiting for Kopia API server pod to shutdown
      file: kasten.io/k10/kio/kanister/snapshots/snapshots.go:596
      function: kasten.io/k10/kio/kanister/snapshots.GenericVolumeSnapshotDelete
      linenumber: 596
      message: Failed to delete Generic Volume Snapshot data
    file: kasten.io/k10/kio/exec/phases/phase/retire_restorepoint.go:476
    function: kasten.io/k10/kio/exec/phases/phase.(*retireRestorePointPhase).retireGenericVolumeSnapshots
    linenumber: 476
    message: Failed to retire some of the generic volume snapshots
  message: Job failed to be executed
 

 

Is anyone else facing this issue? Is there a workaround?
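
For reference, the pending retire actions can presumably also be listed from the CLI. I am assuming here that K10's aggregated actions API (actions.kio.kasten.io) is reachable and that K10 runs in the default kasten-io namespace; names may differ in other setups:

# List the pending RetireActions (resource and namespace names are assumptions for my environment)
kubectl get retireactions.actions.kio.kasten.io -n kasten-io

# Count how many are queued
kubectl get retireactions.actions.kio.kasten.io -n kasten-io --no-headers | wc -l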


Hello @dkats,

 

Thanks for posting on the Kasten community. From what you have shared above, this looks like it may be related to either a DataMover or CopyVol pod getting stuck. Kubernetes will only keep a pod in a starting or deleting state for so long before it is considered timed out. I would recommend looking over the events in the namespace during that time window to see what may be occurring, for example with the commands sketched below.
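
A minimal sketch of what I mean, assuming K10 is installed in the default kasten-io namespace (adjust names for your setup):

# Recent events in the K10 namespace, newest last
kubectl get events -n kasten-io --sort-by=.lastTimestamp

# Look for leftover or stuck data mover / copy-vol pods (pod name prefixes may vary by K10 version)
kubectl get pods -n kasten-io | grep -Ei 'data-mover|copy-vol'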

 

Thanks

Emmanuel


Hi,

Nothing shows up in the namespace's events.

Before this, the retire actions for the Kasten DR policy snapshots were not working, and the NFS share ran out of space because of those snapshots.

After the manual cleanup of the NFS share, the DR policy kept working, but the snapshots started filling up space again because the retire actions are not completing.

Other retire actions, for my application policies, run and complete successfully against the same NFS share.

Is there any way to clean the Kasten database so that these DR retire actions stop appearing?

Can I somehow export my backup policies and restore them in a fresh K10 installation?
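
To spell out what I mean by exporting: I am assuming the policies and profiles are plain custom resources in the config.kio.kasten.io group, so something like the following could dump them; whether re-importing them into a fresh K10 install is supported is exactly what I am asking:

# Dump all K10 policies and location profiles as YAML (assumes the config.kio.kasten.io resources and the kasten-io namespace)
kubectl get policies.config.kio.kasten.io -n kasten-io -o yaml > k10-policies.yaml
kubectl get profiles.config.kio.kasten.io -n kasten-io -o yaml > k10-profiles.yaml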

