Question

Kasten K10 DR Backup Context Deadline Exceeded


Hi,

 

I am trying to take a K10 DR backup and it fails after 45m every time. I have tried increasing KanisterBackupTimeout to 150m, but it still fails at 45m. This is an on-premise OpenShift environment.

 

Here are the details of the error:

 

```

- cause:
    cause:
      cause:
        cause:
          message: context deadline exceeded
        file: kasten.io/k10/kio/poll/poll.go:96
        function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
        linenumber: 96
        message: Context done while polling
      fields:
        - name: duration
          value: 44m59.970739698s
      file: kasten.io/k10/kio/poll/poll.go:66
      function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
      linenumber: 66
      message: Timeout while polling
    fields:
      - name: actionSet
        value: k10-backupcatalogtoserver-k10-dr-bp-2.1.2-xv2lp-catalog-svwdcsq
    file: kasten.io/k10/kio/kanister/operation.go:381
    function: kasten.io/k10/kio/kanister.(*Operation).waitForActionSetCompletion
    linenumber: 381
    message: Error waiting for ActionSet
  message: Job failed to be executed

```
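(In case it helps, the ActionSet named in the error can also be inspected directly. This is only a sketch; it assumes the Kanister ActionSet CRD is registered as `actionsets.cr.kanister.io` and that the object lives in the `kasten-io` namespace.)

```
# Sketch: inspect the failing ActionSet referenced in the error above.
# Assumes the actionsets.cr.kanister.io CRD and the kasten-io namespace.
kubectl -n kasten-io get actionsets.cr.kanister.io
kubectl -n kasten-io describe actionsets.cr.kanister.io \
  k10-backupcatalogtoserver-k10-dr-bp-2.1.2-xv2lp-catalog-svwdcsq
```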

 

```
kubectl get cm k10-config -o yaml | egrep "version|Timeout|45"
  KanisterBackupTimeout: "150"
  KanisterCheckRepoTimeout: "20"
  KanisterDeleteTimeout: "45"
  KanisterEFSPostRestoreTimeout: "45"
  KanisterHookTimeout: "20"
  KanisterPodReadyWaitTimeout: "15"
  KanisterRestoreTimeout: "600"
  KanisterStatsTimeout: "20"
  concurrentSnapConversions: "3"
  kubeVirtVMsUnFreezeTimeout: 5m
  version: 5.5.10
  vmWareTaskTimeoutMin: "60"
  resourceVersion: "353357456"
```
 

 

Could you please help resolve this issue? We want to upgrade the cluster, and a working DR setup is a prerequisite for that.

 

Thanks

Anjul


4 comments


@anjul Thanks for posting the question.

Can you confirm whether you updated the value using the helm value `kanister.backupTimeout`, or just edited KanisterBackupTimeout directly in the k10-config ConfigMap?
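If you went the helm route, a minimal sketch of what that upgrade could look like (the release name `k10`, chart `kasten/k10`, and namespace `kasten-io` are assumptions; adjust to your environment):

```
# Sketch only: release name "k10", chart "kasten/k10", and namespace
# "kasten-io" are assumptions; adjust to your environment.
helm upgrade k10 kasten/k10 \
  --namespace kasten-io \
  --reuse-values \
  --set kanister.backupTimeout=150
```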
 

If you manually updated the ConfigMap instead, you will have to restart the executor-svc and kanister-svc pods for the changes to take effect (these values are set via environment variables).

You can run the commands below to rollout-restart the deployments:

```
kubectl rollout restart deploy executor-svc -n kasten-io
kubectl rollout restart deploy kanister-svc -n kasten-io
```

 

@jaiganeshjk nice to see you again. I actually updated the value through helm and made sure the pods were restarted to pick up the new config. However, it failed again with the same error. Is there anything else I can do to get more detailed logs?


@anjul Good to talk to you too.

Thanks for confirming that you used helm to update the values.

If it fails with the same error, then for some reason this config update is not being picked up.
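One quick way to check is to confirm the value is still present in k10-config and that the executor pods were actually restarted after the change (a sketch; assumes the default `kasten-io` namespace):

```
# Sketch: confirm the timeout value in k10-config and check that the
# executor pods were restarted after the change (pod AGE should be newer
# than the edit).
kubectl -n kasten-io get cm k10-config -o yaml | grep KanisterBackupTimeout
kubectl -n kasten-io get pods | grep executor
```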

 

During the K10 DR backup, the commands are executed from the executor against the kanister-sidecar container.

Can you search for any other error messages in the executor pod logs?
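A quick way to pull those (assuming K10 is installed in the default `kasten-io` namespace):

```
# Assumes K10 runs in the kasten-io namespace.
kubectl -n kasten-io logs deploy/executor-svc --all-containers --tail=1000 | grep -i error
```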

I downgraded to a supported version to fix the issue.

DR is at least working fine now. 

There is another issue where the backup of a few particular namespaces keeps failing. Out of 80 namespaces, only a couple are not working.
