Question

Backup of large Postgres Pod failing due to "Timeout while polling"


Hello,

I’m trying to back up a large Postgres Pod (>150 GB), but it fails after 45 minutes with the following error message:

- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: context deadline exceeded
            file: kasten.io/k10/kio/poll/poll.go:96
            function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
            linenumber: 96
            message: Context done while polling
          fields:
            - name: duration
              value: 44m59.992268957s
          file: kasten.io/k10/kio/poll/poll.go:66
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 66
          message: Timeout while polling
        fields:
          - name: actionSet
            value: k10-backup-k10-pgo-bp-0.0.3-cinepg-cine--295f2
        file: kasten.io/k10/kio/kanister/operation.go:381
        function: kasten.io/k10/kio/kanister.(*Operation).waitForActionSetCompletion
        linenumber: 381
        message: Error waiting for ActionSet
      file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:581
      function: kasten.io/k10/kio/exec/phases/backup.snapshotNamespace
      linenumber: 581
      message: Error performing operator snapshot
    file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:407
    function: kasten.io/k10/kio/exec/phases/backup.processNonWorkloadArtifact
    linenumber: 407
    message: Failed snapshot for namespace
  message: Job failed to be executed


Are there any parameters I could set in the backup policy to extend this timeout, or what is the actual reason for this issue?


BR,

Daniel

1 comment

FRubens
  • Experienced User
  • 96 comments
  • May 31, 2023

Hello @Daniel Moes,

Thank you for using the K10 community!

There is a parameter that can be used to increase the Kanister backup timeout, which applies in your case since you are probably using a blueprint to back up Postgres.

I would recommend first checking the kanister-svc logs for any issues during the backup phase. If you see that the backup was still running and simply did not complete in time, you can try increasing the timeout value.
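As a quick check (assuming K10 is installed in the kasten-io namespace, and substituting the ActionSet name from the error above), something like this shows the Kanister operation logs and the state of the failed ActionSet:

kubectl logs -n kasten-io deployment/kanister-svc

kubectl describe actionset -n kasten-io k10-backup-k10-pgo-bp-0.0.3-cinepg-cine--295f2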

It would also be good to run the same command the blueprint uses directly in the Postgres pod and time it, so you have an idea of how long the dump actually takes; for example:
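This is a rough sketch, assuming your blueprint performs a logical dump with pg_dumpall (adjust the namespace, pod name, and database user to match your setup):

kubectl exec -n <postgres-namespace> <postgres-pod> -- \
  bash -c 'time pg_dumpall -U postgres > /dev/null'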

If you find that the backup is simply taking more time, the Helm value below can be used to increase the Kanister backup timeout:

--set kanister.backupTimeout=<value-minutes>

The default value is 45 minutes and can be increased according to your needs. As an example, the following increases it to 120 minutes:

helm get values k10 -n kasten-io > k10_val.yaml
helm upgrade k10 kasten/k10 --namespace=kasten-io -f k10_val.yaml \
--set kanister.backupTimeout=120 --version=YOUR_K10_VERSION
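After the upgrade, you can confirm the new value took effect with a quick check (assuming the same release name and namespace as above):

helm get values k10 -n kasten-io | grep -i backupTimeout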

More information about this and other K10 Helm values can be found in our docs:

https://docs.kasten.io/latest/install/advanced.html#complete-list-of-k10-helm-options

Hope it helps.

FRubens

