Question

Backup of large Postgres Pod failing due to "Timeout while polling"


Hello,

I’m trying to back up a large Postgres Pod (>150 GB), but it fails after 45 minutes with the following error message:

- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              message: context deadline exceeded
            file: kasten.io/k10/kio/poll/poll.go:96
            function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
            linenumber: 96
            message: Context done while polling
          fields:
            - name: duration
              value: 44m59.992268957s
          file: kasten.io/k10/kio/poll/poll.go:66
          function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
          linenumber: 66
          message: Timeout while polling
        fields:
          - name: actionSet
            value: k10-backup-k10-pgo-bp-0.0.3-cinepg-cine--295f2
        file: kasten.io/k10/kio/kanister/operation.go:381
        function: kasten.io/k10/kio/kanister.(*Operation).waitForActionSetCompletion
        linenumber: 381
        message: Error waiting for ActionSet
      file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:581
      function: kasten.io/k10/kio/exec/phases/backup.snapshotNamespace
      linenumber: 581
      message: Error performing operator snapshot
    file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:407
    function: kasten.io/k10/kio/exec/phases/backup.processNonWorkloadArtifact
    linenumber: 407
    message: Failed snapshot for namespace
  message: Job failed to be executed


Are there any parameters I could set in the backup policy to extend this timeout, or what is the actual reason for this issue?


BR,

Daniel

1 comment

FRubens
  • Experienced User
  • 96 comments
  • May 31, 2023

Hello @Daniel Moes,

Thank you for using the K10 community!

There is a parameter that can be used to increase the Kanister backup timeout, which applies in your case since you are probably using a blueprint to back up Postgres.

I would recommend first checking the kanister-svc logs for any issues during the backup phase. If you see that the backup was still running and simply did not complete in time, you can try increasing the timeout value.
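As a quick check (assuming K10 is installed in the kasten-io namespace, and substituting the ActionSet name from the error above), something like this shows the Kanister operation logs and the state of the failed ActionSet:

kubectl logs -n kasten-io deployment/kanister-svc

kubectl describe actionset -n kasten-io k10-backup-k10-pgo-bp-0.0.3-cinepg-cine--295f2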

It would also be good to run the same command the blueprint uses directly in the Postgres pod and time it, so you have an idea of how long the dump actually takes; for example:
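This is a rough sketch, assuming your blueprint performs a logical dump with pg_dumpall (adjust the namespace, pod name, and database user to match your setup):

kubectl exec -n <postgres-namespace> <postgres-pod> -- \
  bash -c 'time pg_dumpall -U postgres > /dev/null'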

If you find that the backup is simply taking more time, the Helm value below can be used to increase the Kanister backup timeout:

--set kanister.backupTimeout=<value-minutes>

The default value is 45 minutes and can be increased according to your needs. As an example, the following increases it to 120 minutes:

helm get values k10 -n kasten-io > k10_val.yaml
helm upgrade k10 kasten/k10 --namespace=kasten-io -f k10_val.yaml \
--set kanister.backupTimeout=120 --version=YOUR_K10_VERSION
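After the upgrade, you can confirm the new value took effect with a quick check (assuming the same release name and namespace as above):

helm get values k10 -n kasten-io | grep -i backupTimeout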

More information about this and other K10 Helm values can be found in our docs:

https://docs.kasten.io/latest/install/advanced.html#complete-list-of-k10-helm-options

Hope it helps.

FRubens

