Solved

ElasticSearch blueprint: context deadline exceeded (timeout)


Userlevel 3

Hi everybody,

We are evaluating the ElasticSearch blueprint and are getting the following error message:

cause:
  cause:
    cause:
      cause:
        cause:
          message: context deadline exceeded
        file: kasten.io/k10/kio/poll/poll.go:95
        function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
        linenumber: 95
        message: Context done while polling
      fields:
        - name: duration
          value: 44m59.987738528s
      file: kasten.io/k10/kio/poll/poll.go:65
      function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
      linenumber: 65
      message: Timeout while polling
    fields:
      - name: actionSet
        value: k10-backup-elasticsearch-blueprint-elasticsearch-master-elgjmdn
    file: kasten.io/k10/kio/kanister/operation.go:278
    function: kasten.io/k10/kio/kanister.(*Operation).waitForActionSetCompletion
    linenumber: 278
    message: Error waiting for ActionSet
  file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:630
  function: kasten.io/k10/kio/exec/phases/backup.RunKanisterSnapshotActions
  linenumber: 630
  message: Failed to create backup kanister phase
message: Job failed to be executed
fields: []

I understand that the timeout for the backup is exceeded (our test environment's Elastic DB is 1200 GB). How can we extend the timeout limit?

 

Thanks everybody!


Best answer by jaiganeshjk 16 February 2022, 12:38



Userlevel 6
Badge +2

@dk-do, as you mentioned, you are hitting the timeout while executing the Kanister action.

There is a Helm value, kanister.backupTimeout, that can be used to configure this Kanister timeout.

By default it is set to 45 minutes.

You can upgrade K10 with this Helm value set as shown below.

helm get values k10 --output yaml --namespace=kasten-io > k10_val.yaml && \
helm upgrade k10 kasten/k10 --namespace=kasten-io -f k10_val.yaml --set kanister.backupTimeout=120
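
To confirm the new value was applied after the upgrade, you can check the rendered Helm values (the grep filter below is only illustrative):

# show the user-supplied values and pick out the kanister section
helm get values k10 --output yaml --namespace=kasten-io | grep -A2 kanister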

 

Userlevel 3

Thanks for the fast response. I will try it with 120 minutes.

 

 

Userlevel 6
Badge +2

I just mentioned 120 minutes as an example. It might take more than that depending on the network bandwidth and other factors in your environment.

Userlevel 3

Yes - I understood it that way. I will check how long it takes and adjust the timeout.

Another question: on which volume (/backup) is the backup file stored?

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
gzip /backup

Is it the storage class defined in the helm values (persistence.storageClass)?

Userlevel 3

@jaiganeshjk: Can you please answer my last question? :)

Another question: on which volume (/backup) is the backup file stored?

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
gzip /backup

Is it the storage class defined in the helm values (persistence.storageClass)?

 

Userlevel 6
Badge +2

@dk-do I just looked at the Elasticsearch logical blueprint. We use kubeTask to run the phases in the blueprint.

This kubeTask creates a temporary pod and runs the commands against your ES instance.

We save the dump to a temporary directory and push the dump to your location profile. The local backup file gets deleted along with the temporary pod.

We don’t mount any PVCs in the temporary pod that we create using kubeTask.
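
Roughly, the backup phase runs a sequence like this inside that temporary pod (a simplified sketch pieced together from the snippets quoted in this thread; the actual blueprint contains more than this):

# dump all indices from the Elasticsearch endpoint to a local file inside the pod
elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
# compress the dump in place, producing /backup.gz
gzip /backup
# push the compressed dump to the configured location profile, e.g. S3
kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz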

 

Hope this answers your question.

Userlevel 3

@dk-do 

This kubeTask creates a temporary pod and runs the commands against your ES instance.

We save the dump to a temporary directory and push the dump to your location profile. The local backup file gets deleted along with the temporary pod.

 

So it is stored in the same PV where Elastic stores its data?

Userlevel 6
Badge +2

 

So it is stored in the same PV where Elastic stores its data?

No, it is not stored persistently anywhere; it is pushed to the S3 object store/NFS file store that you have configured as a location profile.
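
If you want to double-check which location profiles are configured, they are stored as custom resources in the K10 namespace (the resource and namespace names below assume a default install):

kubectl get profiles.config.kio.kasten.io --namespace=kasten-io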

Userlevel 3

Sorry for asking again:

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup
==> or do I understand it wrong?
==> where is /backup? I think it is some local storageClass?


kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz

==> this command pushes the created backup to (in our case) S3

Userlevel 6
Badge +2

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup

Yes, right. We dump it to /backup, which is just a temporary path in the container's filesystem. It is not persistent.

It gets deleted as soon as the pod gets deleted.

You can get the YAML of the pod that is created for this operation. The name of the pod would be kanister-job-*. It wouldn't have any volumes attached to it.
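
For example, while a backup is running you can inspect that pod and confirm that no PVCs appear under .spec.volumes (the namespace below is an assumption; depending on the blueprint the pod may run in your application namespace or in kasten-io):

# find the temporary Kanister pod
kubectl get pods --namespace=kasten-io | grep kanister-job
# dump its spec to check for mounted volumes
kubectl get pod kanister-job-<suffix> --namespace=kasten-io --output=yaml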

kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz

==> this command pushes the created backup to (in our case) S3

Absolutely. This pushes the dump file from the local directory to S3.

Userlevel 3

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup

Yes, right. We dump it to /backup, which is just a temporary path in the container's filesystem. It is not persistent.

It gets deleted as soon as the pod gets deleted.

 

@jaiganeshjk

Okay, thanks for the info! In that case it is stored in the Docker overlay directory, which by default lives on the root mount (/) of the worker node on which the container is running:

bash-5.0# ls /backup -lah

-rw-r--r--    1 root     root        1.2G Feb 16 15:59 /backup

 

root@ops2-w6:/var/lib/docker/overlay2/5b120335c0139fdc1a32a3da964ff532cccfecf36d1e9100bea3fd2a8dc5ee40/merged# ls -lah
total 1.5G
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 .
drwx--x--- 5 root root 4.0K Feb 16 15:09 ..
-rw-r--r-- 1 root root 1.5G Feb 16 16:12 backup  <=== Elastic Backup File
drwxr-xr-x 1 root root 4.0K Feb 10 04:51 bin
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 dev
-rwxr-xr-x 1 root root    0 Feb 16 15:09 .dockerenv
-rwxr-xr-x 1 root root  253 Feb 10 04:51 esdump-setup.sh
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 etc
drwxr-xr-x 2 root root 4.0K Jan 16  2020 home
drwxr-xr-x 1 root root 4.0K Jan 16  2020 lib
drwxr-xr-x 5 root root 4.0K Jan 16  2020 media
drwxr-xr-x 2 root root 4.0K Jan 16  2020 mnt
drwxr-xr-x 2 root root 4.0K Jan 16  2020 opt
dr-xr-xr-x 2 root root 4.0K Jan 16  2020 proc
drwx------ 1 root root 4.0K Feb 16 16:00 root
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 run
drwxr-xr-x 2 root root 4.0K Jan 16  2020 sbin
drwxr-xr-x 2 root root 4.0K Jan 16  2020 srv
drwxr-xr-x 2 root root 4.0K Jan 16  2020 sys
drwxrwxrwt 1 root root 4.0K Feb 10 04:51 tmp
drwxr-xr-x 1 root root 4.0K Jan 16  2020 usr
drwxr-xr-x 1 root root 4.0K Jan 16  2020 var
 

 

 

So we have to make sure that / on the corresponding node has enough space - is it possible or “recommended” to store those files on an NFS file share?
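
For reference, the remaining capacity on the root filesystem can be checked directly on the worker node:

# run on the worker node that hosts the kanister-job pod
df -h /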

Userlevel 3

Hi @jaiganeshjk 

I have another question: I changed the timeout to 16 hours.
Unfortunately, 16 hours are not enough for backing up 1200 GB in ElasticSearch. Is there an option to speed it up?

Everything is on the local network (we use our own MinIO installation for tests); the servers are equipped with SSDs and connected via 10 Gbit LAN.

fields:
- name: duration
value: 15h59m59.986648512s

 

Userlevel 6
Badge +2

@dk-do You are right. Currently the dump is just written to that temporary container storage, as you mentioned in the comment above.

It is a good feature request to back such volumes with a PVC, which would remove the dependency on having capacity on the node.

 

As for the time taken for the copy, I am not sure there is a way to speed it up.

Hi @jaiganeshjk 

I have another question: I changed the timeout to 16 hours.
Unfortunately, 16 hours are not enough for backing up 1200 GB in ElasticSearch. Is there an option to speed it up?

Everything is on the local network (we use our own MinIO installation for tests); the servers are equipped with SSDs and connected via 10 Gbit LAN.

fields:
- name: duration
value: 15h59m59.986648512s

 

 
