Solved

ElasticSearch blueprint: context deadline exceeded (timeout)


Userlevel 3

Hi everybody,

We are evaluating the ElasticSearch blueprint and are getting the following error message:

cause:
  cause:
    cause:
      cause:
        cause:
          message: context deadline exceeded
        file: kasten.io/k10/kio/poll/poll.go:95
        function: kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper
        linenumber: 95
        message: Context done while polling
      fields:
        - name: duration
          value: 44m59.987738528s
      file: kasten.io/k10/kio/poll/poll.go:65
      function: kasten.io/k10/kio/poll.waitWithBackoffWithRetries
      linenumber: 65
      message: Timeout while polling
    fields:
      - name: actionSet
        value: k10-backup-elasticsearch-blueprint-elasticsearch-master-elgjmdn
    file: kasten.io/k10/kio/kanister/operation.go:278
    function: kasten.io/k10/kio/kanister.(*Operation).waitForActionSetCompletion
    linenumber: 278
    message: Error waiting for ActionSet
  file: kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:630
  function: kasten.io/k10/kio/exec/phases/backup.RunKanisterSnapshotActions
  linenumber: 630
  message: Failed to create backup kanister phase
message: Job failed to be executed
fields: []

I understand that the timeout for the backup is exceeded (our test environment's Elastic DB is 1200 GB). How can we extend the timeout limit?

 

Thanks everybody!


Best answer by jaiganeshjk 16 February 2022, 12:38



Userlevel 6
Badge +2

@dk-do, as you mentioned, you are hitting the timeout while executing the Kanister action.

There is a Helm value, kanister.backupTimeout, that can be used to configure this Kanister timeout.

By default it is set to 45 minutes.

You can upgrade K10 with this Helm value set as shown below.

helm get values k10 --output yaml --namespace=kasten-io > k10_val.yaml && \
helm upgrade k10 kasten/k10 --namespace=kasten-io -f k10_val.yaml --set kanister.backupTimeout=120
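
To confirm the new value was applied after the upgrade, you can check the rendered Helm values (the grep filter below is only illustrative):

# show the user-supplied values and pick out the kanister section
helm get values k10 --output yaml --namespace=kasten-io | grep -A2 kanister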

 

Userlevel 3

Thanks for the fast response. I will try it with 120 minutes.

 

 

Userlevel 6
Badge +2

I just mentioned 120 minutes as an example. It might take more than that depending on the network bandwidth and other factors in your environment.

Userlevel 3

Yes - I understood it that way. I will check how long it takes and adjust the timeout.

Another question: on which volume (/backup) is the backup file stored?

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
gzip /backup

Is it the storage class defined in the helm values (persistence.storageClass)?

Userlevel 3

@jaiganeshjk: Can you please answer my last question? :)

Another question: on which volume (/backup) is the backup file stored?

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
gzip /backup

Is it the storage class defined in the helm values (persistence.storageClass)?

 

Userlevel 6
Badge +2

@dk-do I just looked at the Elasticsearch logical blueprint. We use kubeTask to run the phases in the blueprint.

This kubeTask creates a temporary pod and runs the commands against your ES instance.

We save the dump to a temporary directory and push the dump to your location profile. The local backup file gets deleted along with the temporary pod.

We don’t mount any PVCs in the temporary pod that we create using kubeTask.
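
Roughly, the backup phase runs a sequence like this inside that temporary pod (a simplified sketch pieced together from the snippets quoted in this thread; the actual blueprint contains more than this):

# dump all indices from the Elasticsearch endpoint to a local file inside the pod
elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
# compress the dump in place, producing /backup.gz
gzip /backup
# push the compressed dump to the configured location profile, e.g. S3
kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz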

 

Hope this answers your question.

Userlevel 3

@dk-do 

This kubeTask creates a temporary pod and runs the commands against your ES instance.

We save the dump to a temporary directory and push the dump to your location profile. The local backup file gets deleted along with the temporary pod.

 

So it is stored in the same PV where Elastic stores its data?

Userlevel 6
Badge +2

 

So it is stored in the same PV where Elastic stores its data?

No, it is not stored persistently anywhere; it is pushed to the S3 object store/NFS file store that you have configured as a location profile.
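
If you want to double-check which location profiles are configured, they are stored as custom resources in the K10 namespace (the resource and namespace names below assume a default install):

kubectl get profiles.config.kio.kasten.io --namespace=kasten-io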

Userlevel 3

Sorry for asking again:

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup
==> or do I understand it wrong?
==> where is /backup? I think it is some local storageClass?


kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz

==> this command pushes the created backup to (in our case) S3

Userlevel 6
Badge +2

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup

Yes, right. We dump it to /backup, which is just a temporary path in the container's filesystem. It is not persistent.

It gets deleted as soon as the pod gets deleted.

You can get the YAML of the pod that is created for this operation. The name of the pod would be kanister-job-*. It wouldn't have any volumes attached to it.
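
For example, while a backup is running you can inspect that pod and confirm that no PVCs appear under .spec.volumes (the namespace below is an assumption; depending on the blueprint the pod may run in your application namespace or in kasten-io):

# find the temporary Kanister pod
kubectl get pods --namespace=kasten-io | grep kanister-job
# dump its spec to check for mounted volumes
kubectl get pod kanister-job-<suffix> --namespace=kasten-io --output=yaml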

kando location push --profile '{{ toJson .Profile }}' --path "${backup_file_path}" --output-name "kopiaOutput" /backup.gz

==> this command pushes the created backup to (in our case) S3

Absolutely. This pushes the dump file from the local directory to S3.

Userlevel 3

elasticdump --bulk=true --input=http://${host_name}:9200 --output=/backup
==> this command creates a dump on /backup

Yes, right. We dump it to /backup, which is just a temporary path in the container's filesystem. It is not persistent.

It gets deleted as soon as the pod gets deleted.

 

@jaiganeshjk

Okay, thanks for the info! In that case it is stored in the Docker overlay directory, which by default lives on the root mount (/) of the worker node on which the container is running:

bash-5.0# ls /backup -lah

-rw-r--r--    1 root     root        1.2G Feb 16 15:59 /backup

 

root@ops2-w6:/var/lib/docker/overlay2/5b120335c0139fdc1a32a3da964ff532cccfecf36d1e9100bea3fd2a8dc5ee40/merged# ls -lah
total 1.5G
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 .
drwx--x--- 5 root root 4.0K Feb 16 15:09 ..
-rw-r--r-- 1 root root 1.5G Feb 16 16:12 backup  <=== Elastic Backup File
drwxr-xr-x 1 root root 4.0K Feb 10 04:51 bin
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 dev
-rwxr-xr-x 1 root root    0 Feb 16 15:09 .dockerenv
-rwxr-xr-x 1 root root  253 Feb 10 04:51 esdump-setup.sh
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 etc
drwxr-xr-x 2 root root 4.0K Jan 16  2020 home
drwxr-xr-x 1 root root 4.0K Jan 16  2020 lib
drwxr-xr-x 5 root root 4.0K Jan 16  2020 media
drwxr-xr-x 2 root root 4.0K Jan 16  2020 mnt
drwxr-xr-x 2 root root 4.0K Jan 16  2020 opt
dr-xr-xr-x 2 root root 4.0K Jan 16  2020 proc
drwx------ 1 root root 4.0K Feb 16 16:00 root
drwxr-xr-x 1 root root 4.0K Feb 16 15:09 run
drwxr-xr-x 2 root root 4.0K Jan 16  2020 sbin
drwxr-xr-x 2 root root 4.0K Jan 16  2020 srv
drwxr-xr-x 2 root root 4.0K Jan 16  2020 sys
drwxrwxrwt 1 root root 4.0K Feb 10 04:51 tmp
drwxr-xr-x 1 root root 4.0K Jan 16  2020 usr
drwxr-xr-x 1 root root 4.0K Jan 16  2020 var
 

 

 

So we have to make sure that / on the corresponding node has enough space - is it possible or “recommended” to store those files on an NFS file share?
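
For reference, the remaining capacity on the root filesystem can be checked directly on the worker node:

# run on the worker node that hosts the kanister-job pod
df -h /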

Userlevel 3

Hi @jaiganeshjk 

I have another question: I changed the timeout to 16 hours.
Unfortunately, 16 hours are not enough for backing up 1200 GB in ElasticSearch. Is there an option to speed it up?

Everything is on the local network (we use our own MinIO installation for tests); the servers are equipped with SSDs and connected via 10 Gbit LAN.

fields:
- name: duration
value: 15h59m59.986648512s

 

Userlevel 6
Badge +2

@dk-do You are right. Currently the dump is just written to that temporary container storage, as you mentioned in the comment above.

It is a good feature request to back such volumes with a PVC, which would remove the dependency on having capacity on the node.

 

As for the time taken for the copy, I am not sure there is a way to speed it up.

Hi @jaiganeshjk 

I have another question: I changed the timeout to 16 hours.
Unfortunately, 16 hours are not enough for backing up 1200 GB in ElasticSearch. Is there an option to speed it up?

Everything is on the local network (we use our own MinIO installation for tests); the servers are equipped with SSDs and connected via 10 Gbit LAN.

fields:
- name: duration
value: 15h59m59.986648512s

 

 
