Solved

Kasten DR restore failed: context deadline exceeded


Userlevel 2

Hello Kasten Support!

 

We got an error while trying to perform a Kasten DR restore that prevents the Kasten catalog from being restored.

 

Here is the configuration:

  • OpenShift 4.10
  • Kasten v5.0.2 (we need to upgrade it after the restore), installed via the Helm chart (the OpenShift Operator cannot be upgraded to v5.5.8 because of our OpenShift version)
  • NFS storage through a PVC

We followed the documentation and:

  • Installed a fresh K10 instance (with scc.create=true)
  • Created the NFS location profile (which is valid and reachable)
  • Created the k10-dr-secret with the correct passphrase
  • Ran k10-restore with the cluster ID and the location profile

The k10-restore instance failed with the following error:

{"File":"kasten.io/k10/kio/tools/restorectl/servicescaler/utils.go","Function":"kasten.io/k10/kio/tools/restorectl/servicescaler.waitForHealthyService.func1","Line":73,"cluster_name":"XXXXXX-XXXXXX-XXXXXXX-XXXXXX-XXXXXXXX","error":"Get \"http://XXX.XXX.XXX.XXX:8000/v0/healthz\": context deadline exceeded","hostname":"k10-restore-k10restore-72sf4","level":"info","msg":"Waiting for service to be healthy","service":"catalog","time":"2023-04-18T16:16:33.094Z"}
Error: {"message":"Failed to scale up Catalog Deployment","function":"kasten.io/k10/kio/tools/restorectl.restoreK10","linenumber":172,"file":"kasten.io/k10/kio/tools/restorectl/restore.go:172","cause":{"message":"Failed waiting for service to be healthy","function":"kasten.io/k10/kio/tools/restorectl/servicescaler.(*deploymentScaler).ScaleAndVerifyWithTimeout","linenumber":91,"file":"kasten.io/k10/kio/tools/restorectl/servicescaler/deployment_scaler.go:91","fields":[{"name":"service","value":"catalog"}],"cause":{"message":"Timeout while polling","function":"kasten.io/k10/kio/poll.waitWithBackoffWithRetries","linenumber":66,"file":"kasten.io/k10/kio/poll/poll.go:66","fields":[{"name":"duration","value":"4m9.401285582s"}],"cause":{"message":"Context done while polling","function":"kasten.io/k10/kio/poll.waitWithBackoffWithRetriesHelper","linenumber":96,"file":"kasten.io/k10/kio/poll/poll.go:96","cause":{"message":"context deadline exceeded"}}}}}
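The `Error:` line is a chain of nested `cause` objects, and the innermost message is the actual failure. Read inside-out, it says the catalog service never answered on `/v0/healthz` before the polling window (about 4m9s) ran out, so the restore gave up scaling up the catalog Deployment. The sketch below unwinds such a chain; `cause_chain` is a hypothetical helper, not part of the K10 tooling, and it assumes the line is the literal prefix `Error: ` followed by a single JSON object, as above.

```python
import json

# Hypothetical helper: assumes the failure line is "Error: " followed by
# one JSON object whose nested "cause" keys form a chain, as in the log above.
PREFIX = "Error: "


def cause_chain(line: str) -> list[str]:
    """Return every "message" in the nested "cause" chain, outermost first."""
    payload = line[len(PREFIX):] if line.startswith(PREFIX) else line
    node = json.loads(payload)
    messages = []
    while isinstance(node, dict):
        messages.append(node.get("message", "<no message>"))
        node = node.get("cause")  # None once the innermost cause is reached
    return messages


if __name__ == "__main__":
    # Trimmed copy of the error above (function/file/field keys dropped).
    sample = (
        'Error: {"message":"Failed to scale up Catalog Deployment",'
        '"cause":{"message":"Failed waiting for service to be healthy",'
        '"cause":{"message":"Timeout while polling",'
        '"cause":{"message":"Context done while polling",'
        '"cause":{"message":"context deadline exceeded"}}}}}'
    )
    # Indent each level so the innermost (root) cause ends up last and deepest.
    for depth, message in enumerate(cause_chain(sample)):
        print("  " * depth + message)
```

The last entry of the returned list is the root cause; everything above it is context added by each calling layer.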

 

Do you have any advice?

Best regards,

John


Best answer by FRubens 19 April 2023, 12:29



Userlevel 4

Hello @JCandela,

Thank you for using our community and K10.

As I understand it, you previously installed K10 with the OpenShift Operator, removed it because it could not be upgraded to the latest version, and then installed K10 5.0.2 via the Helm chart. Please let me know if that is correct.

Now you are trying to do a DR restore via the Helm chart. First, please make sure the K10 chart and the K10-restore chart are the same version; you can pass --version=5.0.2 when installing K10-restore.
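One way to confirm which version to pin is to read it from Helm itself. The sketch below is a hypothetical helper, not a Kasten tool; it assumes the output of `helm list --namespace kasten-io --output json`, where each release's `chart` field has the form `<chart-name>-<chart-version>`, and it assumes the charts are named `k10` and `k10restore` as in the Kasten Helm repository.

```python
import json
import re

# Helm reports a release's chart as "<chart-name>-<chart-version>",
# e.g. "k10-5.0.2"; split at the "-" that precedes a dotted version.
CHART_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d+\.\d+\.\d+\S*)$")


def chart_versions(helm_list_json: str) -> dict[str, str]:
    """Map chart name -> chart version from `helm list --output json`."""
    versions = {}
    for release in json.loads(helm_list_json):
        match = CHART_RE.match(release.get("chart", ""))
        if match:
            versions[match["name"]] = match["version"]
    return versions


def restore_chart_version(helm_list_json: str) -> str:
    """Return the --version to install the k10restore chart with.

    Raises ValueError if no k10 release is found, or if a k10restore
    release already exists with a different chart version.
    """
    versions = chart_versions(helm_list_json)
    k10 = versions.get("k10")
    if k10 is None:
        raise ValueError("no release of the k10 chart found")
    restore = versions.get("k10restore")
    if restore is not None and restore != k10:
        raise ValueError(f"k10 chart is {k10} but k10restore chart is {restore}")
    return k10
```

For example, save the `helm list` output to a file, pass its contents to `restore_chart_version`, and use the returned value (5.0.2 in this thread) as `--version` when installing the k10restore chart.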

Before retrying the DR restore, please delete the K10s API CRD and the K10Restore API CRD, and make sure the catalog is scaled up:

K10 CRDs:
k10restores.apik10.kasten.io
k10s.apik10.kasten.io

Command to delete:
kubectl delete crd k10restores.apik10.kasten.io k10s.apik10.kasten.io
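Both preconditions can be checked from JSON output before re-running the restore. This is a sketch with hypothetical helper names: `leftover_k10_crds` expects the output of `kubectl get crd --output json` and should return an empty list once the delete has gone through, and `deployment_ready` expects a single Deployment object, for example from `kubectl get deployment catalog-svc --namespace kasten-io --output json`, where `catalog-svc` is an assumption about the name of K10's catalog Deployment.

```python
import json

# Group of the two CRDs named above (k10s / k10restores .apik10.kasten.io).
K10_API_GROUP = "apik10.kasten.io"


def leftover_k10_crds(crd_list_json: str) -> list[str]:
    """CRD names still in the apik10.kasten.io group (from `kubectl get crd -o json`)."""
    items = json.loads(crd_list_json).get("items", [])
    return sorted(
        item["metadata"]["name"]
        for item in items
        if item.get("spec", {}).get("group") == K10_API_GROUP
    )


def deployment_ready(deployment_json: str) -> bool:
    """True when a Deployment asks for at least one replica and all are ready.

    Kubernetes omits status.readyReplicas while it is zero, hence the default.
    """
    deployment = json.loads(deployment_json)
    wanted = deployment.get("spec", {}).get("replicas", 1)
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return wanted >= 1 and ready >= wanted
```

If `deployment_ready` returns False because the Deployment is at zero replicas, `kubectl scale deployment catalog-svc --namespace kasten-io --replicas=1` (same naming assumption) brings it back up.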


Please try again. If you hit any issues, check the logs from the K10-Restore pod, where you will find more information about what went wrong. You can also post the log here and we will take a look.

kubectl logs --namespace kasten-io $(kubectl get pods --selector job-name=k10-restore-k10restore --namespace kasten-io --output jsonpath='{.items[0].metadata.name}') -f
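The restore log is mostly JSON lines, and the useful entries are not always at `"level":"error"` (the health-check timeout in the original post was logged at `info` with an `error` field). The sketch below is a hypothetical filter, assuming that log shape: it keeps JSON entries at error/fatal level or carrying an `error` field, plus plain-text lines starting with `Error:`.

```python
import json
from collections.abc import Iterable, Iterator


def problem_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the k10-restore log lines worth reading first.

    Assumes K10-style JSON log entries (keys such as "level", "time",
    "service", "error", "msg") mixed with plain-text "Error: ..." lines.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Error:"):
            yield line  # final failure, printed as text plus a JSON payload
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is neither JSON nor "Error:"
        if not isinstance(entry, dict):
            continue
        if entry.get("level") in ("error", "fatal") or "error" in entry:
            detail = entry.get("error", entry.get("msg", ""))
            yield f'{entry.get("time", "?")} {entry.get("service", "-")}: {detail}'
```

For example, run the `kubectl logs` command above without `-f`, redirect it to a file, and print `problem_lines(open(path))` to get a short list of timestamped failures.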

Let us know if you have any further questions.

Fernando R.

Userlevel 2

It worked, thank you very much !

Can you take a look at another case that I’ve submitted?
Upgrading Kasten above 5.0.10 using Helm is generating the error with the GarbageCollector image
 

 

Thank you!
