Question

crypto-svc Deployment fails to start up after redeploying my K10


Userlevel 3

I deleted kube-dns by mistake, which caused serious issues with my K10 installation, and I was not able to access my dashboard. After fixing this I can access the dashboard again, but crypto-svc is not working and shows this error:

Readiness probe failed: HTTP probe failed with statuscode: 500

Before fixing the DNS issue I also deleted the Kasten deployment once, so I think this is a crypto key issue. In the logs I see this error:

{"File":"kasten.io/k10/kio/tracing/tracing.go","Function":"kasten.io/k10/kio/tracing.StartProfileBuffers","Line":109,"cluster_name":"3c46c435-ee8a-4053-816c-da5cce7e7259","hostname":"crypto-svc-fb4677845-fs4jd","level":"info","msg":"no profile buffers configured","time":"2023-07-05T07:52:41.282Z","version":"6.0.2"}
{"File":"kasten.io/k10/kio/utils/swagger_utils.go","Function":"kasten.io/k10/kio/utils.ServerLogger","Line":11,"cluster_name":"3c46c435-ee8a-4053-816c-da5cce7e7259","hostname":"crypto-svc-fb4677845-fs4jd","level":"info","msg":"Serving crypto at http://[::]:8000","time":"2023-07-05T07:52:41.322Z","version":"6.0.2"}
{"File":"kasten.io/k10/kio/cryptosvc/masterkey.go","Function":"kasten.io/k10/kio/cryptosvc.(*CryptoSvc).initializeK10MasterKeyAfterCatalogHealthy","Line":45,"cluster_name":"3c46c435-ee8a-4053-816c-da5cce7e7259","error":{"message":"Could not decrypt master key from encryption key artifacts","function":"kasten.io/k10/kio/cryptosvc.decryptEncryptionKeyArtifacts","linenumber":131,"file":"kasten.io/k10/kio/cryptosvc/masterkey.go:131","cause":{"message":"Failed to open master key using the specified passkey","function":"kasten.io/k10/kio/cryptosvc/encryption.(*K10).Decrypt","linenumber":35,"file":"kasten.io/k10/kio/cryptosvc/encryption/k10.go:35"}},"hostname":"crypto-svc-fb4677845-fs4jd","level":"error","msg":"Failed to initialize service","time":"2023-07-05T07:52:41.402Z","version":"6.0.2"}

It seems the issue is with the master key for encryption, but I am not sure how to fix it. I tried to list all the keys using the command `kubectl get passkeys.vault.kio.kasten.io`, but it returns this error:

Error from server (InternalError): Internal error occurred: {"message":"Could not list passkeys","function":"kasten.io/k10/kio/rest/clients.ListPasskey","linenumber":58,"file":"kasten.io/k10/kio/rest/clients/cryptoclient.go:58","cause":{"message":"Get \"http://10.109.114.187:8000/v0/passkey\": dial tcp 10.109.114.187:8000: i/o timeout"}}

Any suggestion or advice would be very helpful. Many thanks for your time and effort.

Best Regards,

Tauqeer.A


11 comments

Userlevel 6
Badge +2

@tauqeerahmad Thank you for posting your question here.

It seems that K10 is unable to decrypt the masterKey from its catalog.

K10 stores an encrypted masterKey in its catalog, and it is decrypted using a passphrase stored in a secret called `k10-cluster-passphrase` in the kasten-io namespace (there are other methods that use AWS KMS/Vault). Without this secret, K10 is not able to open/decrypt its masterKey.

 

Do you see the secret in the K10 namespace? If not, would you be able to confirm whether this secret got deleted for some reason?
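
A quick way to check (a sketch; this assumes K10 was installed into the default `kasten-io` namespace):

```shell
# Check whether the cluster passphrase secret still exists
# (namespace kasten-io is an assumption -- adjust to your install)
kubectl get secret k10-cluster-passphrase -n kasten-io
```

If the command returns `NotFound`, the secret is gone and the DR recovery path below applies.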

Userlevel 6
Badge +2

If the secret was deleted for some reason, you could recover from this situation by following Recovering K10 From a Disaster (provided K10 DR backups were enabled and ran successfully before you got into this situation). For this reason, it is always recommended to enable and run the K10 DR policy so that you have a successful backup of K10 for such disasters.

Userlevel 6
Badge +2

If you don't have the K10 DR policy enabled, you will not have any other way to recover K10's catalog and the secrets that K10 uses to decrypt the masterKey and generate the encryptionKeys.

 

You will have to re-install K10 by completely removing it, and also clean up your external location target, as K10 will have no way to access those repositories due to the unavailability of the encryption keys.
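
For reference, a minimal reinstall sketch (the release name `k10` and namespace `kasten-io` are assumptions; match them to your actual setup before running anything):

```shell
# Remove the existing K10 release and its namespace, then install fresh
# (release and namespace names are assumptions -- adjust to your setup)
helm uninstall k10 -n kasten-io
kubectl delete namespace kasten-io
helm repo update
helm install k10 kasten/k10 --namespace=kasten-io --create-namespace
```
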
 

Userlevel 3

Hi Jaiganeshjk,

Thanks for your detailed answer. Yes, the secret was deleted, and I followed the doc to recover it. Now when I run this:

helm install k10-restore kasten/k10restore --namespace=kasten-io --set sourceClusterID=kubernetes-admin@kubernetes --set profile.name=nfs-storage

The restore job runs and then fails with this log:

Usage:
  restorectl restore [flags]

Flags:
  -h, --help   help for restore

Global Flags:
  -c, --clusterid string       Specify the cluster ID captured during backup (required)
  -n, --namespace string       Specify the namespace where K10 is currently deployed (required)
  -t, --point-in-time string   Specify an optional point in time (RFC3339) at which to evaluate restore data
  -p, --profile string         Specify the profile that was used during backup (required)
  -s, --skipResource string    Specify if restore of policies,profiles,secrets needs to be skipped.
{"File":"kasten.io/k10/kio/tools/restorectl/root.go","Function":"kasten.io/k10/kio/tools/restorectl.Execute","Line":24,"cluster_name":"3c46c435-ee8a-4053-816c-da5cce7e7259","error":{"message":"Failed to get profile. A location profile must be created for K10 Disaster Recovery","function":"kasten.io/k10/kio/tools/restorectl.restoreK10","linenumber":83,"file":"kasten.io/k10/kio/tools/restorectl/restore.go:83","cause":{"message":"No profile detected. A location profile must be created for K10 Disaster Recovery","function":"kasten.io/k10/kio/tools/restorectl.getK10AndKanisterProfileCR","linenumber":51,"file":"kasten.io/k10/kio/tools/restorectl/utils.go:51","cause":{"message":"profiles.config.kio.kasten.io \"nfs-storage\" not found"}}},"hostname":"k10-restore-k10restore-tfr7f","level":"error","msg":"Failed","time":"2023-07-05T09:29:54.824Z"}

Can you please tell me what exactly the cluster ID is? Is it my k8s cluster name or something related to the K10 cluster? I found one cluster ID from Kasten, but it is a new one and the job fails again.

Best Regards,

Tauqeer.A

Userlevel 3

Also, I am seeing this alert on the K10 dashboard:

 

 

Userlevel 6
Badge +2

ClusterID is nothing but the UID of the default namespace in your cluster.

Reference: https://docs.kasten.io/latest/operating/dr.html#cluster-id

 

From the above error message, the clusterID in your command is wrong, and the error also states that there is no profile with the name nfs-storage.

Also, this will only work if you already had DR backups running before the secrets were deleted.

I don't see the restorepoints or the k10-disaster-recovery-policy in the logs you have shared.

Are you sure that you had a DR backup in your location profile?
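
To verify both values before retrying the restore, something like this should help (assuming K10 is deployed in the `kasten-io` namespace):

```shell
# The cluster ID K10 expects is the UID of the default namespace,
# not the kubeconfig context name
kubectl get namespace default -o jsonpath='{.metadata.uid}'

# List the location profiles K10 knows about, to confirm the name
# you pass via --set profile.name actually exists
kubectl get profiles.config.kio.kasten.io -n kasten-io
```
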

Userlevel 3

I am not sure about it. However, I had some backups and I tried to restore them manually. After creating the policy, when I run it I see this error:

Connection Error

Request failed with status code 500 (dashboardbff-svc)

 

However, my dashboard service and deployment are running fine with no errors.

Userlevel 6
Badge +2

Yes, that is expected, because K10 doesn't have the encryption key to decrypt the data from the backups and perform the restore.

You will have to consider cleaning up the location profile target and then re-installing K10 to back up/export your applications normally.

 

If you have lost some applications and need to recover them, you could use the migration tokens from the earlier exports. Save them somewhere safe, uninstall K10, and run an import policy using those migration tokens. This will import the restorepoints from your older backups, which you can then use to restore the applications.

However, newer exports to the same NFS share, or at least the same path, won't work. You will have to clean up eventually.

 

If you don't clean up during the reinstall, you will end up in a situation similar to the one described in this KB article: https://kb.kasten.io/knowledge/exports-dont-work-after-k10-reinstall

Userlevel 3

Now I have backups of 5 applications located in NFS storage. As you mentioned above, I need to clean the location path, or maybe create a new NFS path with a new NFS storage class, PV, and PVC, and freshly re-install K10. After the installation and creating a profile with the new NFS path, if I manually copy these 5 backups to the new location, will I be able to restore them using a token?

Or

Or will these backups have the old K10 UID, so I will face a conflict? Did I get it right?

In that case, how can I migrate applications to a different cluster in a different network without any external network access? Or is it not possible with Kasten?

I am just testing Kasten before purchase to make sure I understand all of its capabilities. I really appreciate your time on this; could you please explain?

Regards,

Tauqeer.A

Userlevel 6
Badge +2

Sure @tauqeerahmad, I will be glad to assist you with this.

There are multiple things you can do at this point.

  1. If you are planning to just test the restore into a new cluster, then copy the migration token/receiveString from the policy (or the corresponding secret) and use it with import policies on a second cluster. This will import the restorepoints into the new cluster, which you can then use to restore and test.
  2. If you want to test restoring the old backups in the same cluster, you cannot do it in the current state. You don't have the cluster passphrase, and there is no way to retrieve it either. You will need to uninstall the current K10 installation and install a fresh one. Once you have the new K10 installation, you can import the old application restorepoints if you want to test restores. Or, if you want to use the new K10 to back up/export your applications, then you need to clean up the NFS directory where the old backups are stored; your K10 does not have the encryption key to decrypt them without the receiveString.

If you are planning to use the import, don't manually delete/move anything in the NFS profile location.

If you are still considering using the new K10 to do new exports, you can specify a different path while creating the location profile.

 

In any case, I would suggest getting in touch with the Veeam/Kasten sales/SE team to help you with your use cases and PoC.

Userlevel 6
Badge +2

PS: K10 doesn't support importing backups into the same K10 instance they were exported from.

This means you cannot run an import policy in the same K10 instance that you currently have; there will be conflicts in the artifacts. You will have to reinstall K10 or import into another cluster for this to work properly.
