
We are running Kasten K10 v5.5.0, using MinIO as S3 storage. The Kasten DR policy has been enabled.

Due to our backbone GitOps solution, Kasten was uninstalled and then reinstalled from scratch on the same cluster. The cluster passphrase and DR passphrase remained the same, because they are themselves managed through GitOps.
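For context, the DR passphrase is provisioned by our GitOps pipeline as the standard Kasten secret. A minimal sketch of the equivalent manual step (the secret name `k10-dr-secret` and key `key` are what the Kasten DR docs expect; the passphrase value is a placeholder):

```shell
# Create the DR passphrase secret K10 DR expects (per the Kasten DR docs);
# <passphrase> is a placeholder for the GitOps-managed value.
kubectl create secret generic k10-dr-secret \
  --namespace=kasten-io \
  --from-literal key=<passphrase>
```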

What we are experiencing now is a pretty annoying and strange behavior.

All applications that were backed up before the reinstall now have policies that fail, all with the same error:

          message: "Failed to exec command in pod: command terminated with exit code 1"
file: kasten.io/k10/kio/kopia/repository.go:471
function: kasten.io/k10/kio/kopia.CreateKopiaRepository
linenumber: 471
message: Failed to create the backup repository
file: kasten.io/k10/kio/kopiaapiserver/api_server.go:204
function: kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForCollectionExport
linenumber: 204
message: Failed to initialize Kopia API server
file: kasten.io/k10/kio/collections/kopia/connector.go:85
function: kasten.io/k10/kio/collections/kopia.(*KopiaConnector).ConnectForExport
linenumber: 85
message: Failed to prepare Kopia API server for collections export
file: kasten.io/k10/kio/exec/phases/phase/migrate.go:126
function: kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run
linenumber: 126
message: Failed to export collection
message: Job failed to be executed

Applications that had not been backed up, on the other hand, can run their policies without problems.

Kasten points to the same MinIO bucket before and after the reinstall.
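To confirm the repository data survived the reinstall, the bucket contents can be inspected with the MinIO client (a hedged sketch; the alias `myminio`, the endpoint, the credentials, and the bucket name `k10-bucket` are all assumptions, substitute your own):

```shell
# Register the MinIO endpoint and list the K10/Kopia data in the bucket.
# 'myminio', the endpoint URL, credentials and 'k10-bucket' are placeholders.
mc alias set myminio https://minio.example.com ACCESS_KEY SECRET_KEY
mc ls --recursive myminio/k10-bucket/ | head
```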

We have tried to use Kasten Restore to recover the situation; the restore policy completed OK, but the problem is still the same.

 

The only official documentation we found is this link: https://kb.kasten.io/knowledge/exports-dont-work-after-k10-reinstall. It describes exactly our situation, but the proposed solution of course makes no sense for a production system.

We have been struggling with this for several months without finding a way to overcome it. Unfortunately, the GitOps framework forces an uninstall + install whenever a new version of the linked Git repo is tagged, and this does not seem to be very “friendly” with Kasten.

 

Hello @Matteo.Gazzadi 

 

So, it looks like you reinstalled K10. This causes problems in general, because the master key (the key used to access the repositories in the location profile) stored in the catalog was deleted. The only way to resolve this issue is to restore that key, which means restoring your K10 via DR Restore. Please follow this link: https://docs.kasten.io/latest/operating/dr.html?highlight=dr#k10-disaster-recovery. If no DR was set up, then at this time we do not support any other way to recover this key; you would then need to delete the data in the location profile to make the location profile usable once more.
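For reference, the DR restore in that link boils down to installing the k10restore chart (a sketch following the Kasten DR docs; the source cluster ID and location profile name are placeholders for your own values):

```shell
# Restore the K10 catalog (including the master key) from the DR backup.
# <source-cluster-id> and <profile-name> are placeholders for your values.
helm install k10-restore kasten/k10restore \
  --namespace=kasten-io \
  --set sourceClusterID=<source-cluster-id> \
  --set profile.name=<profile-name>
```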

 

Thanks

Emmanuel


Hi @EBrockman.

That’s the actual problem. As stated in the first post, we have DR enabled, and when the reinstallation occurred we ran the DR recovery procedure: https://docs.kasten.io/latest/operating/dr.html?highlight=dr#restore-k10-backup

The k10-restore job completed successfully, as you can see in the screenshot below.

 

However, if we now run the backup policy of an application for which a backup had already been taken before the reinstall, that policy fails with:

      - cause: '{"cause":{"cause":{"cause":{"cause":{"message":"Failed to exec command
in pod: command terminated with exit code
1"},"file":"kasten.io/k10/kio/kopia/repository.go:471","function":"kasten.io/k10/kio/kopia.CreateKopiaRepository","linenumber":471,"message":"Failed
to create the backup
repository"},"file":"kasten.io/k10/kio/kopiaapiserver/api_server.go:204","function":"kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForCollectionExport","linenumber":204,"message":"Failed
to initialize Kopia API
server"},"file":"kasten.io/k10/kio/collections/kopia/connector.go:85","function":"kasten.io/k10/kio/collections/kopia.(*KopiaConnector).ConnectForExport","linenumber":85,"message":"Failed
to prepare Kopia API server for collections
export"},"file":"kasten.io/k10/kio/exec/phases/phase/migrate.go:126","function":"kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run","linenumber":126,"message":"Failed
to export collection"}'
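One way to narrow down whether this is the invalid-repository-password failure from the KB article is to try connecting to the Kopia repository manually with the kopia CLI (a hedged sketch; the endpoint, bucket, and credentials are assumptions, and K10 derives the repository password from its master key, so a password mismatch here is exactly what a lost key would look like):

```shell
# Attempt a manual connect to the Kopia repository K10 created in MinIO.
# All values below are placeholders; --disable-tls is for plain-HTTP MinIO.
kopia repository connect s3 \
  --bucket=k10-bucket \
  --endpoint=minio.example.com:9000 \
  --access-key=ACCESS_KEY \
  --secret-access-key=SECRET_KEY \
  --disable-tls \
  --password=REPO_PASSWORD
```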

 

Some odd things we have noticed (we don’t know whether they are bugs or expected behavior):

 

  1. When k10-disaster-recovery-policy runs, the artifact list contains the migration-token for every application, taken from the kasten-io namespace.
  2. When k10-restore runs, however, these migration-tokens are not restored and the new ones created by the applications are kept (the UIDs of the policies for these apps are also different now).
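The observations about migration tokens and policy UIDs can be checked with kubectl (a minimal sketch; the `migration-token` name fragment is what we see in our kasten-io namespace):

```shell
# List the per-application migration-token secrets and the policy UIDs,
# to compare them before and after the DR restore.
kubectl get secrets -n kasten-io -o name | grep migration-token
kubectl get policies.config.kio.kasten.io -n kasten-io \
  -o custom-columns=NAME:.metadata.name,UID:.metadata.uid
```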

We still don’t understand whether we are missing something, or whether recovery from a complete infrastructure failure is simply not supported.

 

 


@Matteo.Gazzadi 

As Emmanuel mentioned earlier, K10 has a master encryption key that is used to generate keys for each application export.

This key changes if K10 is reinstalled; however, it is recovered by a K10 DR recovery.

The k10restore chart restores the catalog database as well as the resources that were in the kasten-io namespace (including profiles, policies, and secrets), unless these are skipped during the restore.

Something is going wrong in your installation/recovery that we would like to dig into further.

 

Would you be able to open a support request by creating an account on https://my.veeam.com/?

If you are using the free version, select “Kasten K10 by Veeam Trial” under product to open a case with us, and include the debug logs as well.

