Kubernetes: Rancher RKE2

Storage: Rook-Ceph

Cluster is behind a corporate proxy.
Using MinIO Server for S3 object storage.

Our team is working on testing and building configurations to utilize a Kasten instance in our Kubernetes cluster. So far we have been able to successfully execute backup and restore operations, including cluster-to-cluster import restores.

However, we are having some difficulties with the Disaster Recovery feature. So far we have accomplished:

  • Enabling DR
  • Executing the K10-disaster-recovery-policy

 

The cluster is set up with the k10-dr-secret, and the location profile is configured.
When we install the k10restore chart, we pass in our cluster ID and profile.name; however, we are getting an error in the logs of the spun-up kasten-io-restore-k10restore pod.
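
For reference, our chart install is the equivalent of the following (placeholders stand in for our real values, and kasten is our helm repo alias):

# Roughly how we are invoking the restore chart
helm install k10-restore kasten/k10restore \
  --namespace=kasten-io \
  --set sourceClusterID=<source-cluster-id> \
  --set profile.name=<location-profile-name>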

 

When attempting to run the k10restore chart, I am receiving a TLS error:

{"Container":"container","File":"pkg/format/format.go","Function":"github.com/kanisterio/kanister/pkg/format.LogWithCtx","Line":90,"LogKind":"datapath","Out":"\u001b131mERROR\u001b10m can't connect to storage: error retrieving storage config from bucket \"kasten\": Get \"https://myminio.backup.com/kasten/k10/%3D/migration/%3D/k10/repo/.storageconfig\": tls: failed to verify certificate: x509: certificate signed by unknown authority","Pod":"data-mover-svc-dxlms","hostname":"kasten-io-restore-k10restore-9zjc4","level":"info","msg":"Pod Update","time":"2024-06-27T18:50:50.772043997Z"}
{"File":"kasten.io/k10/kio/dr/utils.go","Function":"kasten.io/k10/kio/dr.RestoreStatusForError","Line":149,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","error":{"message":"Failed to initialize Kopia API server","function":"kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForDRRestore","linenumber":373,"file":"kasten.io/k10/kio/kopiaapiserver/api_server.go:373","cause":{"message":"Failed to connect to the backup repository","function":"kasten.io/k10/kio/kopia.ConnectToKopiaRepository","linenumber":700,"file":"kasten.io/k10/kio/kopia/repository.go:700","cause":{"message":"Failed to exec command in pod: command terminated with exit code 1.\nstdout: \nstderr: \u001b131mERROR\u001b10m can't connect to storage: error retrieving storage config from bucket \"kasten\": Get \"https://myminio.backup.com/kasten/k10/%3D/migration/%3D/k10/repo/.storageconfig\": tls: failed to verify certificate: x509: certificate signed by unknown authority"}}},"hostname":"kasten-io-restore-k10restore-9zjc4","level":"error","msg":"Failed to setup Kopia API server for DR metadata import","time":"2024-06-27T18:50:50.859Z"}
{"File":"kasten.io/k10/kio/exec/phases/phase/dr_restore.go","Function":"kasten.io/k10/kio/exec/phases/phase.RunK10DRRestore","Line":100,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","error":{"message":"Failed to setup Kopia API server for DR metadata import","function":"kasten.io/k10/kio/dr.Import","linenumber":47,"file":"kasten.io/k10/kio/dr/import.go:47","cause":{"message":"Failed to initialize Kopia API server","function":"kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForDRRestore","linenumber":373,"file":"kasten.io/k10/kio/kopiaapiserver/api_server.go:373","cause":{"message":"Failed to connect to the backup repository","function":"kasten.io/k10/kio/kopia.ConnectToKopiaRepository","linenumber":700,"file":"kasten.io/k10/kio/kopia/repository.go:700","cause":{"message":"Failed to exec command in pod: command terminated with exit code 1.\nstdout: \nstderr: \u001b131mERROR\u001b10m can't connect to storage: error retrieving storage config from bucket \"kasten\": Get \"https://myminio.backup.com/kasten/k10/%3D/migration/%3D/k10/repo/.storageconfig\": tls: failed to verify certificate: x509: certificate signed by unknown authority"}}}},"hostname":"kasten-io-restore-k10restore-9zjc4","level":"error","msg":"Failed to perform Disaster Recovery using quick mode workflow","time":"2024-06-27T18:50:50.860Z"}
Error: {"message":"Failed to perform K10 Disaster Recovery","function":"kasten.io/k10/kio/tools/restorectl.runRestoreCommand.func1","linenumber":71,"file":"kasten.io/k10/kio/tools/restorectl/restore.go:71","fields":"{"name":"error","value":"Failed to setup Kopia API server for DR metadata import"}]}

 

We suspect the issue is that the pod needs to trust our corporate certificates, something we have already run into and resolved with our Kasten helm install.
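
For context, this is roughly how we handled it for the main K10 install, with other values omitted (the ConfigMap name is ours; cacertconfigmap.name is the documented K10 helm value):

# Bundle our corporate root CA into a ConfigMap and point the K10 chart at it
kubectl --namespace=kasten-io create configmap custom-ca-bundle-store \
  --from-file=custom-ca-bundle.pem
helm install k10 kasten/k10 --namespace=kasten-io \
  --set cacertconfigmap.name=custom-ca-bundle-store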

 

However, the k10restore helm chart has no values for specifying volumes, volumeMounts, or init containers.

 

How do we get the k10restore chart to trust the needed certificates?

Or should k10restore be using the same trusted certificates as our Kasten instance?

Is there some other error in the background we haven't noticed?

@aburt Thank you for posting your question here.

It seems that the K10 restore chart doesn't have an option to specify the root CA ConfigMap name.

Let me quickly check internally and get back to you shortly.


@jaiganeshjk Have there been any updates regarding adding the ConfigMap name option to the k10restore helm chart?

Or is there any alternative guidance on this?


@aburt I am sorry I missed responding to this topic earlier. 

I confirmed that the helm chart for K10 restore unfortunately doesn't have a value for the ConfigMap.

The good news is that I found a workaround using pod overrides, and I am planning to test it locally.
I will keep you posted on this.
Sorry again for the delay in getting back to you.


@jaiganeshjk Thank you, sir, I appreciate the assistance.


@aburt Unfortunately, I could not make this work with the workaround I mentioned earlier. The override worked for the restore-data-dr pod, but the restore would fail earlier, in the k10restore job pod itself.
I am filing a ticket internally and checking with engineering on this as we speak.

This is still doable currently (by making changes in the chart) if you are looking to test a DR restore immediately. I can help you with this. Please let me know.


@aburt Would you be able to try this override in the Kasten namespace and see if the K10 DR restore works without hitting the trusted-certificates issue?

cat << EOF | kubectl create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-spec-override
  namespace: kasten-io
data:
  override: |
    kind: Pod
    spec:
      containers:
        - name: container
          volumeMounts:
            - mountPath: /etc/ssl/certs/custom-ca-bundle.pem
              name: custom-ca-bundle-store
              subPath: custom-ca-bundle.pem
      volumes:
        - configMap:
            defaultMode: 420
            name: custom-ca-bundle-store
          name: custom-ca-bundle-store
EOF

This override mounts the custom CA bundle ConfigMap as a volume in the ephemeral pods that K10 creates.
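
Before re-running the restore, it may help to quickly confirm both ConfigMaps are present in the Kasten namespace (assuming your CA bundle ConfigMap is named custom-ca-bundle-store, as referenced in the override above):

# Confirm the override and the CA bundle ConfigMaps exist where the restore pods run
kubectl --namespace=kasten-io get configmap pod-spec-override custom-ca-bundle-store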

Please try it out and let me know the outcome. 


@jaiganeshjk Good news and bad news. A completely new error:

 

{"Command":"kopia --log-level=error --config-file=/tmp/kopia-repository.config --log-dir=/tmp/kopia-log --password=\u003c****\u003e repository connect --no-check-for-updates --cache-directory=/tmp/kopia-cache --content-cache-size-limit-mb=0 --metadata-cache-size-limit-mb=500 --override-hostname=data-mover-server-pod --override-username=k10-admin s3 --bucket=kasten --endpoint=sf-s3-backup.gccsj.nn.c2fse.northgrum.com --access-key=\u003c****\u003e --secret-access-key=\u003c****\u003e --prefix=k10/=/migration/=/k10/repo/ --region=us-east","File":"kasten.io/k10/kio/kopia/kopia.go","Function":"kasten.io/k10/kio/kopia.stringSliceCommand","Line":134,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","hostname":"kasten-io-restore-k10restore-h6k5m","level":"info","msg":"kopia command","time":"2024-07-18T13:49:52.970Z"}
{"Container":"container","File":"pkg/format/format.go","Function":"github.com/kanisterio/kanister/pkg/format.LogWithCtx","Line":90,"LogKind":"datapath","Out":"\u001b131mERROR\u001b10m error connecting to repository: repository not initialized in the provided storage","Pod":"data-mover-svc-wg6w4","hostname":"kasten-io-restore-k10restore-h6k5m","level":"info","msg":"Pod Update","time":"2024-07-18T13:49:53.179967155Z"}
{"File":"kasten.io/k10/kio/tools/restorectl/restore.go","Function":"kasten.io/k10/kio/tools/restorectl.runRestoreCommand.func1","Line":68,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","hostname":"kasten-io-restore-k10restore-h6k5m","level":"info","msg":"Waiting for restore to complete","progress":"Fetching data from storage","time":"2024-07-18T13:49:53.468Z"}
{"File":"kasten.io/k10/kio/dr/utils.go","Function":"kasten.io/k10/kio/dr.RestoreStatusForError","Line":149,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","error":{"message":"Failed to initialize Kopia API server","function":"kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForDRRestore","linenumber":373,"file":"kasten.io/k10/kio/kopiaapiserver/api_server.go:373","cause":{"message":"Failed to connect to the backup repository","function":"kasten.io/k10/kio/kopia.ConnectToKopiaRepository","linenumber":700,"file":"kasten.io/k10/kio/kopia/repository.go:700","cause":{"message":"repository not found","cause":{"message":"Failed to exec command in pod: command terminated with exit code 1.\nstdout: \nstderr: \u001b131mERROR\u001b10m error connecting to repository: repository not initialized in the provided storage"}}}},"hostname":"kasten-io-restore-k10restore-h6k5m","level":"error","msg":"Failed to setup Kopia API server for DR metadata import","time":"2024-07-18T13:49:53.587Z"}
{"File":"kasten.io/k10/kio/exec/phases/phase/dr_restore.go","Function":"kasten.io/k10/kio/exec/phases/phase.RunK10DRRestore","Line":100,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","error":{"message":"Failed to setup Kopia API server for DR metadata import","function":"kasten.io/k10/kio/dr.Import","linenumber":47,"file":"kasten.io/k10/kio/dr/import.go:47","cause":{"message":"Failed to initialize Kopia API server","function":"kasten.io/k10/kio/kopiaapiserver.SetupAPIServerForDRRestore","linenumber":373,"file":"kasten.io/k10/kio/kopiaapiserver/api_server.go:373","cause":{"message":"Failed to connect to the backup repository","function":"kasten.io/k10/kio/kopia.ConnectToKopiaRepository","linenumber":700,"file":"kasten.io/k10/kio/kopia/repository.go:700","cause":{"message":"repository not found","cause":{"message":"Failed to exec command in pod: command terminated with exit code 1.\nstdout: \nstderr: \u001b131mERROR\u001b10m error connecting to repository: repository not initialized in the provided storage"}}}}},"hostname":"kasten-io-restore-k10restore-h6k5m","level":"error","msg":"Failed to perform Disaster Recovery using quick mode workflow","time":"2024-07-18T13:49:53.588Z"}
Error: {"message":"Failed to perform K10 Disaster Recovery","function":"kasten.io/k10/kio/tools/restorectl.runRestoreCommand.func1","linenumber":71,"file":"kasten.io/k10/kio/tools/restorectl/restore.go:71","fields":"{"name":"error","value":"Failed to setup Kopia API server for DR metadata import"}]}
Usage:
restorectl restore [flags]

Flags:
-q, --enableQuickDisasterRecovery string This flag should be enabled if quick K10 Disaster Recovery
needs to be performed. The flag should be used if
kastenDisasterRecovery.quickMode was enabled for K10 helm chart
-h, --help help for restore
-i, --snapshotID restorectl validate Specify one of the snapshot IDs obtained from the restorectl validate command, latest is selected by default.

Global Flags:
-c, --clusterid string Specify the cluster ID captured during backup (required)
-n, --namespace string Specify the namespace where K10 is currently deployed (required)
-t, --point-in-time string Specify an optional point in time (RFC3339) at which to evaluate restore data
-p, --profile string Specify the profile that was used during backup (required)
-s, --skipResource string Specify if restore of policies,profiles,secrets needs to be skipped.

{"File":"kasten.io/k10/kio/tools/restorectl/root.go","Function":"kasten.io/k10/kio/tools/restorectl.Execute","Line":26,"cluster_name":"40787e0a-8885-4515-a7fa-d40114712ebc","error":{"message":"Failed to perform K10 Disaster Recovery","function":"kasten.io/k10/kio/tools/restorectl.runRestoreCommand.func1","linenumber":71,"file":"kasten.io/k10/kio/tools/restorectl/restore.go:71","fields":"{"name":"error","value":"Failed to setup Kopia API server for DR metadata import"}]},"hostname":"kasten-io-restore-k10restore-h6k5m","level":"error","msg":"Failed","time":"2024-07-18T13:49:56.671Z"}

 


@aburt I see that --prefix=k10/=/migration/=/k10/repo/ is messed up in the command. I might need debug logs to understand what’s going on. 

Would you be able to open a support case (https://my.veeam.com/open-case/step-1), selecting the Veeam Kasten for Kubernetes free product, so that we can take a deeper look?

 

We might need the logs below to understand what's going on (example commands for collecting them are sketched after this list):

  • Logs from the k10restore pod
  • Helm values/Helm install command used for k10restore.
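
For example, something along these lines should capture both (the job and release names here are assumptions based on the pod names in your logs and a typical k10restore install; adjust them to your setup):

# Collect the logs from the k10restore job pod (job name assumed from the pod names above)
kubectl --namespace=kasten-io logs job/kasten-io-restore-k10restore > k10restore.log

# Export the helm values used for the k10restore release (release name assumed)
helm get values k10-restore --namespace=kasten-io > k10restore-values.yaml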

 


Opened case #07345785

with the logs and details you suggested included.

I can post the log files here as well if you would like.


The prefix being set to

--prefix=k10/=/migration/=/k10/repo/

is a result of sourceClusterID not being set. In my case the helm chart is deployed by Flux, and I had the value misspelled as sourceClusterId, which caused the issue.
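
For anyone else deploying the chart through Flux, the fix was just correcting the casing of that one key in the HelmRelease values. A trimmed-down sketch of what ours looks like now (API version, names, and placeholders are illustrative and may differ for your setup):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: k10-restore
  namespace: kasten-io
spec:
  interval: 10m
  chart:
    spec:
      chart: k10restore
      sourceRef:
        kind: HelmRepository
        name: kasten
  values:
    # Helm values are case-sensitive: sourceClusterId is silently ignored,
    # which is what produced the empty segments in the --prefix above
    sourceClusterID: <source-cluster-id>
    profile:
      name: <location-profile-name>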


The next problem I am facing is that when my backup policies run, or even when the K10 DR policy runs, the backups are published under the path

k10/<my-cluster-domain-name>/migration/<sourceClusterID>/migration/<sourceClusterID>

whereas the new prefix of --prefix=k10/<sourceClusterID>/migration/<sourceClusterID>/repo does not match.

So I am working to figure out why the backups are being published with a different path.

I tried to manually move the files to the right path in MinIO; however, that would only result in kickback from Kasten/Kopia that the repository was not initialized in that directory.
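
In case it is useful, I have been comparing what is actually in the bucket against the prefix the restore job uses by listing it directly with the MinIO client (assuming an mc alias named myminio pointing at our MinIO endpoint):

# List everything Kasten/Kopia has written under the k10/ prefix and compare the
# paths against the --prefix the k10restore job is using
mc ls --recursive myminio/kasten/k10/ | head -n 40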

