Hi @Garland7362 Thank you for posting the question.
From the error message, it seems that the kopia binary is not available in the pod that is spun up for exports.
It could be either the create-repo or the copy-vol-data pod. Do you see which image it is trying to use?
I am just checking how K10 runs exec commands in the newly created pods. We could be trying to run the kopia commands in the first container, but for some reason that may be the linkerd-proxy container rather than the kanister-tools container.
Hi @jaiganeshjk thanks for your response.
I’ve just checked and it spins up create-repo and backup-data-stats first, and later data-mover-svc, but the export shows a retry (x2) in the console right after create-repo.
Also, I had a thought. When these pods are deployed, the Linkerd admission controller mutates the pod spec to include a linkerd-proxy container. Could it be that K10 assumes there is only one container and simply attempts to exec kopia in the first container, which now happens to be the linkerd-proxy one?
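A quick way to check would be to list the container order in one of those pods, e.g. (pod name and namespace here are placeholders):

kubectl -n kasten get pod <create-repo-pod-name> -o jsonpath='{.spec.containers[*].name}'

If linkerd-proxy comes first, that would line up with this theory.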
Tim
Ha ha - we both thought of the same cause at the same time!
I see that you have a support case with us as well.
Let me confirm if that is the cause of the issue and see what could be done if this is the case.
Thanks @jaiganeshjk - much appreciated
@Garland7362
I can confirm that the linkerd proxy container is where the commands are being run.
We seem to run exec commands in the first container, as we don't expect multiple containers in the pods that we dynamically spin up.
We will have to enhance this behaviour for your use case. I have filed an enhancement request to support this configuration.
I went through the Linkerd issues to find out why the proxy container is added as the first container.
They mention in this GitHub issue that the change was made on purpose to avoid startup issues.
It seems that you can add the annotation config.linkerd.io/proxy-await: "disabled" to the workload to disable this behaviour.
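For reference, on a regular workload that annotation would sit in the pod template metadata, roughly like this (a sketch, not the K10-specific wiring):

spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-await: "disabled"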
As a workaround, you could try using the custom Kanister annotations Helm value to add this annotation to the pods spun up by K10.
This would make sure that the kanister-tools container is the first container and should resolve your issue for the time being.
https://docs.kasten.io/latest/kanister/override.html#configuring-custom-labels-and-annotations
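A rough sketch of what that could look like in the Helm values file (value name per the linked docs page; adjust to your release):

kanisterPodCustomAnnotations: "config.linkerd.io/proxy-await=disabled"

followed by a helm upgrade of the K10 release with --reuse-values and -f pointing at that file.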
Please let me know if it works for you.
@jaiganeshjk
Thanks for working through this and for raising the enhancement request.
I am trying to implement the workaround but having issues.
Following the documentation link, I added to my helm values file:
kanisterPodCustomAnnotations: "config.linkerd.io/proxy-await=disabled"
When K10 is deployed, virtually all the pods remain stuck in CreateContainerConfigError. Looking at the events on one of the pods, it shows:
9s Warning Failed pod/executor-svc-6cf948dd75-crvql Error: couldn't find key kanisterPodCustomAnnotations in ConfigMap kasten/k10-config
I can see the kanisterPodCustomAnnotations in the k10-config configmap:
apiVersion: v1
data:
  AWSAssumeRoleDuration: 60m
  K10BackupBufferFileHeadroomFactor: "1.1"
  K10ExecutorMaxConcurrentRestoreCsiSnapshots: "3"
  K10ExecutorMaxConcurrentRestoreGenericVolumeSnapshots: "3"
  K10ExecutorMaxConcurrentRestoreWorkloads: "3"
  K10ExecutorWorkerCount: "8"
  K10GCDaemonPeriod: "21600"
  K10GCImportRunActionsEnabled: "false"
  K10GCKeepMaxActions: "1000"
  K10GCRetireActionsEnabled: "false"
  K10LimiterCsiSnapshots: "10"
  K10LimiterGenericVolumeCopies: "10"
  K10LimiterGenericVolumeRestores: "10"
  K10LimiterGenericVolumeSnapshots: "10"
  K10LimiterProviderSnapshots: "10"
  K10MutatingWebhookTLSCertDir: /etc/ssl/certs/webhook
  K10RootlessContainers: "false"
  KanisterBackupTimeout: "45"
  KanisterCheckRepoTimeout: "20"
  KanisterDeleteTimeout: "45"
  KanisterEFSPostRestoreTimeout: "45"
  KanisterHookTimeout: "20"
  KanisterPodCustomAnnotations: config.linkerd.io/proxy-await=disabled
  KanisterPodReadyWaitTimeout: "15"
  KanisterRestoreTimeout: "600"
  KanisterStatsTimeout: "20"
  apiDomain: kio.kasten.io
  concurrentSnapConversions: "3"
  concurrentWorkloadSnapshots: "5"
  efsBackupVaultName: k10vault
  k10DataStoreGeneralContentCacheSizeMB: "0"
  k10DataStoreGeneralMetadataCacheSizeMB: "500"
  k10DataStoreParallelUpload: "8"
  k10DataStoreRestoreContentCacheSizeMB: "500"
  k10DataStoreRestoreMetadataCacheSizeMB: "500"
  kanisterFunctionVersion: v1.0.0-alpha
  kubeVirtVMsUnFreezeTimeout: 5m
  loglevel: info
  modelstoredirname: //mnt/k10state/kasten-io/
  multiClusterVersion: "2"
  version: 5.5.2
  vmWareTaskTimeoutMin: "60"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: kasten
    meta.helm.sh/release-namespace: kasten
  creationTimestamp: "2023-03-02T13:57:08Z"
  labels:
    app: k10
    app.kubernetes.io/instance: kasten
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k10
    helm.sh/chart: k10-5.5.2
    heritage: Helm
    release: kasten
  name: k10-config
  namespace: kasten
  resourceVersion: "727624"
  uid: c22f8f9c-4148-457f-90cc-ed8832b45ca2
Checking one of the pods, I don’t see the annotation present:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: e9e02307e75a18baf778c4179ed1e3a581d163273e4614c240cb09aa5897ef9d
    checksum/frontend-nginx-config: 44e6086c684885c88e43f79224e688aec51a0672e5f87fc961ed2af9006e60fb
    checksum/secret: 90de018eb29ceff98d4bdbf538a1ec6f1696a830dbd249640db547e571ca8569
    linkerd.io/created-by: linkerd/proxy-injector stable-2.12.4
    linkerd.io/inject: enabled
    linkerd.io/proxy-version: stable-2.12.4
    linkerd.io/trust-root-sha256: 514ac68fae331666e1366e476d3e49e3b72226b9d0b3483f2a15007d510b09bc
    rollme: HLRIq
    viz.linkerd.io/tap-enabled: "true"
  creationTimestamp: "2023-03-02T13:57:10Z"
  generateName: executor-svc-6cf948dd75-
  labels:
    app: k10
    app.kubernetes.io/instance: kasten
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k10
    component: executor
    helm.sh/chart: k10-5.5.2
    heritage: Helm
    linkerd.io/control-plane-ns: linkerd
    linkerd.io/proxy-deployment: executor-svc
    linkerd.io/workload-ns: kasten
    pod-template-hash: 6cf948dd75
    release: kasten
    run: executor-svc
  name: executor-svc-6cf948dd75-2x45t
  namespace: kasten
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: executor-svc-6cf948dd75
    uid: 25f315ef-968d-47bb-9652-efe12fa72a33
  resourceVersion: "728396"
  uid: 59b7ec3f-c946-45c6-8f11-1505f17c7940
spec:
  ...
I’ve tried using “: ” instead of “=” in the annotation but no change.
Is there something I am doing wrong here?
Tim
I think I know the answer. It seems to be a bug on our side (I will file a bug and get it fixed).
The key reference (from the ConfigMap) that is added to the workloads is kanisterPodCustomAnnotations, whereas the key that was added in the ConfigMap is KanisterPodCustomAnnotations.
Notice the uppercase K in the ConfigMap and the lowercase k in the deployments.
You might have to work around it for the time being by editing the ConfigMap and changing the key to kanisterPodCustomAnnotations (with a lowercase k).
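One way to do that, as a sketch (kubectl edit on the ConfigMap works just as well; namespace as per your install):

kubectl --namespace kasten patch configmap k10-config --type merge \
  -p '{"data":{"kanisterPodCustomAnnotations":"config.linkerd.io/proxy-await=disabled"}}'

This adds the lowercase key alongside the existing one rather than renaming it, which should be enough for the pods to resolve the key reference.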
Thanks, I didn’t spot that. Yes, editing the key name in the configmap solved the startup issue. I’ll test now to see if backup exports now work.
@jaiganeshjk,
I have now successfully completed one test cycle of my test app: add items to database, backup (including logical mysql backup), delete items, restore.
This appears to have fixed the problem.
I will continue to test over the next few days to confirm.
Thank you - this is excellent.
Are you able to give timeframe estimates for the k10-config bug fix and for the enhancement so that the exec does not run in the linkerd container?
@Garland7362 Glad that the workaround works for you now.
I don’t have any timelines. It has to go through PM review before it is worked on.
All I can say is that the bug will be fixed before the enhancement.
IMO, your workaround of adding annotations to the Kanister pods will keep your exports working until we have the enhancement in this workflow, unless something changes in Linkerd with respect to the annotation.
Thanks for the workaround and queued bug fix and enhancement.
I have closed the support case.
Hi guys,
It looks like the initial bug has been fixed in version 5.5.7, as deployments now seem to check KanisterPodCustomAnnotations in the k10-config ConfigMap (replacing K with k now makes the deployments fail, so ...). That’s good news.
The problem is that when I add the annotation sidecar.istio.io/inject=false (in my case I’m using the Istio service mesh), I cannot see that annotation on the kanister-job-* pods that are created, and the Istio proxy sidecar container is still injected.
I also tried using the pod-spec-override to add the annotation but without much success.
Ref to the documentation used: https://docs.kasten.io/latest/kanister/override.html
What should I do to have the annotation actually added to the kanister-job pods?
Hi @stephw,
I tried to upgrade to 5.5.7 to test this in my cluster, but it causes an issue that will take some time to resolve, so I cannot test at present.
In 5.5.2, once I had changed the “K” to “k” in k10-config, the annotation I had specified was present in the kanister job pods when they were deployed.
One temporary workaround if you are pressing up against a deadline and you have kyverno on the cluster is to create a kyverno mutate policy that mutates the pod spec at admission time to add the annotation. I have successfully used this to work around problems on other helm charts until they are fixed.
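As a rough sketch of such a policy (the policy name, namespace, and match scope are placeholders to adapt; it simply adds the annotation to pods at admission time):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-istio-inject-disabled
spec:
  rules:
  - name: add-annotation
    match:
      any:
      - resources:
          kinds:
          - Pod
          namespaces:
          - kasten-io
    mutate:
      patchStrategicMerge:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"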
@Garland7362 @jaiganeshjk @stephw Any update on this? We’re using Kasten 6.5.1 with Linkerd as our service mesh solution and still experience the problem. Adding an annotation to disable Linkerd proxy injection, either to the pod-spec-override ConfigMap:
apiVersion: v1
data:
  override: |
    kind: Pod
    metadata:
      annotations:
        linkerd.io/inject: disabled
kind: ConfigMap
metadata:
  name: pod-spec-override
  namespace: kasten-io
or as a key in our Helm Chart installation:
KanisterPodCustomAnnotations: "linkerd.io/inject=disabled"
does not seem to work: the key is properly set in k10-config, but kanister jobs are still injected with the Linkerd proxy. We are not planning to use Kyverno. We could disable automatic proxy injection and add the annotation manually to each of our workflows, but we treat this as a last-resort solution.
Hello @peterturnip
Reposting here also since other users could find this post.
The correct option in this case would be to use custom annotations, since pod-spec-override is not intended to add annotations.
We are aware that kanisterPodCustomAnnotations and kanisterPodCustomLabels are not being applied to kanister-job pods, and we will be working to fix this as soon as possible. The annotations and labels set in those fields are applied to the pods that run in the kasten-io namespace during backup/export when using blueprints (i.e. datamover, copy-vol-data), but kanister-job pods run in the application's namespace.
I will update this thread as soon as we have the fix.
I would also recommend keeping an eye on our release notes page, where we announce new features and bug fixes.
https://docs.kasten.io/latest/releasenotes.html
Regards,
Rubens
Hello @Garland7362 @peterturnip @stephw,
We would like to share that in Veeam Kasten 7.0.9 we have fixed the issue of custom annotations/labels not being applied to Kasten ephemeral pods.
https://docs.kasten.io/latest/releasenotes.html#relnotes-7-0-9
The Helm flags global.podLabels and global.podAnnotations were added; they can be used to set labels and annotations on all Veeam Kasten pods globally, including all ephemeral pods created by Veeam Kasten.
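For example, assuming the flag takes a map of annotation keys to values, something like this in the Helm values should propagate the Linkerd annotation to those pods:

global:
  podAnnotations:
    linkerd.io/inject: disabled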
Regards
Rubens