You can review my journey upgrading from 7.0.6 to 7.0.14 in this post.

I have a 200Gi PVC with about 155GB written, made up of tiny little index files. Cloning the snapshot takes quite a bit of time, and Kasten would time out after 15min of waiting. I fixed it by increasing the kanister.backupTimeout (KanisterBackupTimeout) parameter from 45min to 150min and kanister.podReadyWaitTimeout (KanisterPodReadyWaitTimeout) from 15min to 45min.
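
For reference, the helm upgrade I used looked roughly like this (my release is named k10 in the kasten-io namespace; adjust the chart reference to your setup):

helm upgrade k10 kasten/k10 --namespace=kasten-io --reuse-values \
  --set kanister.backupTimeout=150 \
  --set kanister.podReadyWaitTimeout=45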

During my upgrade journey to 7.0.14, I saw that these parameters were deprecated and replaced by timeout.blueprintBackup and timeout.workerPodReady respectively. So naturally I added them to my helm upgrade command, and they showed up in my k10-config ConfigMap.
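
The new flags I added were roughly the following (same values in minutes as the old ones, applied on top of my existing settings):

helm upgrade k10 kasten/k10 --namespace=kasten-io --reuse-values \
  --set timeout.blueprintBackup=150 \
  --set timeout.workerPodReady=45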

My ConfigMap now has both the old and new parameters for the Worker Pod Timeout (copy-vol-data-xxxxx pod) set to 45min, but I am now getting timeout errors: “Pod did not transition into running state. Timeout:15m0s”

I have tried removing the old values from the ConfigMap as well, with no change. I have also tried the helm upgrade command with all four --set options, as well as manually deleting all of the pods, and Kasten is still not respecting the 45min Worker Pod Timeout.
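
For anyone wanting to double-check their own settings, this is how I verified what actually landed in the ConfigMap (plain kubectl, nothing Kasten-specific):

kubectl -n kasten-io get configmap k10-config -o yaml | grep -iE 'timeout|kanister'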


With all the issues I had just upgrading from 7.0.6 to 7.0.14, I am afraid to upgrade to 7.5 before this is fixed. Any help is appreciated.

Error: “Pod did not transition into running state. Timeout:15m0s”

K8s: 1.30.6
Longhorn: 1.7.2
Kasten: 7.0.14
k10-config ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: k10
    meta.helm.sh/release-namespace: kasten-io
  labels:
    app: k10
    app.kubernetes.io/instance: k10
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k10
    helm.sh/chart: k10-7.0.14
    heritage: Helm
    release: k10
  name: k10-config
  namespace: kasten-io
data:
  AWSAssumeRoleDuration: 60m
  DataStoreFileLogLevel: ""
  DataStoreLogLevel: error
  K10BackupBufferFileHeadroomFactor: "1.1"
  K10DefaultPriorityClassName: ""
  K10EphemeralPVCOverhead: "0.1"
  K10ForceRootInBlueprintActions: "true"
  K10GCActionsEnabled: "false"
  K10GCDaemonPeriod: "21600"
  K10GCKeepMaxActions: "1000"
  K10LimiterCsiSnapshotRestoresPerAction: "3"
  K10LimiterCsiSnapshotsPerCluster: "10"
  K10LimiterDirectSnapshotsPerCluster: "10"
  K10LimiterExecutorThreads: "8"
  K10LimiterGenericVolumeBackupsPerCluster: "10"
  K10LimiterImageCopiesPerCluster: "10"
  K10LimiterSnapshotExportsPerAction: "3"
  K10LimiterSnapshotExportsPerCluster: "10"
  K10LimiterVolumeRestoresPerAction: "3"
  K10LimiterVolumeRestoresPerCluster: "10"
  K10LimiterWorkloadRestoresPerAction: "3"
  K10LimiterWorkloadSnapshotsPerAction: "5"
  K10MutatingWebhookTLSCertDir: /etc/ssl/certs/webhook
  K10PersistenceStorageClass: longhorn
  K10TimeoutBlueprintBackup: "150"
  K10TimeoutBlueprintDelete: "45"
  K10TimeoutBlueprintHooks: "20"
  K10TimeoutBlueprintRestore: "600"
  K10TimeoutCheckRepoPodReady: "20"
  K10TimeoutEFSRestorePodReady: "45"
  K10TimeoutJobWait: ""
  K10TimeoutStatsPodReady: "20"
  K10TimeoutWorkerPodReady: "45" # <<<<<<<<<< NEW TIMEOUT SET
  KanisterBackupTimeout: "150"
  KanisterManagedDataServicesBlueprintsEnabled: "true"
  KanisterPodReadyWaitTimeout: "45" # <<<<<<<<<< OLD TIMEOUT SET
  KanisterToolsImage: gcr.io/kasten-images/kanister-tools:7.0.14
  WorkerPodMetricSidecarCPULimit: ""
  WorkerPodMetricSidecarCPURequest: ""
  WorkerPodMetricSidecarEnabled: "true"
  WorkerPodMetricSidecarMemoryLimit: ""
  WorkerPodMetricSidecarMemoryRequest: ""
  WorkerPodMetricSidecarMetricLifetime: 2m
  WorkerPodPushgatewayMetricsInterval: 30s
  apiDomain: kio.kasten.io
  efsBackupVaultName: k10vault
  excludedApps: kube-system,kube-ingress,kube-node-lease,kube-public,kube-rook-ceph
  k10DataStoreDisableCompression: "false"
  k10DataStoreGeneralContentCacheSizeMB: "0"
  k10DataStoreGeneralMetadataCacheSizeMB: "500"
  k10DataStoreParallelDownload: "8"
  k10DataStoreParallelUpload: "8"
  k10DataStoreRestoreContentCacheSizeMB: "500"
  k10DataStoreRestoreMetadataCacheSizeMB: "500"
  kanisterFunctionVersion: v1.0.0-alpha
  kubeVirtVMsUnFreezeTimeout: 5m
  loglevel: info
  modelstoredirname: //mnt/k10state/kasten-io/
  multiClusterVersion: "2.5"
  quickDisasterRecoveryEnabled: "false"
  version: 7.0.14
  vmWareTaskTimeoutMin: "60"
  workerPodResourcesCRDEnabled: "false"

Please submit a support ticket so we can test and verify the behavior.

Hi Michael, I have submitted case 07537768.


Hi @NPatel,
I have exactly the same problem with the same configuration: Kasten 7.0.14 and Longhorn CSI.
The copy-vol-data-XXXX pods only live for 15 minutes even though I set the timeout.workerPodReady to a higher value.
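
To see what I mean, I simply watch the worker pods while an export runs (plain kubectl, filtering on the pod name prefix):

kubectl -n kasten-io get pods -w | grep copy-vol-data
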
Please let us know the solution provided by support.

Thank you


Hi @smartini @NPatel,
I was able to recreate the issue and will get back to you soon with more details.

Thanks,
Ahmed Hagag


Hi @Hagag @NPatel,
do you have news about this issue?

Thanks

 


Hi @smartini

The fix should be available in the next release, 7.5.2. Please keep monitoring our release notes page and upgrade K10.

https://docs.kasten.io/latest/releasenotes.html
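
Once 7.5.2 is published, the upgrade itself is a standard helm operation; roughly (assuming the usual Kasten helm repo and an existing k10 release in kasten-io):

helm repo update
helm upgrade k10 kasten/k10 --namespace=kasten-io --reuse-values --version=7.5.2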

Thanks


Thank you @Hagag.


Hi @smartini @Hagag

I can confirm the fix for timeout.workerPodReady (K10TimeoutWorkerPodReady) in 7.5.2 is working correctly. I was able to upgrade directly from 7.0.14 to 7.5.2 without issue. My large PV grew 20GB during this ticket, and the now-175GB PV not only exported successfully, but did so 15min faster than before.

