Solved

Kasten 7.0.14 not respecting new parameters?

  • December 8, 2024
  • 8 comments
  • 244 views

You can review my journey upgrading from 7.0.6 to 7.0.14 in this post.

I have a 200Gi PVC with about 155GB written, consisting of tiny index files. Cloning the snapshot takes quite a bit of time, and Kasten would time out after 15 minutes of waiting. I fixed that by increasing the kanister.backupTimeout (KanisterBackupTimeout) parameter from 45 to 150 minutes, and kanister.podReadyWaitTimeout (KanisterPodReadyWaitTimeout) from 15 to 45 minutes.

During my upgrade journey to 7.0.14, I saw that these parameters were deprecated and replaced by timeout.blueprintBackup and timeout.workerPodReady respectively. So naturally I added the new parameters to my helm upgrade command, and they were added to my k10-config ConfigMap.

My ConfigMap now has both the old and the new parameter for the Worker Pod timeout (the copy-vol-data-xxxxx pod) set to 45 minutes, but I am now getting timeout errors saying "Pod did not transition into running state. Timeout:15m0s".
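
For anyone checking the same thing, this is roughly how I confirm what the ConfigMap actually contains (a sketch; it assumes kubectl is pointed at the cluster, and the grep just narrows the output to the timeout-related keys):

kubectl -n kasten-io get configmap k10-config -o yaml | grep -iE 'timeout|kanister'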

I have tried without the old values in the ConfigMap as well, with no change. I have also tried the helm upgrade command with all four --set options (roughly as sketched below), as well as manually deleting all of the pods, and Kasten is still not respecting the 45-minute Worker Pod timeout.
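
For reference, that upgrade command looked roughly like this (a sketch rather than the exact command I ran; the kasten/k10 chart name is assumed, the values are in minutes, and --reuse-values keeps the rest of my existing settings):

helm upgrade k10 kasten/k10 --namespace=kasten-io --reuse-values \
  --set kanister.backupTimeout=150 \
  --set kanister.podReadyWaitTimeout=45 \
  --set timeout.blueprintBackup=150 \
  --set timeout.workerPodReady=45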


With all the issues I had just upgrading from 7.0.6 to 7.0.14, I am afraid to upgrade to 7.5 before this is fixed. Any help is appreciated.

Error: "Pod did not transition into running state. Timeout:15m0s"

K8s: 1.30.6
Longhorn: 1.7.2
Kasten: 7.0.14
k10-config ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: k10
    meta.helm.sh/release-namespace: kasten-io
  labels:
    app: k10
    app.kubernetes.io/instance: k10
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k10
    helm.sh/chart: k10-7.0.14
    heritage: Helm
    release: k10
  name: k10-config
  namespace: kasten-io
data:
  AWSAssumeRoleDuration: 60m
  DataStoreFileLogLevel: ""
  DataStoreLogLevel: error
  K10BackupBufferFileHeadroomFactor: "1.1"
  K10DefaultPriorityClassName: ""
  K10EphemeralPVCOverhead: "0.1"
  K10ForceRootInBlueprintActions: "true"
  K10GCActionsEnabled: "false"
  K10GCDaemonPeriod: "21600"
  K10GCKeepMaxActions: "1000"
  K10LimiterCsiSnapshotRestoresPerAction: "3"
  K10LimiterCsiSnapshotsPerCluster: "10"
  K10LimiterDirectSnapshotsPerCluster: "10"
  K10LimiterExecutorThreads: "8"
  K10LimiterGenericVolumeBackupsPerCluster: "10"
  K10LimiterImageCopiesPerCluster: "10"
  K10LimiterSnapshotExportsPerAction: "3"
  K10LimiterSnapshotExportsPerCluster: "10"
  K10LimiterVolumeRestoresPerAction: "3"
  K10LimiterVolumeRestoresPerCluster: "10"
  K10LimiterWorkloadRestoresPerAction: "3"
  K10LimiterWorkloadSnapshotsPerAction: "5"
  K10MutatingWebhookTLSCertDir: /etc/ssl/certs/webhook
  K10PersistenceStorageClass: longhorn
  K10TimeoutBlueprintBackup: "150"
  K10TimeoutBlueprintDelete: "45"
  K10TimeoutBlueprintHooks: "20"
  K10TimeoutBlueprintRestore: "600"
  K10TimeoutCheckRepoPodReady: "20"
  K10TimeoutEFSRestorePodReady: "45"
  K10TimeoutJobWait: ""
  K10TimeoutStatsPodReady: "20"
  K10TimeoutWorkerPodReady: "45" # <<<<<<<<<< NEW TIMEOUT SET
  KanisterBackupTimeout: "150"
  KanisterManagedDataServicesBlueprintsEnabled: "true"
  KanisterPodReadyWaitTimeout: "45" # <<<<<<<<<< OLD TIMEOUT SET
  KanisterToolsImage: gcr.io/kasten-images/kanister-tools:7.0.14
  WorkerPodMetricSidecarCPULimit: ""
  WorkerPodMetricSidecarCPURequest: ""
  WorkerPodMetricSidecarEnabled: "true"
  WorkerPodMetricSidecarMemoryLimit: ""
  WorkerPodMetricSidecarMemoryRequest: ""
  WorkerPodMetricSidecarMetricLifetime: 2m
  WorkerPodPushgatewayMetricsInterval: 30s
  apiDomain: kio.kasten.io
  efsBackupVaultName: k10vault
  excludedApps: kube-system,kube-ingress,kube-node-lease,kube-public,kube-rook-ceph
  k10DataStoreDisableCompression: "false"
  k10DataStoreGeneralContentCacheSizeMB: "0"
  k10DataStoreGeneralMetadataCacheSizeMB: "500"
  k10DataStoreParallelDownload: "8"
  k10DataStoreParallelUpload: "8"
  k10DataStoreRestoreContentCacheSizeMB: "500"
  k10DataStoreRestoreMetadataCacheSizeMB: "500"
  kanisterFunctionVersion: v1.0.0-alpha
  kubeVirtVMsUnFreezeTimeout: 5m
  loglevel: info
  modelstoredirname: //mnt/k10state/kasten-io/
  multiClusterVersion: "2.5"
  quickDisasterRecoveryEnabled: "false"
  version: 7.0.14
  vmWareTaskTimeoutMin: "60"
  workerPodResourcesCRDEnabled: "false"

Best answer by Hagag

Hi @smartini

The fix should be available in the next release, 7.5.2. Please keep monitoring our release notes page and upgrade K10 once it is available.

https://docs.kasten.io/latest/releasenotes.html

Thanks

8 comments

  • Comes here often
  • December 9, 2024

Please submit a support ticket so we can test and verify the behavior.


  • Author
  • Comes here often
  • December 11, 2024

Hi Michael, I have submitted case 07537768. 


  • New Here
  • December 12, 2024

Hi @NPatel,
I have exactly the same problem with the same configuration: Kasten 7.0.14 and Longhorn CSI.
The copy-vol-data-XXXX pods only live for 15 minutes even though I set timeout.workerPodReady to a higher value.
Please let us know the solution provided by support.

Thank you


Hagag
  • Experienced User
  • December 15, 2024

Hi @smartini @NPatel,
I was able to recreate the issue and will get back to you soon with more details.

Thanks,
Ahmed Hagag


  • New Here
  • January 9, 2025

Hi @Hagag @NPatel,
Do you have any news about this issue?

Thanks



Hagag
  • Experienced User
  • Answer
  • January 9, 2025

Hi @smartini

The fix should be available in the next release, 7.5.2. Please keep monitoring our release notes page and upgrade K10 once it is available.

https://docs.kasten.io/latest/releasenotes.html
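
Once 7.5.2 is published, an in-place upgrade along these lines should pick up the fix (a rough sketch; the kasten/k10 chart name is assumed, and the k10 release in the kasten-io namespace is taken from the ConfigMap posted above):

helm repo update
helm upgrade k10 kasten/k10 --namespace=kasten-io --version=7.5.2 --reuse-values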

Thanks


  • New Here
  • January 9, 2025

Thank you @Hagag.


  • Author
  • Comes here often
  • January 19, 2025

Hi @smartini @Hagag

I can confirm the fix in 7.5.2 for timeout.workerPodReady (K10TimeoutWorkerPodReady) is working correctly. I was able to upgrade directly from 7.0.14 to 7.5.2 without issue. My large PV has grown by 20GB during this ticket, and the now 175GB PV not only exported successfully, but did so 15 minutes faster than before.