Hello,
I’m testing Veeam Kasten for the first time, but unfortunately it doesn't work as expected. I’ve created a backup policy with a snapshot and an export to an S3 RADOS Gateway (Ceph) endpoint.
I get the following error message while exporting the snapshot:
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        message: "client rate limiter Wait returned an error: context deadline exceeded"
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
                      linenumber: 384
                      message: Pod did not transition into running state.
                        Timeout:15m0s Namespace:kasten-io,
                        Name:copy-vol-data-rpr56
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                  fields:
                  - name: pod
                    value: copy-vol-data-rpr56
                  - name: namespace
                    value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
        fields:
        - name: type
          value: CSI
        - name: id
          value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        message: "client rate limiter Wait returned an error: context deadline exceeded"
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
                      linenumber: 384
                      message: Pod did not transition into running state.
                        Timeout:15m0s Namespace:kasten-io,
                        Name:copy-vol-data-t8l8p
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                  fields:
                  - name: pod
                    value: copy-vol-data-t8l8p
                  - name: namespace
                    value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
        fields:
        - name: type
          value: CSI
        - name: id
          value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        cause:
                          message: "Context done while polling: context deadline exceeded"
                        file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                        function: github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs
                        linenumber: 334
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs
                      linenumber: 334
                      message: Pod did not transition into running state.
                        Timeout:15m0s Namespace:kasten-io,
                        Name:copy-vol-data-v84db
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                  fields:
                  - name: pod
                    value: copy-vol-data-v84db
                  - name: namespace
                    value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
        fields:
        - name: type
          value: CSI
        - name: id
          value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed

The pre-flight check was successful, and the snapshot itself remains on the storage system even though the export failed. So the snapshot part seems to work; only the export fails.
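The error above is a chain of nested `cause` entries, with the actual failure at the deepest level. As a minimal sketch (this is not part of Kasten; the function name `innermost_message` and the abbreviated `example` dict are made up for illustration), the innermost message could be pulled out like this, assuming the error has already been loaded into a Python dict:

```python
from typing import Optional


def innermost_message(error: dict) -> Optional[str]:
    """Follow nested 'cause' keys and return the deepest 'message' found."""
    message = None
    node = error
    while isinstance(node, dict):
        # Not every level carries a message, so keep the last one seen.
        message = node.get("message", message)
        node = node.get("cause")
    return message


# Abbreviated, hand-written stand-in for one entry of the error list above.
example = {
    "message": "Job failed to be executed",
    "cause": {
        "message": "Failed to copy artifacts",
        "cause": {
            "message": "Pod failed to become ready in time",
            "cause": {
                "message": "client rate limiter Wait returned an error: "
                           "context deadline exceeded",
            },
        },
    },
}

print(innermost_message(example))
```

For all three entries the deepest message is a "context deadline exceeded" timeout, which matches the 15m0s wait for the copy-vol-data pod.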
Kubernetes Distribution: v1.28.9+rke2r1
PVC size in target namespace: ~1.1 TB
CSI: Ceph-CSI-Driver (RBD and CephFS)
Snapshotter: rke2-snapshot-controller:1.7.202 (It’s the external-snapshotter)
s3 Bucket (WORM configured):
s3api put-object-lock-configuration --bucket kasten --object-lock-configuration='{ "ObjectLockEnabled": "Enabled", "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 14 }}}'
The Pod “copy-vol-data-xzb5k” itself says: “0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..”
Furthermore, in the description of the copy-pod:
Volumes:
  vol-8f8503ca-66ad-11ef-9a4d-7ead5d941560:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kanister-pvc-2945w
    ReadOnly:   false

The PVC for the copy-pod:
k describe pvc/kanister-pvc-2945w -n kasten-io
Name:          kanister-pvc-2945w
Namespace:     kasten-io
StorageClass:  csi-cephfs-sc
Status:        Pending
Volume:
Labels:        k10.kasten.io/readyForGC=true
Annotations:   k10.kasten.io/readyForGCAt: 2024-08-30T18:55:00Z
               volume.beta.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-qnxkgcjf
Used By:       copy-vol-data-xzb5k
Events:
  Type     Reason                Age                    From  Message
  ----     ------                ----                   ----  -------
  Warning  ProvisioningFailed    6m36s                  cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = clone from snapshot is pending
  Normal   Provisioning          2m20s (x10 over 6m36s) cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-2945w"
  Warning  ProvisioningFailed    2m20s (x9 over 6m35s)  cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = clone from snapshot is already in progress
  Normal   ExternalProvisioning  21s (x26 over 6m36s)   persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

This looks like the root cause to me: “clone from snapshot is pending”.
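To compare the clone duration against the 15m0s pod timeout, one option would be to time how long a PVC created from the snapshot stays in Pending. A rough sketch, assuming `kubectl` is on the PATH; `pvc_phase` and `seconds_until_bound` are hypothetical helper names, and the PVC name in the usage comment is a placeholder, since the kanister PVC above already carries the readyForGC label:

```python
import subprocess
import time
from typing import Callable, Optional


def pvc_phase(name: str, namespace: str) -> str:
    """Return the PVC's status.phase (e.g. Pending, Bound) via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def seconds_until_bound(
    get_phase: Callable[[], str],
    timeout_s: float = 3600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll get_phase until it returns 'Bound'.

    Returns the elapsed seconds, or None if timeout_s passes first.
    """
    start = clock()
    while True:
        if get_phase() == "Bound":
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


# Usage (placeholder PVC name, not taken from the cluster above):
#   elapsed = seconds_until_bound(lambda: pvc_phase("my-test-clone", "kasten-io"))
#   print(elapsed)
```

If the measured time is well above 900 seconds, that would support the idea that the clone simply does not finish before the copy-vol-data pod times out.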
Log from Ceph-CSI-Provisioner:
79b9cc7-4b9a-4156-b9a6-a0e2ae43f0f7"): map[csi.imagename:csi-vol-b79b9cc7-4b9a-4156-b9a6-a0e2ae43f0f7 csi.volname:pvc-4344ab96-5a49-4e70-8d99-8a6c9f1a209c csi.volume.owner:kasten-io]
E0830 09:03:34.739380 1 utils.go:203] ID: 24877 Req-ID: pvc-4344ab96-5a49-4e70-8d99-8a6c9f1a209c GRPC error: rpc error: code = Aborted desc = clone from snapshot is already in progress
So in the end, it seems that the clone from the snapshot takes too long to finish, and the 15-minute timeout of the copy-vol-data pod is reached first. Is there a setting I can change to adjust this behavior?
