Hello,
I’m testing Veeam Kasten for the first time, but unfortunately it isn’t working as expected. I’ve created a backup policy with a snapshot and an export to an S3 RADOS Gateway (Ceph) endpoint.
I got the following error messages while exporting the snapshot:
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        message: "client rate limiter Wait returned an error: context deadline exceeded"
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
                      linenumber: 384
                      message: Pod did not transition into running state. Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-rpr56
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                    fields:
                      - name: pod
                        value: copy-vol-data-rpr56
                      - name: namespace
                        value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
          fields:
            - name: type
              value: CSI
            - name: id
              value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        message: "client rate limiter Wait returned an error: context deadline exceeded"
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.WaitForPodReady
                      linenumber: 384
                      message: Pod did not transition into running state. Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-t8l8p
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                    fields:
                      - name: pod
                        value: copy-vol-data-t8l8p
                      - name: namespace
                        value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
          fields:
            - name: type
              value: CSI
            - name: id
              value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              cause:
                cause:
                  cause:
                    cause:
                      cause:
                        cause:
                          message: "Context done while polling: context deadline exceeded"
                        file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                        function: github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs
                        linenumber: 334
                      file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod.go
                      function: github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs
                      linenumber: 334
                      message: Pod did not transition into running state. Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-v84db
                    file: github.com/kanisterio/kanister@v0.0.0-20240812194716-8812756d1751/pkg/kube/pod_controller.go
                    function: github.com/kanisterio/kanister/pkg/kube.(*podController).WaitForPodReady
                    linenumber: 174
                    message: Pod failed to become ready in time
                    fields:
                      - name: pod
                        value: copy-vol-data-v84db
                      - name: namespace
                        value: kasten-io
                  file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:304
                  function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func2
                  linenumber: 304
                  message: failed while waiting for Pod to be ready
                file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:161
                function: kasten.io/k10/kio/kanister/function.CopyVolumeData
                linenumber: 161
                message: Failed to execute copy volume data pod function
              file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:249
              function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).genericVolumeCopy
              linenumber: 249
              message: failed running copyVolumeData
            file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:170
            function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverterInternalAPIImpl).CopySnapshotRestoredInPVC
            linenumber: 170
            message: failed running genericVolumeCopy
          file: kasten.io/k10/kio/exec/internal/snapshotconverters/ac_gvc_converter.go:77
          function: kasten.io/k10/kio/exec/internal/snapshotconverters.(*GVCConverter).Convert
          linenumber: 77
          message: Error creating portable snapshot
          fields:
            - name: type
              value: CSI
            - name: id
              value: k10-csi-snap-pcw6kxxzgsql8hj6
        file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:544
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 544
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/artifactcopier.go:274
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).Copy
      linenumber: 274
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:172
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 172
    message: Failed to copy artifacts
  message: Job failed to be executed
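Each of the three traces is a chain of nested `cause` entries: the outermost entry is the generic “Job failed to be executed”, and the innermost one is the original error (a `context deadline exceeded` raised while waiting for the copy-vol-data pod). As a reading aid, here is a minimal Python sketch that lists the messages root-first. It assumes the YAML has already been loaded into nested dicts (e.g. with a YAML parser); `cause_chain` and `messages_root_first` are hypothetical helper names, not part of Kasten or Kanister.

```python
from typing import Any, Dict, List


def cause_chain(entry: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Follow the nested `cause` keys and return the entries innermost-first."""
    chain: List[Dict[str, Any]] = []
    node: Any = entry
    while isinstance(node, dict):
        chain.append(node)
        node = node.get("cause")
    chain.reverse()  # innermost (root error) first
    return chain


def messages_root_first(entry: Dict[str, Any]) -> List[str]:
    """Return the `message` of every level, root error first; levels without one are skipped."""
    return [level["message"] for level in cause_chain(entry) if "message" in level]


if __name__ == "__main__":
    # Innermost three levels of the third trace, abbreviated by hand.
    trace = {
        "cause": {
            "cause": {"message": "Context done while polling: context deadline exceeded"},
            "function": "github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs",
            "linenumber": 334,
        },
        "function": "github.com/kanisterio/kanister/pkg/kube.getErrorFromLogs",
        "linenumber": 334,
        "message": "Pod did not transition into running state. "
        "Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-v84db",
    }
    for msg in messages_root_first(trace):
        print(msg)
```

On the abbreviated trace it prints the “Context done while polling” message first, followed by the “Pod did not transition into running state” message; the middle level has no `message` and is skipped.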
The pre-flight check was successful, and the snapshot itself remains on the storage system even though the export failed. So the snapshot part seems to work; only the export fails.
Kubernetes Distribution: v1.28.9+rke2r1
PVC Size in target Namespace: ~1.1 TB
CSI: Ceph-CSI-Driver (RBD and CephFS)
Snapshotter: rke2-snapshot-controller:1.7.202 (It’s the external-snapshotter)
S3 bucket (WORM configured):
s3api put-object-lock-configuration --bucket kasten --object-lock-configuration='{ "ObjectLockEnabled": "Enabled", "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 14 }}}'
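The same default-retention rule can be expressed as the structure used by boto3’s `put_object_lock_configuration` and `get_object_lock_configuration` S3 calls, which is handy for reading back what the RADOS Gateway actually stored. A minimal sketch, assuming boto3 is installed and credentials are configured; `object_lock_config` and `show_bucket_lock` are hypothetical helper names, and the bucket name and endpoint URL passed to `show_bucket_lock` are placeholders.

```python
import json

# Same JSON as passed to --object-lock-configuration above.
CLI_JSON = '{ "ObjectLockEnabled": "Enabled", "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 14 }}}'


def object_lock_config(mode: str = "COMPLIANCE", days: int = 14) -> dict:
    """Build the default-retention object-lock configuration used for the bucket."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def show_bucket_lock(bucket: str, endpoint_url: str) -> dict:
    """Read back the bucket's object-lock configuration (needs boto3 and credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    return s3.get_object_lock_configuration(Bucket=bucket)["ObjectLockConfiguration"]


if __name__ == "__main__":
    # The Python structure and the CLI JSON describe the same rule.
    assert object_lock_config() == json.loads(CLI_JSON)
    print(json.dumps(object_lock_config(), indent=2))
```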
The pod “copy-vol-data-xzb5k” itself reports: “0/7 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..”
Furthermore, the description of the copy pod shows:
Volumes:
  vol-8f8503ca-66ad-11ef-9a4d-7ead5d941560:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kanister-pvc-2945w
    ReadOnly:   false
The PVC used by the copy pod:
k describe pvc/kanister-pvc-2945w -n kasten-io
Name:          kanister-pvc-2945w
Namespace:     kasten-io
StorageClass:  csi-cephfs-sc
Status:        Pending
Volume:
Labels:        k10.kasten.io/readyForGC=true
Annotations:   k10.kasten.io/readyForGCAt: 2024-08-30T18:55:00Z
               volume.beta.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-qnxkgcjf
Used By:       copy-vol-data-xzb5k
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 6m36s cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056 failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = clone from snapshot is pending
Normal Provisioning 2m20s (x10 over 6m36s) cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056 External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-2945w"
Warning ProvisioningFailed 2m20s (x9 over 6m35s) cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-5b9d856485-6m2mj_039263d9-89af-4c0c-acc4-db35b8bc9056 failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = clone from snapshot is already in progress
Normal ExternalProvisioning 21s (x26 over 6m36s) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
This looks like the root cause to me: “clone from snapshot is pending”.
Log from the Ceph-CSI provisioner:
79b9cc7-4b9a-4156-b9a6-a0e2ae43f0f7"): map[csi.imagename:csi-vol-b79b9cc7-4b9a-4156-b9a6-a0e2ae43f0f7 csi.volname:pvc-4344ab96-5a49-4e70-8d99-8a6c9f1a209c csi.volume.owner:kasten-io]
E0830 09:03:34.739380 1 utils.go:203] ID: 24877 Req-ID: pvc-4344ab96-5a49-4e70-8d99-8a6c9f1a209c GRPC error: rpc error: code = Aborted desc = clone from snapshot is already in progress
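To see whether the clone eventually completes, and how long it takes compared with the 15-minute pod timeout, the PVC phase can be polled until it becomes `Bound`. A minimal sketch, assuming the official `kubernetes` Python client; `wait_for_pvc_bound` and `kube_phase_getter` are hypothetical helpers, and the phase getter, clock and sleep are injected so the timing logic can be exercised without a cluster.

```python
import time
from typing import Callable, Optional


def wait_for_pvc_bound(
    get_phase: Callable[[], str],
    timeout_s: float,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll a PVC phase until it is "Bound"; return the elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if get_phase() == "Bound":
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


def kube_phase_getter(name: str, namespace: str) -> Callable[[], str]:
    """Return a getter that reads the PVC phase via the official kubernetes client."""
    # Imported lazily: only needed when talking to a real cluster.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()
    return lambda: core.read_namespaced_persistent_volume_claim(name, namespace).status.phase
```

For example, `wait_for_pvc_bound(kube_phase_getter("kanister-pvc-2945w", "kasten-io"), timeout_s=4 * 3600, interval_s=30)` would report how many seconds the clone took, or `None` if it was still pending; the 4-hour limit is an arbitrary example value.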
So in the end, it seems that the clone from the snapshot takes too long to finish, and the 15-minute timeout of the copy-vol-data pod is reached first. Is there a value I can change to modify this behavior?
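A rough check of that conclusion: if the whole ~1.1 TB has to be copied (CephFS snapshot clones are full data copies performed asynchronously, not copy-on-write), finishing inside the 15-minute timeout needs a sustained rate of roughly 1.2 GB/s. The sketch below assumes decimal units and that the full PVC size is actually in use; the 200 MB/s figure is a hypothetical example rate, not a measurement from this cluster.

```python
TIMEOUT_S = 15 * 60  # Timeout:15m0s from the copy-vol-data error


def required_rate(size_bytes: float, timeout_s: float = TIMEOUT_S) -> float:
    """Minimum sustained copy rate (bytes/s) needed to finish within the timeout."""
    return size_bytes / timeout_s


def copy_minutes(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Estimated copy duration in minutes at a given sustained rate."""
    return size_bytes / rate_bytes_per_s / 60


if __name__ == "__main__":
    size = 1.1e12  # ~1.1 TB, decimal units (assumption: the whole PVC is used)
    print(f"required: {required_rate(size) / 1e9:.2f} GB/s")
    print(f"at 200 MB/s (hypothetical): {copy_minutes(size, 200e6):.0f} min")
```

With these values it prints a required rate of 1.22 GB/s and an estimated 92 minutes at 200 MB/s, i.e. about six times the pod timeout.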