Solved

Exporting Restore Point fails


Userlevel 1
  • Not a newbie anymore
  • 4 comments

Hello,

I’m trying to set up backup policies, but almost all policy runs fail at the action “Exporting RestorePoint” with the following error:

cause:
cause:
cause:
message: '["{"message":"Failed to export snapshot
data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-rttwnskswscjph98"}],"cause":{"message":"Error
creating portable
snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-hmrkf\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-hmrkf: Context
done while polling: context deadline
exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-ksls25","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-2vd75-kasten-io-pvc-","namespace":"kasten-io","uid":"39748042-0502-43ff-90c5-10878d20a150","resourceVersion":"399477","generation":4,"creationTimestamp":"2022-06-02T10:02:25Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:25Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-2vd75","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-m7w2w","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-1","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-2vd75","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
.Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
.Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
.Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
.Options.pvcRepository }}","phySize":"{{
.Phases.copyToObjectStore.Output.phySize }}","size":"{{
.Phases.copyToObjectStore.Output.size
}}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-hmrkf\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-hmrkf: Context
done while polling: context deadline
exceeded\"}}"}}}}]}}}","{"message":"Failed to export snapshot
data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-6plgr82jtrwhzwwf"}],"cause":{"message":"Error
creating portable
snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-xbzgg\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-xbzgg: context
deadline
exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kmp9lp","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-qfxrw-kasten-io-pvc-","namespace":"kasten-io","uid":"224b6da6-5ba5-4b6d-9562-d5151e5ae335","resourceVersion":"399489","generation":4,"creationTimestamp":"2022-06-02T10:02:25Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:25Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qfxrw","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-4zckf","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-2","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qfxrw","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
.Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
.Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
.Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
.Options.pvcRepository }}","phySize":"{{
.Phases.copyToObjectStore.Output.phySize }}","size":"{{
.Phases.copyToObjectStore.Output.size
}}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-xbzgg\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-xbzgg: context
deadline exceeded\"}}"}}}}]}}}","{"message":"Failed to export snapshot
data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-d8ngsfjr4l9hkk5t"}],"cause":{"message":"Error
creating portable
snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-2fsh7\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-2fsh7: context
deadline
exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-k8r9rv","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-qpzlt-kasten-io-pvc-","namespace":"kasten-io","uid":"3ada599a-6c69-4e25-9dae-cfdb4dda1879","resourceVersion":"399506","generation":4,"creationTimestamp":"2022-06-02T10:02:26Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:26Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qpzlt","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-r9xjg","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-0","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qpzlt","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
.Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
.Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
.Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
.Options.pvcRepository }}","phySize":"{{
.Phases.copyToObjectStore.Output.phySize }}","size":"{{
.Phases.copyToObjectStore.Output.size
}}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
while waiting for Pod to be
ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-2fsh7\"}],\"cause\":{\"message\":\"Pod
did not transition into running state.
Timeout:15m0s Namespace:kasten-io, Name:copy-vol-data-2fsh7: context
deadline exceeded\"}}"}}}}]}}}"]'
file: kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:146
function: kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).Copy
linenumber: 146
message: Error converting snapshots
file: kasten.io/k10/kio/exec/phases/phase/export.go:138
function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
linenumber: 138
message: Failed to copy artifacts
message: Job failed to be executed
fields: []

If I check the PVCs in the kasten-io namespace, I see that there are some PVCs stuck in Pending with the following message:
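For reference, the output below comes from describing the stuck claim, i.e. something like:

kubectl describe pvc kanister-pvc-8trnj -n kasten-io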

Name:          kanister-pvc-8trnj
Namespace:     kasten-io
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-9qqdbn56
Used By:       copy-vol-data-mmbhd
Events:
  Type     Reason                Age                     From                                                                                                       Message
  ----     ------                ----                    ----                                                                                                       -------
  Normal   ExternalProvisioning  2m48s (x26 over 8m45s)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          15s (x12 over 8m45s)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-bc7kd_44ea7e16-68f8-4506-bc45-928559eaf606  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-8trnj"
  Warning  ProvisioningFailed    15s (x12 over 8m45s)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-bc7kd_44ea7e16-68f8-4506-bc45-928559eaf606  failed to provision volume with StorageClass "rook-ceph-block": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-9qqdbn56: error getting snapshot snapshot-copy-9qqdbn56 from api server: the server could not find the requested resource (get volumesnapshots.snapshot.storage.k8s.io snapshot-copy-9qqdbn56)

However the snapshot exists:
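The output below is from a describe of the snapshot, along the lines of:

kubectl describe volumesnapshot snapshot-copy-9qqdbn56 -n kasten-io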


Name:         snapshot-copy-9qqdbn56
Namespace:    kasten-io
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2022-06-02T10:17:57Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:source:
          .:
          f:volumeSnapshotContentName:
        f:volumeSnapshotClassName:
    Manager:      executor-server
    Operation:    Update
    Time:         2022-06-02T10:17:57Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:creationTime:
        f:readyToUse:
        f:restoreSize:
    Manager:         snapshot-controller
    Operation:       Update
    Time:            2022-06-02T10:17:57Z
  Resource Version:  399711
  UID:               e7b7e672-8569-481c-9bd0-7f8c6d6fc60c
Spec:
  Source:
    Volume Snapshot Content Name:  snapshot-copy-9qqdbn56-content-a42450dc-b3d6-4ee2-99c3-3a0d3ea0cb5a
  Volume Snapshot Class Name:      k10-clone-csi-rbdplugin-snapclass
Status:
  Bound Volume Snapshot Content Name:  snapshot-copy-9qqdbn56-content-a42450dc-b3d6-4ee2-99c3-3a0d3ea0cb5a
  Creation Time:                       2022-06-02T10:17:56Z
  Ready To Use:                        true
  Restore Size:                        0
Events:
  Type    Reason           Age    From                 Message
  ----    ------           ----   ----                 -------
  Normal  SnapshotCreated  9m48s  snapshot-controller  Snapshot kasten-io/snapshot-copy-9qqdbn56 was successfully created by the CSI driver
  Normal  SnapshotReady    9m48s  snapshot-controller  Snapshot kasten-io/snapshot-copy-9qqdbn56 is ready to use.

There are also a few jobs where the export is working (screenshot omitted).

The preflight check was successful as well:

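This is the output of Kasten's primer script, typically invoked along the lines of (exact invocation may differ):

curl https://docs.kasten.io/tools/k10_primer.sh | bash
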
Kubernetes Version Check:
Valid kubernetes version (v1.20.15) - OK

RBAC Check:
Kubernetes RBAC is enabled - OK

Aggregated Layer Check:
The Kubernetes Aggregated Layer is enabled - OK

W0602 10:29:06.839593 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
CSI Capabilities Check:
Using CSI GroupVersion snapshot.storage.k8s.io/v1 - OK

W0602 10:29:07.943947 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:07.945922 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:07.947792 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.143837 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.193755 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.243422 7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
Validating Provisioners:
rook-ceph.rbd.csi.ceph.com:
Is a CSI Provisioner - OK
Storage Classes:
rook-ceph-block
Valid Storage Class - OK
Volume Snapshot Classes:
csi-rbdplugin-snapclass
Has k10.kasten.io/is-snapshot-class annotation set to true - OK
Has deletionPolicy 'Delete' - OK
k10-clone-csi-rbdplugin-snapclass

rook-ceph.cephfs.csi.ceph.com:
Is a CSI Provisioner - OK
Storage Classes:
rook-ceph-fs
Valid Storage Class - OK
Volume Snapshot Classes:
csi-cephfsplugin-snapclass
Has k10.kasten.io/is-snapshot-class annotation set to true - OK
Has deletionPolicy 'Delete' - OK

Validate Generic Volume Snapshot:
Pod Created successfully - OK
GVS Backup command executed successfully - OK
Pod deleted successfully - OK

Thanks for your help in advance


Best answer by EBrockman 24 June 2022, 17:58


14 comments

Userlevel 1

I accidentally chose the wrong category. Can someone move this to the Kasten K10 Support category?

Thank you.

Userlevel 7
Badge +20

@Rick Vanover or @Madi.Cristil should be able to move the category to Kasten K10 Support for you! 🙂

Userlevel 7
Badge +7

Thank you, @MicoolPaul! 😊

@SBE, that was moved to Kasten K10 Support! :)

Userlevel 3
Badge +1

Hello SBE,

 

Could you please provide the output of the command below while you are running the snapshot with export?

kubectl describe po copy-vol-data-xxxxx -n kasten-io

There should be a pod that is failing during the export.
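
The Kanister worker pods carry the createdBy=kanister label (visible in the pod describe output below), so if the name is hard to catch you can list candidates with, for example:

kubectl get pods -n kasten-io -l createdBy=kanister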

 

Thanks

Emmanuel

Userlevel 1

Hi Emmanuel,

 

thanks for your reply. The pod is stuck in Pending because of the PVC:

 

Name:         copy-vol-data-h82qb
Namespace:    kasten-io
Priority:     0
Node:         <none>
Labels:       createdBy=kanister
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  container:
    Image:      ghcr.io/kanisterio/kanister-tools:0.79.0
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
      tail -f /dev/null
    Environment:  <none>
    Mounts:
      /mnt/vol_data/kanister-pvc-pjg5k from vol-kanister-pvc-pjg5k (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from k10-k10-token-r5x57 (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  vol-kanister-pvc-pjg5k:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kanister-pvc-pjg5k
    ReadOnly:   false
  k10-k10-token-r5x57:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  k10-k10-token-r5x57
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m19s  default-scheduler  0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  2m19s  default-scheduler  0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
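
The pod is blocked on the claim kanister-pvc-pjg5k listed under Volumes; that claim can be inspected the same way as the one above, e.g.:

kubectl describe pvc kanister-pvc-pjg5k -n kasten-io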

 

Userlevel 3

I have the same issue - any ideas?

Userlevel 3
Badge +1

Hello @SBE

After looking over what you described above, this looks to be related to a lack of resources available on your nodes to start the new copy-vol pod during the export. You may need to increase the nodes' resources or add a node (or two), depending on their capacity.

 

This may help with understanding K10's requirements: https://docs.kasten.io/latest/operating/footprint.html#requirement-guidelines

 

Thanks

Emmanuel

 

 

Hi @EBrockman,

I am a colleague of SBE.
Looking at his cluster, I can see that plenty of CPU and memory is free.

kubectl top nodes

NAME   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
m1     378m         9%     2902Mi          18%
m2     132m         3%     2240Mi          14%
m3     130m         3%     1916Mi          12%
w1     219m         2%     5092Mi          7%
w2     791m         9%     10812Mi         16%
w3     494m         6%     12732Mi         19%
w4     511m         6%     8555Mi          13%
w5     646m         8%     8676Mi          13%
w6     372m         4%     12725Mi         19%

The pods do not specify resource requests, but each worker has 8 CPUs and 64 GB of memory.
Your link states that 4 cores and 5 GB of memory should be enough; that should fit easily on any single node of the cluster.
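
If the actual reservations (rather than live usage) matter here, they can be checked with something like:

kubectl describe nodes | grep -A 8 'Allocated resources'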
 

Userlevel 4
Badge +2

Hello @JTI, @SBE.

Since this needs more investigation, I would recommend opening a trial case and uploading your debug logs to it so we can have a better look.

You could also try manually creating a PVC that uses the created snapshot as its data source and see how long it takes to become bound and ready.
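
You can watch the claim's progress while it binds with, for example:

kubectl get pvc -n kasten-io -w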

Regards
Fernando R.

Userlevel 3
Badge +1

 Hello @JTI @SBE 

 

I believe the issue is related to your storage configuration. As you can see above in the PVC describe output, “waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator” is a sign that the storage is not able to provision a PV for the PVC. You might want to check the events in the kasten-io namespace to verify, on the storage side, why the PVs are not being created.
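
For example, sorted so the newest events come last:

kubectl get events -n kasten-io --sort-by=.lastTimestamp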

 

Thanks

Emmanuel

Userlevel 1

Hello, 

sorry for the late response, I was on vacation.

@EBrockman I checked the events, but there was no further information.

@FRubens I opened a case and uploaded the debug logs.

Userlevel 1

Hello @FRubens 

I figured out the problem:

The automatically created PVC is not correct, but I’m not sure whose fault this is. The APIGroup is not correct: it is missing the /v1.

Here is a describe of the PVC that gets created automatically and is not working as intended:

Name:          kanister-pvc-28c52
Namespace:     kasten-io
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-7z29k2r6
Used By:       copy-vol-data-sk9qs
Events:
  Type     Reason                Age                   From                                                                                                       Message
  ----     ------                ----                  ----                                                                                                       -------
  Normal   ExternalProvisioning  3m57s (x26 over 10m)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          100s (x11 over 10m)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-99hw5_b2d92ce7-3ff4-4707-a5dd-f1ab0072e35e  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-28c52"
  Warning  ProvisioningFailed    100s (x11 over 10m)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-99hw5_b2d92ce7-3ff4-4707-a5dd-f1ab0072e35e  failed to provision volume with StorageClass "rook-ceph-block": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-7z29k2r6: error getting snapshot snapshot-copy-7z29k2r6 from api server: the server could not find the requested resource (get volumesnapshots.snapshot.storage.k8s.io snapshot-copy-7z29k2r6)

If I create the PVC manually with APIGroup set to snapshot.storage.k8s.io/v1 like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kanister-pvc
  namespace: kasten-io
spec:
  accessModes:
    - ReadWriteOnce
  dataSource:
    apiGroup: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    name: snapshot-copy-7z29k2r6
  resources:
    requests:
      storage: "10737418240"
  storageClassName: rook-ceph-block
  volumeMode: Filesystem

the provisioning succeeds.

Userlevel 3
Badge +1

Hello @SBE

 

I am sorry; when looking for events, you have to grab them quickly after the failed task, as events only last for one hour by default. Also, if PVCs cannot be created, this looks to be a storage issue, so I would recommend taking a look at the csi-provisioner logs.
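
Assuming a default Rook deployment where the provisioner runs in the rook-ceph namespace, something like this should show the relevant logs:

kubectl logs -n rook-ceph deploy/csi-rbdplugin-provisioner -c csi-provisioner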

 

Thanks

Emmanuel 

@SBE nice work! I’ve stumbled on the same missing v1 issue. How did you go about fixing it?
