Solved

Exporting Restore Point fails


  • Not a newbie anymore
  • 4 comments

Hello,

I’m trying to set up backup policies, but almost all policy runs fail at the action “Exporting RestorePoint” with the following error:

cause:
  cause:
    cause:
      message: '["{"message":"Failed to export snapshot
        data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-rttwnskswscjph98"}],"cause":{"message":"Error
        creating portable
        snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
        Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-hmrkf\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-hmrkf: Context
        done while polling: context deadline
        exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-ksls25","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-2vd75-kasten-io-pvc-","namespace":"kasten-io","uid":"39748042-0502-43ff-90c5-10878d20a150","resourceVersion":"399477","generation":4,"creationTimestamp":"2022-06-02T10:02:25Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:25Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-2vd75","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-m7w2w","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-1","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-2vd75","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
        .Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
        .Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
        .Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
        .Options.pvcRepository }}","phySize":"{{
        .Phases.copyToObjectStore.Output.phySize }}","size":"{{
        .Phases.copyToObjectStore.Output.size
        }}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-hmrkf\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-hmrkf: Context
        done while polling: context deadline
        exceeded\"}}"}}}}]}}}","{"message":"Failed to export snapshot
        data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-6plgr82jtrwhzwwf"}],"cause":{"message":"Error
        creating portable
        snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
        Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-xbzgg\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-xbzgg: context
        deadline
        exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kmp9lp","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-qfxrw-kasten-io-pvc-","namespace":"kasten-io","uid":"224b6da6-5ba5-4b6d-9562-d5151e5ae335","resourceVersion":"399489","generation":4,"creationTimestamp":"2022-06-02T10:02:25Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:25Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qfxrw","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-4zckf","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-2","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qfxrw","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
        .Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
        .Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
        .Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
        .Options.pvcRepository }}","phySize":"{{
        .Phases.copyToObjectStore.Output.phySize }}","size":"{{
        .Phases.copyToObjectStore.Output.size
        }}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-xbzgg\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-xbzgg: context
        deadline exceeded\"}}"}}}}]}}}","{"message":"Failed to export snapshot
        data","function":"kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).convertSnapshots.func1","linenumber":408,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:408","fields":[{"name":"type","value":"CSI"},{"name":"id","value":"k10-csi-snap-d8ngsfjr4l9hkk5t"}],"cause":{"message":"Error
        creating portable
        snapshot","function":"kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert","linenumber":1178,"file":"kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1178","cause":{"message":"ActionSet
        Failed","function":"kasten.io/k10/kio/kanister.(*Operation).Execute","linenumber":114,"file":"kasten.io/k10/kio/kanister/operation.go:114","fields":[{"name":"message","value":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-2fsh7\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-2fsh7: context
        deadline
        exceeded\"}}"},{"name":"actionSet","value":{"metadata":{"name":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-k8r9rv","generateName":"k10-copy-k10-persistentvolumeclaim-generic-volume-2.0.20-kanister-pvc-qpzlt-kasten-io-pvc-","namespace":"kasten-io","uid":"3ada599a-6c69-4e25-9dae-cfdb4dda1879","resourceVersion":"399506","generation":4,"creationTimestamp":"2022-06-02T10:02:26Z","labels":{"kanister.io/JobID":"ec558fa4-e258-11ec-abba-76c5584997a1"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"cr.kanister.io/v1alpha1","time":"2022-06-02T10:02:26Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:kanister.io/JobID":{}}},"f:spec":{".":{},"f:actions":{}},"f:status":{".":{},"f:actions":{},"f:error":{".":{},"f:message":{}},"f:state":{}}}}]},"spec":{"actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qpzlt","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","secrets":{"artifactKey":{"apiVersion":"","group":"","resource":"","kind":"secret","name":"k10-content-store-passphrase-r9xjg","namespace":"kasten-io"}},"profile":{"apiVersion":"v1alpha1","group":"","resource":"","kind":"profile","name":"kanister-portable-copy-f2x6p","namespace":"kasten-io"},"podOverride":{"securityContext":{"runAsNonRoot":false,"runAsUser":0},"tolerations":[{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300}]},"options":{"hostName":"6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0.acid-postgres-cluster.pgdata-acid-postgres-cluster-0","objectStorePath":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","pvcRepository":"repo/6cd5aec8-5bdb-4d73-a5a3-04f01e6427b0/","userName":"k10-admin"},"preferredVersion":"v1.0.0-alpha"}]},"status":{"state":"failed","actions":[{"name":"copy","object":{"apiVersion":"","group":"","resource":"","kind":"pvc","name":"kanister-pvc-qpzlt","namespace":"kasten-io"},"blueprint":"k10-persistentvolumeclaim-generic-volume-2.0.20","phases":[{"name":"copyToObjectStore","state":"failed"}],"artifacts":{"snapshot":{"keyValue":{"backupIdentifier":"{{
        .Phases.copyToObjectStore.Output.backupID }}","backupPath":"{{
        .Phases.copyToObjectStore.Output.backupRoot }}","funcVersion":"{{
        .Phases.copyToObjectStore.Output.version }}","objectStorePath":"{{
        .Options.pvcRepository }}","phySize":"{{
        .Phases.copyToObjectStore.Output.phySize }}","size":"{{
        .Phases.copyToObjectStore.Output.size
        }}"}}},"deferPhase":{"name":"","state":""}}],"error":{"message":"{\"message\":\"Failed
        while waiting for Pod to be
        ready\",\"function\":\"kasten.io/k10/kio/kanister/function.copyVolumeDataPodFunc.func1\",\"linenumber\":153,\"file\":\"kasten.io/k10/kio/kanister/function/copy_volume_data.go:153\",\"fields\":[{\"name\":\"pod\",\"value\":\"copy-vol-data-2fsh7\"}],\"cause\":{\"message\":\"Pod
        did not transition into running state.
        Timeout:15m0s  Namespace:kasten-io, Name:copy-vol-data-2fsh7: context
        deadline exceeded\"}}"}}}}]}}}"]'
    file: kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:146
    function: kasten.io/k10/kio/exec/phases/phase.(*artifactCopier).Copy
    linenumber: 146
    message: Error converting snapshots
  file: kasten.io/k10/kio/exec/phases/phase/export.go:138
  function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
  linenumber: 138
  message: Failed to copy artifacts
message: Job failed to be executed
fields: []

If I check the PVCs in the kasten-io namespace, I see that there are some PVCs stuck in Pending with the following message:

Name:          kanister-pvc-8trnj
Namespace:     kasten-io
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-9qqdbn56
Used By:     copy-vol-data-mmbhd
Events:
  Type     Reason                Age                     From                                                                                                       Message
  ----     ------                ----                    ----                                                                                                       -------
  Normal   ExternalProvisioning  2m48s (x26 over 8m45s)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          15s (x12 over 8m45s)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-bc7kd_44ea7e16-68f8-4506-bc45-928559eaf606  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-8trnj"
  Warning  ProvisioningFailed    15s (x12 over 8m45s)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-bc7kd_44ea7e16-68f8-4506-bc45-928559eaf606  failed to provision volume with StorageClass "rook-ceph-block": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-9qqdbn56: error getting snapshot snapshot-copy-9qqdbn56 from api server: the server could not find the requested resource (get volumesnapshots.snapshot.storage.k8s.io snapshot-copy-9qqdbn56)

However, the snapshot exists:

 

Name:         snapshot-copy-9qqdbn56
Namespace:    kasten-io
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2022-06-02T10:17:57Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:source:
          .:
          f:volumeSnapshotContentName:
        f:volumeSnapshotClassName:
    Manager:      executor-server
    Operation:    Update
    Time:         2022-06-02T10:17:57Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:creationTime:
        f:readyToUse:
        f:restoreSize:
    Manager:         snapshot-controller
    Operation:       Update
    Time:            2022-06-02T10:17:57Z
  Resource Version:  399711
  UID:               e7b7e672-8569-481c-9bd0-7f8c6d6fc60c
Spec:
  Source:
    Volume Snapshot Content Name:  snapshot-copy-9qqdbn56-content-a42450dc-b3d6-4ee2-99c3-3a0d3ea0cb5a
  Volume Snapshot Class Name:      k10-clone-csi-rbdplugin-snapclass
Status:
  Bound Volume Snapshot Content Name:  snapshot-copy-9qqdbn56-content-a42450dc-b3d6-4ee2-99c3-3a0d3ea0cb5a
  Creation Time:                       2022-06-02T10:17:56Z
  Ready To Use:                        true
  Restore Size:                        0
Events:
  Type    Reason           Age    From                 Message
  ----    ------           ----   ----                 -------
  Normal  SnapshotCreated  9m48s  snapshot-controller  Snapshot kasten-io/snapshot-copy-9qqdbn56 was successfully created by the CSI driver.
  Normal  SnapshotReady    9m48s  snapshot-controller  Snapshot kasten-io/snapshot-copy-9qqdbn56 is ready to use.

There are also a few jobs where the export works.

The preflight check was successful as well:

Kubernetes Version Check:
  Valid kubernetes version (v1.20.15)  -  OK

RBAC Check:
  Kubernetes RBAC is enabled  -  OK

Aggregated Layer Check:
  The Kubernetes Aggregated Layer is enabled  -  OK

W0602 10:29:06.839593       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
CSI Capabilities Check:
  Using CSI GroupVersion snapshot.storage.k8s.io/v1  -  OK

W0602 10:29:07.943947       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:07.945922       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:07.947792       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.143837       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.193755       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
W0602 10:29:09.243422       7 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
Validating Provisioners:
rook-ceph.rbd.csi.ceph.com:
  Is a CSI Provisioner  -  OK
  Storage Classes:
    rook-ceph-block
      Valid Storage Class  -  OK
  Volume Snapshot Classes:
    csi-rbdplugin-snapclass
      Has k10.kasten.io/is-snapshot-class annotation set to true  -  OK
      Has deletionPolicy 'Delete'  -  OK
    k10-clone-csi-rbdplugin-snapclass

rook-ceph.cephfs.csi.ceph.com:
  Is a CSI Provisioner  -  OK
  Storage Classes:
    rook-ceph-fs
      Valid Storage Class  -  OK
  Volume Snapshot Classes:
    csi-cephfsplugin-snapclass
      Has k10.kasten.io/is-snapshot-class annotation set to true  -  OK
      Has deletionPolicy 'Delete'  -  OK

Validate Generic Volume Snapshot:
  Pod Created successfully  -  OK
  GVS Backup command executed successfully  -  OK
  Pod deleted successfully  -  OK

Thanks in advance for your help.

Best answer by EBrockman

Hello @SBE

 

I am sorry, when looking for events you have to grab them quickly after the failed task; events only last for 1 hour by default. Also, this looks to be a storage issue if the PVCs cannot be created. I would recommend taking a look at the csi-provisioner logs.

 

Thanks

Emmanuel 


14 comments

  • Author
  • Not a newbie anymore
  • 4 comments
  • June 2, 2022

I accidentally chose the wrong category. Can someone move this to the Kasten K10 Support category?

Thank you.


MicoolPaul

@Rick Vanover or @Madi.Cristil should be able to move the category to Kasten K10 Support for you! 🙂


Madi.Cristil
  • Community Manager
  • 616 comments
  • June 2, 2022

Thank you, @MicoolPaul! 😊

@SBE, that was moved to Kasten K10 Support! :)


  • Comes here often
  • 89 comments
  • June 2, 2022

Hello SBE,

 

Could you please provide the output of the command below while a snapshot with export is running?

kubectl describe po copy-vol-data-xxxxx -n kasten-io

There should be a pod that is failing during the export.

 

Thanks

Emmanuel


  • Author
  • Not a newbie anymore
  • 4 comments
  • June 3, 2022

Hi Emmanuel,

 

Thanks for your reply. The pod is in the Pending state because of the PVC:

 

Name:         copy-vol-data-h82qb
Namespace:    kasten-io
Priority:     0
Node:         <none>
Labels:       createdBy=kanister
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  container:
    Image:      ghcr.io/kanisterio/kanister-tools:0.79.0
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
      tail -f /dev/null
    Environment:  <none>
    Mounts:
      /mnt/vol_data/kanister-pvc-pjg5k from vol-kanister-pvc-pjg5k (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from k10-k10-token-r5x57 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  vol-kanister-pvc-pjg5k:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kanister-pvc-pjg5k
    ReadOnly:   false
  k10-k10-token-r5x57:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  k10-k10-token-r5x57
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m19s  default-scheduler  0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  2m19s  default-scheduler  0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.

 


  • Comes here often
  • 14 comments
  • June 7, 2022

I have the same issue - any ideas?


  • Comes here often
  • 89 comments
  • June 7, 2022

Hello @SBE

After looking over what is described above, this looks to be related to a lack of resources available on your nodes to start the new copy-vol pod during the export. You may need to increase the nodes' resources or add a node (or two).

 

This may help with understanding the K10 requirements: https://docs.kasten.io/latest/operating/footprint.html#requirement-guidelines
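
As a quick sanity check you could also compare each node's allocatable capacity against what is actually requested on it; something along these lines should show the scheduler's view (the node name is just a placeholder):

kubectl describe node <worker-node> | grep -A 8 "Allocated resources"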

 

Thanks

Emmanuel

 

 


  • Not a newbie anymore
  • 1 comment
  • June 8, 2022

Hi @EBrockman ,

I am a colleague of SBE.
Looking at his cluster, I can see that plenty of CPU and memory is free.

kubectl top nodes

NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
m1         378m         9%     2902Mi          18%
m2         132m         3%     2240Mi          14%
m3         130m         3%     1916Mi          12%
w1         219m         2%     5092Mi          7%
w2         791m         9%     10812Mi         16%
w3         494m         6%     12732Mi         19%
w4         511m         6%     8555Mi          13%
w5         646m         8%     8676Mi          13%
w6         372m         4%     12725Mi         19%

The pods do not specify resource requests, but each worker has 8 CPUs and 64 GB of memory. Your link states that 4 cores and 5 GB of memory should be enough, which should fit easily on every single node of the cluster.
 


FRubens
  • Experienced User
  • 96 comments
  • June 9, 2022

Hello @JTI, @SBE.

Since this needs more investigation, I would recommend opening a trial case and uploading your debug logs to the case so we can have a better look.

You can also try manually creating a PVC that uses the created snapshot as its data source and see how long it takes to become bound and ready.
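
Something along these lines, with the claim name and size as placeholders (the dataSource fields mirror the ones on the auto-created claims):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-restore-test          # placeholder name
  namespace: kasten-io
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot-copy-9qqdbn56     # one of the existing snapshots
  resources:
    requests:
      storage: 10Gi

kubectl -n kasten-io get pvc manual-restore-test -w   # watch until it leaves Pending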

Regards
Fernando R.


  • Comes here often
  • 89 comments
  • June 9, 2022

 Hello @JTI @SBE 

 

I believe the issue is related to your storage configuration. As you can see in the PVC describe above, the message “waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator” is a sign that the storage is not able to provision a PV for the PVC. You might want to check the events in the kasten-io namespace to verify on the storage side why the PVs are not being created.
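
Something like the following should surface the relevant events quickly (the rook-ceph namespace is an assumption about where your Ceph cluster runs):

kubectl get events -n kasten-io --sort-by=.lastTimestamp
kubectl get events -n rook-ceph --sort-by=.lastTimestamp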

 

Thanks

Emmanuel


  • Author
  • Not a newbie anymore
  • 4 comments
  • June 22, 2022

Hello, 

Sorry for the late response; I was on vacation.

@EBrockman I checked the events, but there is no further information.

@FRubens I opened a case and uploaded the debug logs.


  • Author
  • Not a newbie anymore
  • 4 comments
  • June 24, 2022

Hello @FRubens 

I figured out the problem:

The automatically created PVC is not correct, but I’m not sure whose fault this is. The APIGroup is not correct: it is missing the /v1.

Here is a describe of the PVC that gets created automatically and does not work as intended:

Name:          kanister-pvc-28c52
Namespace:     kasten-io
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-7z29k2r6
Used By:     copy-vol-data-sk9qs
Events:
  Type     Reason                Age                   From                                                                                                       Message
  ----     ------                ----                  ----                                                                                                       -------
  Normal   ExternalProvisioning  3m57s (x26 over 10m)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          100s (x11 over 10m)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-99hw5_b2d92ce7-3ff4-4707-a5dd-f1ab0072e35e  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-28c52"
  Warning  ProvisioningFailed    100s (x11 over 10m)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d8bcc5fc4-99hw5_b2d92ce7-3ff4-4707-a5dd-f1ab0072e35e  failed to provision volume with StorageClass "rook-ceph-block": error getting handle for DataSource Type VolumeSnapshot by Name snapshot-copy-7z29k2r6: error getting snapshot snapshot-copy-7z29k2r6 from api server: the server could not find the requested resource (get volumesnapshots.snapshot.storage.k8s.io snapshot-copy-7z29k2r6)

If I create the PVC manually with APIGroup set to snapshot.storage.k8s.io/v1 like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kanister-pvc
  namespace: kasten-io
spec:
  accessModes:
  - ReadWriteOnce
  dataSource:
    apiGroup: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    name: snapshot-copy-7z29k2r6
  resources:
    requests:
      storage: "10737418240"
  storageClassName: rook-ceph-block
  volumeMode: Filesystem

the provisioning succeeds.


  • Comes here often
  • 89 comments
  • Answer
  • June 24, 2022

Hello @SBE

 

I am sorry, when looking for events you have to grab them quickly after the failed task; events only last for 1 hour by default. Also, this looks to be a storage issue if the PVCs cannot be created. I would recommend taking a look at the csi-provisioner logs.
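
For example, something along these lines (the deployment and container names assume a typical Rook-Ceph RBD install and may differ in your cluster):

kubectl -n rook-ceph get pods -l app=csi-rbdplugin-provisioner
kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-provisioner --tail=200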

 

Thanks

Emmanuel 


@SBE nice work! I’ve stumbled on the same missing v1 issue. How did you go about fixing it?