Question

Unable to export snapshots when PVCs are in ReadWriteMany access mode


Userlevel 3

Hi,

I'm running an RKE2 cluster with Portworx storage and 37 deployments.

I have a policy that backs up the deployments every day and exports the backups to generic S3 storage.
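For context, it is a standard K10 backup-plus-export policy; a minimal sketch of what it looks like (the policy, profile, and selector names here are placeholders, not the real config):

kubectl apply -f - <<'EOF'
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: daily-backup-export        # placeholder name
  namespace: kasten-io
spec:
  frequency: '@daily'
  actions:
    - action: backup
    - action: export
      exportParameters:
        frequency: '@daily'
        profile:
          name: s3-profile         # placeholder: the generic S3 location profile
          namespace: kasten-io
        exportData:
          enabled: true            # export the snapshot data itself, not only metadata
  selector:
    matchLabels:
      k10.kasten.io/appNamespace: my-app   # placeholder: the protected namespace
EOF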

Before September, everything was running perfectly.

In September, some exports started failing (2 out of 37).

Last week I decided to migrate the PVCs that were using the native Portworx provisioner to CSI.

Now every deployment using a CSI storage class fails to export.

The old deployments that were not migrated still export fine.

I tried rolling the migrated PVCs back to the native provisioner and … same problem: export failed.

So I ran some other tests (a minimal repro manifest for the RWX case is sketched after the list):

  • deploy an application with a ReadWriteOnce volume (native) > export OK
  • deploy an application with a ReadWriteOnce volume (CSI) > export OK
  • deploy an application with a ReadWriteMany volume (native) > export failed
  • deploy an application with a ReadWriteMany volume (CSI) > export failed
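
Each test case was just a minimal deployment with one volume; the RWX/CSI variant boils down to something like this (namespace, names, and storage class are placeholders; swap the access mode and storage class for the other three cases):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test-pvc
  namespace: rwx-test
spec:
  accessModes:
    - ReadWriteMany              # ReadWriteOnce for the RWO cases
  storageClassName: px-csi-cms   # CSI class; use the native Portworx class for the "native" cases
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwx-test
  namespace: rwx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rwx-test
  template:
    metadata:
      labels:
        app: rwx-test
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sh", "-c", "tail -f /dev/null"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: rwx-test-pvc
EOF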

 

 

Every failed export fails with the same error:

- cause:
    cause:
      cause:
        cause:
          cause:
            cause:
              file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:319
              function: kasten.io/k10/kio/kanister/function.CopyVolumeData.copyVolumeDataPodExecFunc.func1
              linenumber: 319
              message: Failed to get snapshot ID from create snapshot output
            file: kasten.io/k10/kio/kanister/function/kio_copy_volume_data.go:129
            function: kasten.io/k10/kio/kanister/function.CopyVolumeData
            linenumber: 129
            message: Failed to execute copy volume data pod function
          file: kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:1635
          function: kasten.io/k10/kio/exec/phases/phase.(*gvcConverter).Convert
          linenumber: 1635
          message: Error creating portable snapshot
        fields:
          - name: type
            value: CSI
          - name: id
            value: k10-csi-snap-5lkfmkllzwhkxvrb
        file: kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:442
        function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).convertSnapshots.func1
        linenumber: 442
        message: Failed to export snapshot data
      file: kasten.io/k10/kio/exec/phases/phase/copy_snapshots.go:210
      function: kasten.io/k10/kio/exec/phases/phase.(*ArtifactCopier).copy
      linenumber: 210
      message: Error converting snapshots
    file: kasten.io/k10/kio/exec/phases/phase/export.go:168
    function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
    linenumber: 168
    message: Failed to copy artifacts
  message: Job failed to be executed

 


11 comments

Userlevel 7
Badge +20

@safiya @Madi.Cristil - please move this to the Kasten K10 discussion board for better help.

Userlevel 7
Badge +7

@jaiganeshjk 

Userlevel 6
Badge +2

@Vecteur IT Thanks for posting your question here.

Unfortunately, the log messages don't show much about what is going on in the backend.

Would you mind opening a case with us through `my.veeam.com`, selecting `Kasten by Veeam K10 Trial` as the product when opening the case?

Please collect the debug logs (https://docs.kasten.io/latest/operating/support.html#gathering-debugging-information) and upload them to the case. We will get in touch and take a deep look at what's going on.

Userlevel 3


Hi jaiganeshjk

Thank you for your help.

I have created Case #06321679 and attached the debug logs.

 

Userlevel 3

Hi,

In the Kubernetes events I have this error after copy-vol-data-8nnj5 is created:

12m         Warning   VolumeFailedDelete   persistentvolume/vol-e0db55a6-5dd0-11ee-bc68-e20363dc92a1   rpc error: code = Internal desc = Failed to delete volume 555257514601042816: rpc error: code = Internal desc = Failed to detach volume 555257514601042816: Volume 555257514601042816 is mounted at 1 location(s): /var/lib/osd/pxns/555257514601042816
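
To see where Portworx still has the volume attached, inspecting it directly may help; a sketch, assuming Portworx runs in kube-system with the usual name=portworx label (the volume ID is the one from the event):

# Find a Portworx pod and inspect the volume named in the VolumeFailedDelete event;
# the inspect output reports where the volume is currently attached/mounted.
PX_POD=$(kubectl get pods -n kube-system -l name=portworx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "$PX_POD" -- /opt/pwx/bin/pxctl volume inspect 555257514601042816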

 

Userlevel 6
Badge +2

@satish.kumar FYI ^^

 

Userlevel 3

Hi, I now have a script to capture the copy-vol-data pod before it is cleaned up.

The TTL for the copy-vol-data pod is about 5 seconds, and it is the same for the PVC.
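The capture loop is roughly this (a sketch, not the exact script):

# Poll for the short-lived Kanister pod and PVC in kasten-io and dump their
# describe output before K10 deletes them again (TTL is only a few seconds).
while true; do
  for pod in $(kubectl get pods -n kasten-io -o name | grep copy-vol-data); do
    kubectl describe -n kasten-io "$pod" >> captured-pods.txt
  done
  for pvc in $(kubectl get pvc -n kasten-io -o name | grep kanister-pvc); do
    kubectl describe -n kasten-io "$pvc" >> captured-pvcs.txt
  done
  sleep 1
done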

Attached is the describe output for copy-vol-data-9rnkg and kanister-pvc-qs9nk:

Name:             copy-vol-data-9rnkg
Namespace:        kasten-io
Priority:         0
Service Account:  k10-k10
Node:             <none>
Labels:           createdBy=kanister
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  container:
    Image:      ghcr.io/kanisterio/kanister-tools:0.96.0
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
      tail -f /dev/null
    Environment:  <none>
    Mounts:
      /mnt/vol_data/kanister-pvc from vol-4954d9c6-5e09-11ee-bc68-e20363dc92a1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tljw8 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  vol-4954d9c6-5e09-11ee-bc68-e20363dc92a1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kanister-pvc-qs9nk
    ReadOnly:   false
  kube-api-access-tljw8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From   Message
  ----     ------            ---- ----   -------
  Warning  FailedScheduling  0s   stork  0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod..

 

The PVC:

Name:          kanister-pvc-qs9nk
Namespace:     kasten-io
StorageClass:  px-csi-cms
Status:        Bound
Volume:        pvc-c1365038-fafe-4d58-977d-ce58b4aa7f22
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: pxd.portworx.com
               volume.kubernetes.io/storage-provisioner: pxd.portworx.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot-copy-zzknt9qw
Used By:       <none>
Events:
  Type    Reason                 Age              From                                                                              Message
  ----    ------                 ----             ----                                                                              -------
  Normal  ExternalProvisioning   5s (x2 over 5s)  persistentvolume-controller                                                       waiting for a volume to be created, either by external provisioner "pxd.portworx.com" or manually created by system administrator
  Normal  Provisioning           5s               pxd.portworx.com_px-csi-ext-fc94fdf48-sc2xm_55665d21-835b-45a1-9031-f104ffdf27d6  External provisioner is provisioning volume for claim "kasten-io/kanister-pvc-qs9nk"
  Normal  ProvisioningSucceeded  4s               pxd.portworx.com_px-csi-ext-fc94fdf48-sc2xm_55665d21-835b-45a1-9031-f104ffdf27d6  Successfully provisioned volume pvc-c1365038-fafe-4d58-977d-ce58b4aa7f22

Is it normal that DataSource is empty?

I have the same issue after upgrading from 5.5.11 to 6.0.9.

We have RKE1 and the storage is ceph-csi-rbd.

Userlevel 3

I got a response from support:

Apparently, ReadWriteMany volumes in Portworx mean that the underlying filesystem is sharedv4 (NFS).

Regrettably, there is an issue (we are working on the fix) with reading from NFS when running rootless.

 

The solution will be available in the near future, so please continue to monitor the K10 release notes and upgrade your K10 once the fix becomes available.
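
For anyone who wants to confirm that an RWX volume is sharedv4-backed, something like this should show it (the storage class name is a placeholder; the second check needs pxctl access on a Portworx node):

# Check whether the storage class explicitly requests sharedv4 volumes
# (empty output means it is unset; Portworx also applies sharedv4
# automatically to RWX CSI volumes, per the support response above).
kubectl get storageclass px-csi-cms -o jsonpath='{.parameters.sharedv4}'; echo
# On a Portworx node, inspecting the volume reports its shared/sharedv4 state:
#   pxctl volume inspect <volume-id>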

Thanks for your reply.

Sorry, I see now that you have the issue only with RWX. We have Ceph, the volume is RWO, and we have the same issue.

Userlevel 3

Maybe another rootless problem…
