
Hi there,

 

We would like to back up some large PVCs (more than 5 TB) running on vSAN CSI.

Snapshots work fine, but the export in block mode runs for 10 hours and then stops because it hits the timeout limit.

Our backup target is a MinIO service hosted on a Synology NAS with a 10 Gbps link and SSD caching. Performance tests are OK.

We see about 40 MB/s of traffic on the network card the first time the export runs, and then only about 1-3 MB/s on the following attempts.
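To put these rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the whole 5 TB has to cross the network (no deduplication, compression, or changed-block savings) and uses decimal units; the function names are illustrative only. Under those assumptions, even the 40 MB/s seen on the first run would need about 35 hours, so a 10-hour limit cannot be met at either observed rate.

```python
# Rough transfer-time estimate for a full export.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and that all of the data is actually sent over the network.

def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_s megabytes per second."""
    seconds = size_tb * 10**12 / (rate_mb_s * 10**6)
    return seconds / 3600


def required_rate_mb_s(size_tb: float, hours: float) -> float:
    """Sustained MB/s needed to move size_tb terabytes within the given hours."""
    return size_tb * 10**12 / 10**6 / (hours * 3600)


for rate in (40, 3, 1):
    print(f"5 TB at {rate} MB/s: {transfer_hours(5, rate):.1f} hours")

print(f"Rate needed to finish 5 TB in 10 hours: {required_rate_mb_s(5, 10):.1f} MB/s")
```

The last line gives the sustained rate the export would have to hold to finish inside the 10-hour window, which is a useful target when testing changes.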

The “k10kopia” process only uses about 30% CPU during the export.

 

I already tried increasing the IOPS limits on the vSAN profile, without success.

 

At the moment we can only back up the smaller PVCs (500 MB to 10 GB) without any issue.

I can’t see any logs for k10kopia, so I’m totally lost :(

 

Thank you for your help !

 

@jaiganeshjk 


Welcome back,

 

To illustrate this behavior, here are some screenshots:

 

  1. A specific policy for our large application backup

In a nutshell, this policy only backs up the k8s manifests and 3 Artifactory PVCs (PostgreSQL, RabbitMQ and… all the artifact data).

 

  2. block-mode-upload runs… I don’t know what it does, but it runs ;)

 

  3. The process is alive on the host but with very, very low CPU usage (0% when I wrote this issue)

 

And of course, the other policies work like a charm. Here is an example with other tools (GitLab, Keycloak, etc. in one policy)

The only difference is the PVC size. Same backend, same vSAN datastore, same vSAN policy, same S3 target.

 

 


Welcome back again,

 

Here is a log extract from the block-mode-upload pod, file ‘/tmp/vmware-root/vixDiskLib-42.log’:

 

tail: cannot open ''$'\303''-f' for reading: No such file or directory
==> /tmp/vmware-root/vixDiskLib-42.log <==
2023-04-19T10:59:52.840Z Wa(03) host-70 sNFC ERROR]NfcAioLogFatalSessionErrorLocked: A fatal session error occurred. The error was: 'NFC_NETWORK_ERROR' (3)
2023-04-19T10:59:52.840Z Wa(03) host-70 sNFC ERROR]NfcAioGetMessage: Recv msg failed: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z Wa(03) host-70 sNFC ERROR]NfcAioClientReceiveLoop: Failed to receive an AIO message: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z In(05) host-70 DISKLIB-LIB   : RWv failed ioId: #259853 (290) (34) .
2023-04-19T10:59:52.840Z Wa(03) host-74 hNFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z Wa(03) host-72 hNFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Wa(03) host-44 hNFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Wa(03) host-73 hNFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Er(02) host-73 VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
2023-04-19T10:59:52.841Z Er(02) host-73 VixDiskLib: VixDiskLib_Read: Read 2048 sectors at 555094016 failed. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 7858.
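For readers who want to summarize a log like this, below is a minimal Python sketch. It counts which NFC functions reported errors and converts the failed read into a byte offset, assuming the 512-byte sector size that VDDK uses for disk addressing. The regular expressions are written against the lines shown above only, and the function name `summarize` is illustrative. For the failed read in this extract, sector 555094016 corresponds to an offset of roughly 264.7 GiB into the disk, and 2048 sectors is a 1 MiB read.

```python
import re
from collections import Counter

SECTOR_SIZE = 512  # bytes per sector, as used by VDDK for disk addressing

# Patterns derived from the log lines quoted in this thread (an assumption:
# other VDDK versions may format these messages differently).
NFC_ERROR = re.compile(r"NFC ERROR\](\w+):")
READ_FAILED = re.compile(r"VixDiskLib_Read: Read (\d+) sectors at (\d+) failed")

SAMPLE = [
    "2023-04-19T10:59:52.840Z Wa(03) host-70 sNFC ERROR]NfcAioGetMessage: Recv msg failed: NFC_NETWORK_ERROR",
    "2023-04-19T10:59:52.840Z Wa(03) host-74 hNFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR",
    "2023-04-19T10:59:52.841Z Er(02) host-73 VixDiskLib: VixDiskLib_Read: Read 2048 sectors at 555094016 failed. "
    "Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 7858.",
]


def summarize(lines):
    """Return (Counter of NFC functions that logged errors,
    list of (byte_offset, byte_length) for failed reads)."""
    nfc_functions = Counter()
    failed_reads = []
    for line in lines:
        match = NFC_ERROR.search(line)
        if match:
            nfc_functions[match.group(1)] += 1
        match = READ_FAILED.search(line)
        if match:
            sectors, start = int(match.group(1)), int(match.group(2))
            failed_reads.append((start * SECTOR_SIZE, sectors * SECTOR_SIZE))
    return nfc_functions, failed_reads


functions, reads = summarize(SAMPLE)
for name, count in functions.items():
    print(f"{name}: {count}")
for offset, length in reads:
    print(f"failed read at {offset / 2**30:.1f} GiB, length {length / 2**20:.0f} MiB")
```

Running the same function over the full log file (for example `summarize(open(path))`) would show whether the failures cluster at one offset or follow a network drop regardless of position; here every error carries the same timestamp, which points toward a single session failure.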

 

What do you think about that? The PVC itself seems to be working well.

 

Thank you,

Have a nice day


@Florian Lacrampe, please share the version of VDDK you are using.
Do you have any errors in the VDDK logs?

 


Hi @Hagag,

 

Thank you for your answer !

OK, we think we have pinpointed the issue…

We use NSX-T as the network backend, and we think connections are hanging between our k8s cluster and the vSAN backend.
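A basic reachability check from inside the cluster can help narrow this kind of suspicion down. The Python sketch below times a TCP connection to each host passed on the command line. It assumes the NBD/NFC transport talks to the ESXi hosts on TCP port 902, which is the usual default but should be verified for your environment; the function name is illustrative. Note that a successful connect only shows the port is reachable: it does not prove that long-lived sessions survive, which is what the NFC_NETWORK_ERROR lines suggest is failing.

```python
import socket
import sys
import time

NFC_PORT = 902  # assumed default port for NBD/NFC traffic to ESXi hosts


def tcp_connect_time(host, port=NFC_PORT, timeout=5.0):
    """Return the seconds taken to open a TCP connection,
    or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


if __name__ == "__main__":
    # Usage: python check_nfc.py <esxi-host> [<esxi-host> ...]
    for host in sys.argv[1:]:
        elapsed = tcp_connect_time(host)
        if elapsed is None:
            print(f"{host}:{NFC_PORT} unreachable")
        else:
            print(f"{host}:{NFC_PORT} connected in {elapsed * 1000:.1f} ms")
```

Running it from a pod on the same node and network as the block-mode-upload pod gives the most representative result, since that is the path the export traffic takes.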

 

We’ll try to migrate our Artifactory outside this network and test again.

 

 

Thank you for your help !

