Solved

Large Backup (5+ TB) with Kasten and vSAN CSI


Userlevel 2

Hi there,

 

We would like to back up some large PVCs (5+ TB) provisioned with the vSAN CSI driver.

Snapshots work great, but the block-mode export runs for 10 hours and then fails at the end due to the timeout limit.
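(Side note for anyone hitting the same wall: some of K10's timeouts can be raised through Helm values. The key below is our best guess at the relevant setting and should be treated as an assumption; check the Helm options reference for your K10 release before applying.)

```yaml
# Hypothetical K10 Helm values fragment -- the exact key is an assumption;
# verify it against the K10 Helm chart reference for your version.
vmWare:
  # Raise the VMware task timeout (in minutes) so that long-running
  # block-mode exports of multi-TB volumes are not cancelled early.
  taskTimeoutMin: 720
```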

Our backup target is a MinIO service hosted on a Synology NAS with a 10 Gbps link and SSD caching. Performance tests are OK.

We see about 40 MB/s of traffic on the network card during the first export run, then only about 1-3 MB/s on subsequent attempts.
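For scale: even the best rate we observed cannot move 5 TB inside a 10-hour window, so the timeout is mathematically unavoidable at these speeds. A quick back-of-the-envelope check:

```python
# Rough transfer-time estimate for a 5 TB export at the observed rates.
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_per_s MB/s."""
    size_mb = size_tb * 1024 * 1024  # TiB -> MiB; close enough for an estimate
    return size_mb / rate_mb_per_s / 3600

print(round(transfer_hours(5, 40), 1))  # first run at 40 MB/s: ~36.4 h
print(round(transfer_hours(5, 3), 1))   # retries at 3 MB/s: ~485.5 h
```

So even at the initial 40 MB/s, a full 5 TB export needs roughly a day and a half, well past a 10-hour timeout.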

The “k10kopia” process only uses about 30% CPU during the export.

 

I have already tried increasing the IOPS limit on the vSAN storage policy, without success.

 

At the moment we can only back up our smallest PVCs (500 MB to 10 GB) without any issue.

I can’t find any logs for k10kopia, so I’m totally lost :(
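(For anyone else looking: the export work happens in short-lived pods in the K10 namespace rather than in the long-running services, so the logs are easy to miss. Something along these lines can surface them; pod names vary per run, the placeholders below are illustrative.)

```shell
# List the pods spawned for the export in the K10 namespace
kubectl -n kasten-io get pods

# Stream logs from the block-mode upload pod (name varies per run)
kubectl -n kasten-io logs -f <block-mode-upload-pod-name>

# If the pod has already terminated, check the previous instance's logs
kubectl -n kasten-io logs --previous <block-mode-upload-pod-name>
```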

 

Thank you for your help !

 


Best answer by Florian Lacrampe 20 April 2023, 10:56


5 comments

Userlevel 7
Badge +20

@jaiganeshjk 

Userlevel 2

Hello again,

 

To illustrate this behavior, here are some screenshots:

 

  1. A specific policy for our large application backup

In a nutshell, this policy only backs up the k8s manifests and 3 Artifactory PVCs (PostgreSQL, RabbitMQ and… all the artifact data).

 

  2. The block-mode-upload pod runs… I don’t know exactly what it does, but it runs ;)

 

  3. The process is alive on the host, but with very, very low CPU usage (0% as I write this).

 

And of course, the other policies work like a charm. Here is an example with other tools (GitLab, Keycloak, etc. in one policy).

The only difference is the PVC size. Same backend, same vSAN datastore, same vSAN policy, same S3 target.

 

 

Userlevel 2

Hello again,

 

Here is a log extract from the block-mode-upload pod, file ‘/tmp/vmware-root/vixDiskLib-42.log’:

 

tail: cannot open ''$'\303''-f' for reading: No such file or directory
==> /tmp/vmware-root/vixDiskLib-42.log <==
2023-04-19T10:59:52.840Z Wa(03) host-70 [NFC ERROR]NfcAioLogFatalSessionErrorLocked: A fatal session error occurred. The error was: 'NFC_NETWORK_ERROR' (3)
2023-04-19T10:59:52.840Z Wa(03) host-70 [NFC ERROR]NfcAioGetMessage: Recv msg failed: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z Wa(03) host-70 [NFC ERROR]NfcAioClientReceiveLoop: Failed to receive an AIO message: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z In(05) host-70 DISKLIB-LIB   : RWv failed ioId: #259853 (290) (34) .
2023-04-19T10:59:52.840Z Wa(03) host-74 [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.840Z Wa(03) host-72 [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Wa(03) host-44 [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Wa(03) host-73 [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_NETWORK_ERROR
2023-04-19T10:59:52.841Z Er(02) host-73 VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
2023-04-19T10:59:52.841Z Er(02) host-73 VixDiskLib: VixDiskLib_Read: Read 2048 sectors at 555094016 failed. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 7858.

 

What do you think about that? The PVC itself seems to be working well.

 

Thank you,

Have a nice day

Userlevel 5
Badge +2

@Florian Lacrampe  please share the VDDK version you are using.
Do you have any errors in the VDDK logs?

 

Userlevel 2

Hi @Hagag,

 

Thank you for your answer !

OK, we think we have pinpointed the issue…

We use NSX-T as the network backend, and we think connections are hanging up between our k8s cluster and the vSAN backend.
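(That theory fits the NFC_NETWORK_ERROR entries above: block-mode exports stream disk data over NFC/NBD, which the ESXi hosts serve on TCP port 902. A small sketch to verify from inside the cluster network that connections to each host actually succeed; the ESXi host names below are hypothetical placeholders.)

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hypothetical ESXi host names -- replace with your own.
    for esxi in ("esxi-01.example.local", "esxi-02.example.local"):
        # TCP 902 is the NFC/NBD data port used for block-mode transfers.
        status = "reachable" if tcp_reachable(esxi, 902) else "UNREACHABLE"
        print(esxi, status)
```

Running this from a pod on the NSX-T segment (e.g. via `kubectl exec`) versus from a host outside it would show whether the overlay network is dropping the NFC sessions.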

 

We’ll try migrating our Artifactory outside this network and test again.

 

 

Thank you for your help !
