Poor SAN transport mode performance


Userlevel 7
Badge +13

Recently I had to troubleshoot poor VMware vSphere SAN transport mode performance during backups with Veeam. Before SAN mode, NBD transport mode was used. NBD outperformed SAN mode by far!

Here are some facts about the environment.

Environment

  • VMFS volumes are hosted by two HPE 3PAR arrays in a synchronous replication configuration (Remote Copy) with Peer Persistence.
  • ESXi hosts are running different versions.
  • The backup server (here: Veeam Backup & Replication) is installed on a dedicated physical host. This host is connected via 10 GbE LAN and a single FC uplink to one of the two fabrics.

Backup performance

The backup mode used in this environment was NBD transport mode, because the physical host wasn’t equipped with an FC HBA. To improve backup performance, an HBA was installed. To run direct SAN backups, a few configuration steps are necessary (a sketch of the Windows-side commands follows the list):

  • Add the MPIO feature to the backup server,
  • Configure MPIO for 3PAR
    • by running: mpclaim -r -I -d "3PARdataVV"
  • Configure FC switch zoning,
  • Export all LUNs to the backup host,
  • Reboot the server.
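
Here is a minimal sketch of the Windows-side steps in PowerShell, assuming Windows Server 2016 or later with the ServerManager module; zoning and LUN export happen on the fabric and array side and are not shown:

  # Install the Windows MPIO feature
  Install-WindowsFeature -Name Multipath-IO

  # Claim the HPE 3PAR volumes for the Microsoft DSM; -r reboots the server when required
  mpclaim -r -I -d "3PARdataVV"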

After that we saw miserably poor SAN transport mode performance:

  • Observed backup throughput for different VMs ranged from < 1 MB/s (!) to 60 MB/s.
  • Backup jobs ran for about 10 minutes before the first byte was transferred.

Troubleshooting

  • No errors on any layer (Windows host, FC Switches, 3PAR arrays).
  • All SFP metrics on FC switches were OK.
  • Arrays did not have any other inexplicable performance issues.
  • Antivirus software was disabled during testing.
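
In addition to the checks above, the MPIO state on the backup host itself is worth a look. A quick way to do that, assuming the built-in mpclaim tool and the MPIO PowerShell module are available:

  # List all MPIO-claimed disks and their load-balancing policies
  mpclaim -s -d

  # Show the individual paths of a specific MPIO disk (here: MPIO disk 0)
  mpclaim -s -d 0

  # Confirm the 3PAR vendor/product ID is claimed by the Microsoft DSM
  Get-MSDSMSupportedHW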

Solution

Fortunately, the solution is quite simple: the Microsoft MPIO feature Path Verification has to be enabled. This can be done in two ways (see the sketch after this list):

  • Per disk
    • Open Disk Management –> open the properties of a disk –> select the MPIO tab –> press Details –> enable Path Verify Enabled –> press OK.
    • This has to be done for each 3PAR disk. The advantage here is that it can be done online.
  • Globally
    • Run the PowerShell command:
      Set-MPIOSetting -NewPathVerificationState Enabled
    • To check the current settings, run:
      Get-MPIOSetting
    • After a reboot, all disks have the feature enabled.
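
Put together, the global variant looks like this in PowerShell; the commented-out PathVerificationPeriod line is only an illustration of an optional tuning knob, the default of 30 seconds was kept here:

  # Enable MPIO path verification globally; takes effect after a reboot
  Set-MPIOSetting -NewPathVerificationState Enabled

  # Optional: adjust how often paths are verified, in seconds (assumption: the default of 30 is fine)
  # Set-MPIOSetting -NewPathVerificationPeriod 30

  # Check the result; PathVerificationState should read "Enabled"
  Get-MPIOSetting

  # Reboot so the setting applies to all disks
  Restart-Computer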

Notes

  • This issue is not specific to Veeam Backup & Replication. Any backup solution that uses SAN mode can suffer from poor performance here.
  • With Path Verification enabled, we increased backup throughput for an SSD-hosted VM from about 60 MB/s to 600 MB/s.
  • The HPE 3PAR Windows Server 2016/2012/2008 Implementation Guide states that path verification has to be enabled in a Peer Persistence environment. The fact is, until now I had not needed this for Windows backup hosts; in such a setup Windows only reads from the 3PAR volumes. But it is really necessary when Windows writes to 3PAR Peer Persistence LUNs.
  • In my opinion, this could happen with Primera arrays too.
  • To create the right 3PAR/Primera Peer Persistence claim rule for ESXi hosts, see here.

17 comments

Userlevel 6
Badge +4

10x faster!! That’s huge. Thanks for sharing.

Userlevel 7
Badge +11

If you compare incremental job performance, which average MB/s did you see on the 10 GbE NBD vs the new Direct SAN? Which FC link speed do you have?

Userlevel 7
Badge +6

Very Useful!!

Userlevel 7
Badge +6

If you compare incremental job performance, which average MB/s did you see on the 10 GbE NBD vs the new Direct SAN? Which FC link speed do you have?

Good question, Rasmus … In my opinion this is something that should always be compared … 

I also believe that by using LACP in vSphere (vDS only) you can take even more advantage of NBD ...

Userlevel 7
Badge +11

Generally, there should be less overhead on NBD (10+ GbE), but of course if this is 32 Gbit/s FC, then perhaps it will be faster anyway :)

Userlevel 7
Badge +13

If you compare incremental job performance, which average MB/s did you see on the 10 GbE NBD vs the new Direct SAN? Which FC link speed do you have?

Good question, Rasmus … In my opinion this is something that should always be compared … 

I also believe that by using LACP in vSphere (vDS only) you can take even more advantage of NBD ...

 

Good point! For testing I primarily used full backups to compare throughput. Because of the comparatively small amount of data that has to be transferred for incremental backups, a much faster network does not save that much time compared to fulls. The FC link here runs at 8 Gbit/s.

The thing with NBD is that ESXi throttles NBD traffic to some degree. For a single VMDK backup (full or incremental) I personally have not seen throughput above 200 MB/s - most often much less. I do not believe LACP could raise this limit. Here SAN mode outperforms NBD: even a single VMDK can be backed up at much higher rates of 1 GB/s and more.

In this scenario we could back up a specific VMDK at about 100 MB/s using NBD and at about 600 MB/s using SAN mode.

Userlevel 7
Badge +13

Generally, there should be less overhead on NBD (10+ GbE), but of course if this is 32 Gbit/s FC, then perhaps it will be faster anyway :)

Because NBD limits the throughput of a single task/session, you need to parallelize your backup jobs accordingly. Here SAN mode is much more flexible in utilizing the available bandwidth.

But another aspect is the time it takes before a backup job actually transfers data to the repository. Here NBD is much quicker than SAN and Hot-Add mode, because for the latter two there is a lot of VMDK/LUN mapping to do, and this takes time. So when you back up a lot of small VMs, NBD could be faster (from an execution-time perspective) than SAN/Hot-Add.

Userlevel 7
Badge +11

200 MB/s sounds very low for 10 GbE. Are you sure the network is operating correctly?

Userlevel 7
Badge +13

200 MB/s sounds very low for 10 GbE. Are you sure the network is operating correctly?

Yes, the network works fine.

If you run more backups in parallel, you can utilize more bandwidth. 200 MB/s is the speed of a single backup job that backs up one VMDK. This is not how backups are normally configured :slight_smile:

 

Userlevel 5
Badge +2

This is huge :sunglasses:

Userlevel 2

I have a similar problem: we have average backup disk speeds of only around 30 MB/s to 160 MB/s using Alletra storage snapshot backups. Does this solution also work with storage snapshot backups?

Userlevel 7
Badge +12

@kelvin koh At least it's worth a try. Please let us know if it did change something.

Userlevel 2

@kelvin koh At least it's worth a try. Please let us know if it did change something.

Hi Regnor,

Thanks for your comment. I believe enabling “NewPathVerificationState” will not cause any harm to the VMware datastores, right? I will need to use the PowerShell command to enable it globally, as we have many datastore LUNs presented to the proxy server (they show as offline and automount is disabled right now).

Thank you,

Kelvin

Userlevel 7
Badge +8

Generally, there should be less overhead on NBD (10+ GbE), but of course if this is 32 Gbit/s FC, then perhaps it will be faster anyway :)

Because NBD limits the throughput of a single task/session, you need to parallelize your backup jobs accordingly. Here SAN mode is much more flexible in utilizing the available bandwidth.

But another aspect is the time it takes before a backup job actually transfers data to the repository. Here NBD is much quicker than SAN and Hot-Add mode, because for the latter two there is a lot of VMDK/LUN mapping to do, and this takes time. So when you back up a lot of small VMs, NBD could be faster (from an execution-time perspective) than SAN/Hot-Add.

 

I agree with this. I have quite a few LUNs/volumes. On a large job with lots of data, Direct SAN is very fast. If there are only a few small VMs with minimal incremental data, NBD is faster, as the storage snapshots or hot-add do take some time to set up and then remove when done. At the end of the day though, the difference is minimal and only shows on very small jobs anyway. 99.9% of the time I’d use Direct SAN unless there is a specific requirement not to.

Userlevel 7
Badge +7

@vNote42 Your information is useful for me. I will share it to my team. Thank you.

Userlevel 1

Similar problem in our environment, thank you for this post and solution

Userlevel 7
Badge +12

@kelvin koh At least it's worth a try. Please let us know if it did change something.

Hi Regnor,

Thanks for your comment. I believe enabling “NewPathVerificationState” will not cause any harm to the VMware datastores, right? I will need to use the PowerShell command to enable it globally, as we have many datastore LUNs presented to the proxy server (they show as offline and automount is disabled right now).

Thank you,

Kelvin

@kelvin koh It only affects how the backup proxy or its FC HBA accesses the datastores; your VMware environment won't notice any difference. Just like you've written, it's important that the LUNs stay offline and unmounted, or else Windows will overwrite the VMware file system.
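
For anyone double-checking this on their proxy, here is a small sketch, assuming a Windows Server 2016 or later proxy with the built-in Storage cmdlets; OfflineShared is the standard Windows SAN policy, nothing Veeam-specific:

  # The presented datastore LUNs should show as offline on the proxy
  Get-Disk | Select-Object Number, FriendlyName, OperationalStatus, IsOffline, IsReadOnly

  # Check the SAN policy; "OfflineShared" keeps newly presented shared LUNs offline
  Get-StorageSetting | Select-Object NewDiskPolicy

  # The same can be checked and set with diskpart:
  #   DISKPART> san
  #   DISKPART> san policy=OfflineShared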
