Poor SAN transport mode performance


Recently I had to troubleshoot poor VMware vSphere SAN transport mode performance during backups with Veeam. Before SAN mode, NBD transport mode was used, and NBD totally outperformed SAN mode!

Here are some facts about the environment.

Environment

  • VMFS volumes are hosted by two HPE 3PAR arrays in a synchronous replication configuration (Remote Copy) with Peer Persistence.
  • ESXi hosts are running different versions.
  • Backup (here: Veeam Backup & Replication) is installed on a dedicated physical host. This host is connected via 10G LAN and a single FC uplink to one of the two fabrics.

Backup performance

The backup mode used in this environment was NBD transport mode, because the physical host wasn’t equipped with an FC HBA. To improve backup performance, an HBA was installed. To run direct SAN backups, a few configuration steps are necessary:

  • Add MPIO feature to backup server,
  • Configure MPIO for 3PAR
    • by running: mpclaim -r -I -d "3PARdataVV"
  • FC switch zoning,
  • Exporting all LUNs to backup host,
  • Rebooting the server.
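
The Windows-side steps above can also be sketched in PowerShell. This is a minimal sketch, not the exact procedure used here: it assumes a Windows Server host with the ServerManager and MPIO modules and an elevated session, and it uses New-MSDSMSupportedHW as a swapped-in alternative to the mpclaim call, with vendor ID "3PARdata" and product ID "VV" taken from the "3PARdataVV" string. Zoning and LUN export happen on the switches and arrays and are not covered.

```powershell
# Assumption: run in an elevated PowerShell session on the backup server.

# Add the Multipath I/O feature ("Add MPIO feature to backup server").
Install-WindowsFeature -Name Multipath-IO

# Let the Microsoft DSM claim 3PAR virtual volumes.
# "3PARdata" (vendor) + "VV" (product) corresponds to the "3PARdataVV"
# device string passed to mpclaim in the post.
New-MSDSMSupportedHW -VendorId '3PARdata' -ProductId 'VV'

# Confirm that the hardware ID is now registered with the Microsoft DSM.
Get-MSDSMSupportedHW

# A reboot is still needed before the LUNs are claimed by MPIO.
# Uncomment once zoning and LUN export are done:
# Restart-Computer
```

If the feature installation reports that a restart is needed, reboot before running the claim step, since the MPIO cmdlets may not be usable until then.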

After that we saw miserably poor SAN transport mode performance:

  • Observed backup throughput for different VMs ranged from < 1 MB/sec (!) to 60 MB/sec.
  • Backup jobs ran for about 10 minutes before the first byte was transferred.

Troubleshooting

  • No errors on any layer (Windows host, FC Switches, 3PAR arrays).
  • All SFP metrics on FC switches were OK.
  • Arrays did not have any other inexplicable performance issues.
  • Antivirus software was disabled during testing.
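
For the Windows host layer, a quick look at what MPIO has claimed can help rule out missing or unclaimed paths. The commands below are a hedged sketch of such a check, not necessarily what was run in this case; the disk number 0 is a placeholder.

```powershell
# List all MPIO-claimed disks with their load-balance policy and DSM.
mpclaim -s -d

# Show the individual paths of one MPIO disk.
# Assumption: 0 is a placeholder; use a number from the listing above.
mpclaim -s -d 0

# List the Fibre Channel disks as Windows sees them.
Get-Disk |
    Where-Object BusType -eq 'Fibre Channel' |
    Select-Object Number, FriendlyName, OperationalStatus, Size
```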

Solution

Fortunately the solution is quite simple: the Microsoft MPIO feature Path Verification has to be enabled. This can be done in two ways:

  • Per Disk
    • Open Disk Management –> open the properties of a disk –> select the MPIO tab –> press Details –> enable Path Verify Enabled –> press OK.
    • This has to be done for each 3PAR disk. The advantage is that this can be done online.
  • Globally
    • Run PowerShell command:
      Set-MPIOSetting -NewPathVerificationState Enabled
    • To check current settings, run:
      Get-MPIOSetting
    • After a reboot, all disks have the feature enabled.
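
The global variant can be wrapped into a short before/after check. This is a sketch only; the registry location in the last step is an assumption about where Windows stores the global setting, so treat that line as optional.

```powershell
# Show the current global MPIO settings, including the path verification state.
Get-MPIOSetting

# Enable path verification globally (the command from the post).
Set-MPIOSetting -NewPathVerificationState Enabled

# Show the settings again to confirm the change was recorded.
Get-MPIOSetting

# Assumption: the global flag is stored as PathVerifyEnabled under the
# MPIO service parameters key. This call errors if the value does not exist.
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\mpio\Parameters' -Name PathVerifyEnabled
```

As noted in the list, the global setting only reaches all disks after a reboot, so schedule the restart outside the backup window.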

Notes

  • This issue is not related to Veeam Backup & Replication. Any backup solution that uses SAN mode can suffer from bad performance here.
  • With Path Verification enabled, we could increase backup throughput for a VM located on SSD from about 60 MB/sec to 600 MB/sec.
  • The HPE 3PAR Windows Server 2016/2012/2008 Implementation Guide states that path verification has to be enabled in a Peer Persistence environment. Until now I had not needed it on Windows backup hosts, where Windows only reads from 3PAR volumes. It is definitely necessary, though, when Windows writes to 3PAR Peer Persistence LUNs.
  • In my opinion this could happen on Primera arrays too.
  • To create the right 3PAR/Primera Peer Persistence claiming rule for ESXi hosts see here.

10 comments

Userlevel 3
Badge

x10 faster!! It’s huge. Thanks for sharing.

Userlevel 7
Badge +3

If you compare incremental job performance, which average MB/s did you see on the 10 GbE NBD vs the new Direct SAN? Which FC link speed do you have?

Userlevel 7
Badge +4

Very Useful!!

Userlevel 7
Badge +4

Good question, Rasmus. In my opinion this is something that should always be compared.

I also believe that by using LACP in vSphere (vDS only) you can take even more advantage of NBD.

Userlevel 7
Badge +3

Generally, there should be less overhead on NBD (10+ GbE), but of course if this is 32 Gbit/s FC, then perhaps it will be faster anyway :)

Userlevel 7
Badge +6

Good point! For testing I primarily used full backups to compare throughput. Because of the comparatively small amount of data that has to be transferred for incremental backups, a much faster network does not save as much time as it does for fulls. The FC link here runs at 8 Gbit/s.

The thing with NBD is that ESXi limits NBD traffic in some way. For a single VMDK backup (full or incremental) I have personally never seen throughput above 200 MB/sec, and most often much less. I do not believe LACP could raise this limit. Here SAN mode outperforms NBD: even a single VMDK can be backed up at much higher rates, 1 GB/sec and more.

In this scenario we could back up a specific VMDK at about 100 MB/sec using NBD and at about 600 MB/sec using SAN mode.

Userlevel 7
Badge +6

Because NBD limits the throughput of a single task/session, you need to parallelize your backup jobs accordingly. SAN mode is much more flexible in utilizing the available bandwidth.

Another aspect is the time it takes before a backup job starts transferring data to the repository. Here NBD is much quicker than SAN and Hot-Add mode, because the latter two require a lot of VMDK/LUN mapping, which takes time. So when you back up a lot of small VMs, NBD could be faster (in terms of execution time) than SAN/Hot-Add.

Userlevel 7
Badge +3

200 MB/s sounds very low for 10 GbE. Are you sure the network is operating correctly?

Userlevel 7
Badge +6

Yes, network works fine.

You have to run more backups in parallel, then you can utilize more bandwidth. 200 MB/sec is the speed of one running backup job that backs up one VMDK. This is not the normal way backup is configured :)

 

Userlevel 4
Badge

This is huge!
