Solved

Virtual Machine restoration is very slow


Hi, I am currently restoring a VM to an ESXi host (the original host), and I am already on the 8th day of the restoration, which is extremely slow. The VM is 2TB in size. As of this writing, the “statistics” view shows 900+ GB left at a restore rate of 2MB/s and 56% progress, while the “log” shows “Restoring Hard disk 1 (2.0 TB): 188.6GB restored at 349KB/s [nbd] 157:xx:xx”.
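
For a sense of scale, the two rates quoted above can be turned into a time-to-finish estimate. This is only a sketch: it assumes the rate stays constant, that about 900 GB remain, and that 1 GB = 1024 MB. The function name `eta_days` is made up for illustration.

```python
def eta_days(remaining_gb: float, rate_mb_per_s: float) -> float:
    """Days needed to move remaining_gb at a constant rate (1 GB = 1024 MB)."""
    seconds = remaining_gb * 1024 / rate_mb_per_s
    return seconds / 86400  # 86400 seconds per day


# Rate shown in the statistics view (2 MB/s)
print(round(eta_days(900, 2.0), 1))         # 5.3
# Rate shown in the log (349 KB/s, converted to MB/s)
print(round(eta_days(900, 349 / 1024), 1))  # 31.3
```

At the statistics rate that is about 5 more days; at the log rate it is about a month.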

 

This is a Full VM restoration job and the file is located on an External HDD. This is the only job running at the moment.

 

Restoration progress:

Day 1 - 0% 

Day 2 - 1%

Day 3 - 28%

Day 4 - 51%

Day 5 - 53%

Day 6 - 54%

Day 7 - 55%

Day 8 - 56%

 

The Veeam VM configuration:

vCPU = 4 CPU

Memory = 4GB

USB = 3.0

Ethernet Interface  = VMXNET3 running at 10Gbps

Repository Configuration  = unchecked “Limit read and write data rates”

Throttling = 500Mbps

License = Community Edition

 

Veeam VM Utilization:

CPU = around 8% average

Memory = at 80% constant

Ethernet = only at 2-5Mbps average on send, 112Kbps average on receive
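
To put the network figures above on the same scale as the restore rate, here is a small conversion sketch. It assumes 8 bits per byte and ignores protocol overhead; `mbps_to_mb_per_s` is an illustrative name.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


print(mbps_to_mb_per_s(500))  # configured throttling limit -> 62.5
print(mbps_to_mb_per_s(2))    # low end of the observed send rate -> 0.25
print(mbps_to_mb_per_s(5))    # high end of the observed send rate -> 0.625
```

The observed send rate sits far below the 500Mbps throttling limit, which suggests the throttle itself is not what caps the restore.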

 

Does anyone have a clue what’s going on here?

Best answer by Michael Melter 18 October 2021, 19:06

23 comments

Userlevel 7
Badge +20

How is the HDD connected? If it is an external drive on USB, the restore will be slow compared to other storage.

It is connected via USB (3.0 configuration) to the ESXi host where the Veeam VM is located, and the VM is being restored to another ESXi host on the same subnet. But that slow? At this rate the restoration would probably take a month to complete.

Userlevel 7
Badge +20

So there is the hardware layer and then the ESXi layer to go through before getting to the storage for the restore. Even with USB 3, a 2TB VM will take time. Unfortunately, I think waiting is probably the best option.

Userlevel 6
Badge +3

The engine is definitely capable of more, so I guess this is some HW/config limit and I assume that’s about the virtual USB performance - I think that is not really the most optimized vSphere use case.

Today you can probably check the performance stats of the external drive. Is it 100% busy?

I assume you need this restore done, so I’d also wait for it as Chris said, but probably afterwards you can test some things and then consider this in case of other restores:

  • How fast can you copy files from the external drive to your VM?
  • Take the external drive and connect natively to a physical machine (e.g. your laptop/workstation) - check the read speed there.
  • If the latter is much faster you could install CE on a physical box for restores: import the backups from the external drive and try to restore from there and verify if the speed is better.
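
The first two checks in the list above can be roughly automated with a short script that reads a large file sequentially and reports the average rate. This is a minimal sketch, not a proper benchmark: the function name is hypothetical, and a second run over the same file may be inflated by the operating system's cache.

```python
import time


def read_throughput_mb_s(path: str, chunk_bytes: int = 4 * 1024 * 1024) -> float:
    """Read the whole file sequentially and return the average rate in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Running it against the same backup file once inside the VM and once on a physical machine would give two comparable numbers.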

If you identify the external HDD as the culprit, you can consider going for a low-end 1-bay/2-bay NAS and connecting it via iSCSI to the VBR server. Even that should manage more than 2MB/s.

Also your VBR specs are at the minimum, I’d give it at least 2-4GB more memory if possible.

Thanks for pointing out the performance stats of the external drive. I had overlooked that information. Strangely, when I checked, it looks like there is no activity at all.

[Screenshot: disk_stats]

 

This is the backup job I ran. It contains 2 VMs, and I’m only trying to restore one specific VM (highlighted in red).

[Screenshot: backup job]

 

Userlevel 7
Badge +13

Did I get it right: the backup is located on a USB drive connected to an ESXi host, which passes it through to the backup VM? If so, could you have migrated (vMotion) the backup VM to another host? Supported USB devices will “follow” the VM, but you will then see far worse performance. To verify, check whether the backup VM is still on the same host the USB drive is attached to.

Userlevel 7
Badge +3

What version of Veeam are you using? 2TB is a large VM and can certainly take a while. A couple of things to consider:

 

If you are using VBR v.11, there are some known performance issues and this could be the root of your problem. If I recall correctly, I had a coworker who had a VM restore that ended up failing due to the time it took. I may be mis-remembering that, but I believe there was an issue with the restore time. You might consider upgrading to v.11a. Those issues were fixed in 11a.

 

You might be running into a bandwidth issue. Are you using the backup server itself as the backup proxy? What kind of traffic do you have on the network? Do you have any throttling in place? You can check that in the main settings menu (the hamburger icon in the top left of the VBR window).

 

What version of vCenter are you using? My company ran into an issue where one of the vCenter updates did not get along with Veeam v.11. I don’t recall which update it was, but if you have updated in the past few months, that may also be your issue. Check whether a vCenter update is available and install it if possible.

Userlevel 7
Badge +3

What kind of drive do you have connected? Is it a run of the mill USB external HDD or part of a NAS or SAN? If VBR is running on that drive (if it is just a standard commercial external HDD), that could also be your issue, as was mentioned above.

Userlevel 7
Badge +9

@bp4JC has asked some of my questions. But here are some of my inputs.

You have to identify the real issue (bottleneck) in your environment, and this can have many causes. Kindly check the job session log, as this will help you a lot.
- Have you tried restoring to a new location to see if the issue persists? Is the host fully patched (firmware)? Outdated firmware can cause unnecessary performance issues.

Userlevel 7
Badge +13

I believe we also need to know whether the backup was encrypted and whether high compression was selected. As @bp4JC suggests, 2TB is a large VM, but for a disaster recovery on a small SAS infrastructure you're looking at about ... 8 hours maximum?

This seems more like a problem with the infrastructure (server) you’re operating on.

Check whether you have a disk in a pre-failure state.
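
To compare that eight-hour figure with what was observed, here is a rough calculation of the sustained rate it implies. It assumes the full 2 TB has to be written and that 1 TB = 1024 * 1024 MB; `required_rate_mb_s` is an illustrative name.

```python
def required_rate_mb_s(size_tb: float, hours: float) -> float:
    """Sustained rate in MB/s needed to move size_tb within the given hours."""
    size_mb = size_tb * 1024 * 1024
    return size_mb / (hours * 3600)


print(round(required_rate_mb_s(2, 8), 1))  # 72.8
```

That is roughly 36 times the 2MB/s reported in the original post.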

Hello everyone, thank you for your input.

 

The job aborted due to server termination.

 

So I had to restart the job again but with the following changes:

  1. Increased the Veeam VM memory from 4GB to 16GB. Why 16GB? I first configured 8GB, but after booting the Veeam VM, without VBR running, memory consumption was already at 38-41% with the SQL Server (VEEAMSQL2016) process.
  2. Moved the restore target to another ESXi host: from host #1, which runs 7 VMs, to host #2, which runs only 2 VMs. Both hosts are managed by the same vCenter, while the Veeam VM is on a different ESXi host managed by a different vCenter.

 

Results as of this writing:

  1. Status: In progress (52%)
  2. Statistics
    1. Restore rate: 15MB/s (983.0 GB left)
    2. Time remaining: 19:xx:xx
  3. Log
    1. 99.5GB restored at 1 MB/s [nbd]
    2. Time elapsed: 20:44:xx
  4. Memory utilization: 7.5/16.0GB (47%), with the SQL process as the highest consumer.
  5. Network utilization: average send of 20Mbps.

Everything else is the same configuration as above, including the throttling. The external HDD is a WD drive.

 

I will provide another update when there are results.

Userlevel 7
Badge +13

Based solely on my experience, I increasingly think that the problem is the hardware of the old server, the one you were trying to restore to before.

But it is true that Veeam states as minimum requirements: Memory: 4 GB RAM plus 500 MB RAM for each enabled job.
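
As a quick illustration of that rule, here is a sketch that treats 500 MB as 0.5 GB. The function name `min_vbr_ram_gb` is made up, and the result is only the minimum quoted above; it may not cover other services on the same VM, such as the SQL Server instance mentioned earlier.

```python
def min_vbr_ram_gb(enabled_jobs: int) -> float:
    """Minimum RAM in GB per the quoted rule: 4 GB plus 500 MB per enabled job."""
    return 4 + 0.5 * enabled_jobs


print(min_vbr_ram_gb(1))  # 4.5
print(min_vbr_ram_gb(4))  # 6.0
```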

Let us know if the process finishes successfully :)

Userlevel 7
Badge +3

Those sound like fantastic changes and should help significantly. You want to throw all of the resources at your Veeam server that you can.

 

Something that occurs to me. What version of VBR (Veeam Backup and Replication) do you have installed? Version 11 is known to have some performance issues. If you run into an issue again, it might be worth uninstalling 11 and installing v.10a in its place, or even move to 11a. I don’t know how feasible this is, but it’s an idea if you are using v.11.

Hello, update as of today.

  1. Status: In progress (69%)
  2. Statistics
    1. Restore rate: 4MB/s (634.9 GB left)
    2. Time remaining: 18:xx:xx
  3. Log
    1. 452.2GB restored at 1 MB/s [nbd]
    2. Time elapsed: 93:52:xx
  4. Memory utilization: 5.7/16.0GB (36%), with the SQL process as the highest consumer at 2905.9MB.
  5. Network utilization: still an average send of 20Mbps.

There’s a significant drop in the restore rate, from 15MB/s to 4MB/s.
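
The log figures from the two updates can be cross-checked against the elapsed time. This is a small sketch, assuming the elided seconds (`xx`) are zero and 1 GB = 1024 MB; `average_rate_mb_s` is an illustrative name.

```python
def average_rate_mb_s(restored_gb: float, hours: int, minutes: int) -> float:
    """Average rate in MB/s for restored_gb moved in the given elapsed time."""
    elapsed_s = hours * 3600 + minutes * 60
    return restored_gb * 1024 / elapsed_s


first = average_rate_mb_s(99.5, 20, 44)    # first update: 99.5GB after 20:44
second = average_rate_mb_s(452.2, 93, 52)  # this update: 452.2GB after 93:52
print(f"{first:.2f} MB/s, {second:.2f} MB/s")
```

Both work out to roughly 1.4 MB/s, so the cumulative rate for Hard disk 1 in the log was about the same at both checkpoints, even though the rate in the statistics view fell.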

 

@bp4JC, unfortunately, it’s VBR 9.5 Update 4 (9.5.4.2753).

 

Userlevel 7
Badge +13

No, it’s not normal. I advise you to let the procedure finish, but once it has finished, check the infrastructure at the hardware level.

Check these threads.

https://forums.veeam.com/vmware-vsphere-f24/slow-restore-speeds-why-t70302.html

https://forums.veeam.com/vmware-vsphere-f24/sudden-drop-in-speed-t59678.html

Userlevel 7
Badge +20

It is too bad that it slowed down again, but you should think about upgrading Veeam once this is done. I am pretty sure the restore would run better on v10 or 11a due to the enhancements.

Userlevel 7
Badge +8

What’s the current status of your recovery, @bhinx ?

I would bet the beer I’m just about to consume that the slow rate is mainly due to the virtualized USB stack. :sunglasses:

This is by no means a performant scenario, and I would always try to avoid it.

Better option: connect your USB drive to your PC - ideally in the same network as the VBR server - and mount the folder on the USB drive containing the backup as an SMB repo to your VBR inside the VM.

This will deliver way better performance. Or even push a “real” repo service to your PC.

Next optimization step would be the type of proxy used. With the backup server inside a VM try to make it a hot-add proxy (=virtual appliance mode). This will also speed up the process as you won’t suffer from the ~40% throttle on the management interface of the ESX.

Hello everyone, thanks for your input on my issue. I appreciate it.

The restoration completed successfully after about 118:26:xx.

I guess the way to move forward with this is to upgrade the VBR and find a better solution for the storage and do some tests again :D

Userlevel 7
Badge +13

Very well, the most important thing is that in the end the restore was successful.

@bhinx check health status of hardware too, bye! :wink:

P.S. Please select one of the answers to close this case.

Userlevel 7
Badge +20

I think the virtualized USB stack, along with running version 9.5, made the restore much slower than it would have been with a newer version of Veeam and a different way of restoring from the USB drive.

Userlevel 7
Badge +17

Please be aware that support for VBR 9.5 ends on January 1st, 2022.

So, it is absolutely recommended to upgrade to V10 or V11...

Badge +4

I really think the setup of that USB drive connected to the host is rather odd…

Wouldn’t it have been faster to copy the content of the USB drive (I suspect a VBK file) to the Veeam server, or to some local or other regular storage, and launch the restore from there?

(While writing I noticed @Michael Melter mentioned more or less the same.)

I know a school that did this, as they use those massive USB disks as rotated drives. Restoring directly from such a disk is simply a recipe for failures.

Badge

For the restore operation, one should use only the proxy that is located where the backup files are.

There is an option to select the proxy to use for the restore operation.

Once I selected only the proxy located where the backup files are, I got the 10G transfer rate.

I don’t quite understand why the automatic option would select other proxies that are not located where the backup files are, but with the right proxy selected manually, the restore flies.
