This post is based on a real use case at a customer.
Some time ago, this customer asked me to perform a backup audit on their infrastructure because they had doubts about the current setup.
So of course I did. I do this often for existing and new customers…
If you want to know more about it, just keep reading.
Some info about the customer and their infrastructure :
- they are one of the biggest companies in Belgium in their sector
- they have a large VMware environment : 1 cluster with HA
- that VMware cluster is spread over 2 locations on the same site, about 500 meters apart with no buildings in between, so ideal for some DR scenarios
- between the 2 locations they have a 40Gbit fiber connection
- the current backup setup was done by another IT company in Belgium
- they have 2 backup-servers, 1 at each location
- at each location they use a physical SAN as shared storage, with synchronous mirroring between the 2 SANs; the SANs use hybrid storage with auto-tiering across NVMe, SSD and SAS HDD
- 2x 10Gbit is used for the storage connection and 2x 10Gbit for the active synchronous mirroring between the SANs
So, what was my verdict about the current backup-setup?
In short : very bad !!!
In more detail :
- they were using 2 physical backup-servers with Veeam installed on both of them : no problem so far
- but jobs for the same HA cluster were created on both of them, even though VMs can migrate from host 1 at location 1 to host 6 at location 2 : so why ???
- they created a separate job per VM : what ???
- the primary backup jobs were written to the local repository of the backup-server : OK
- the secondary backup copy job was put on the other backup-server : OK, but
- they were using SMB : OMG, can it be worse ???
- firmware had not been updated in 3 years : for real ???
- drivers were not even installed on the backup-servers, with a lot of unknown devices in the device manager : never seen that before ???
- the proxy-server in use was the physical backup-server itself, limited to 2 concurrent tasks, while each backup-server has 16 cores : forgotten to adjust ???
- they were using 10Gbit LACP on the backup-server to the core-switches : OK, but
- they were using a separate VLAN for the backups per location, with the 1Gbit firewall as default gateway, while the VMware cluster was running in the management VLAN : Oh no !!!
- the transport mode in use was NBD : I’m not the biggest fan of it. In my opinion it should be the last resort : it is the easy option, mostly chosen by people who don’t know Veeam well, and performance is not always good
- the performance was bad indeed : an average of 30MB/s for a backup on that kind of infrastructure !!!
- they were not using airgapped/immutable/offline backups
- not all VMs were being backed up : really ???
- ...
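To put that 1Gbit gateway in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the backup traffic between the backup VLAN and the management VLAN is routed through the firewall, and it ignores protocol overhead. The function names are mine, purely for illustration.

```python
def gbit_to_mb_per_s(gbit: float) -> float:
    """Convert a line rate in Gbit/s to MB/s (decimal units, no protocol overhead)."""
    return gbit * 1000 / 8


def path_ceiling_mb_per_s(link_speeds_gbit) -> float:
    """A network path can never be faster than its slowest hop."""
    return min(gbit_to_mb_per_s(speed) for speed in link_speeds_gbit)


# Hops from the audit: 10Gbit on the backup-server side,
# 1Gbit firewall as default gateway (assumed to route the backup traffic).
routed_path = [10, 1]

print(path_ceiling_mb_per_s(routed_path))  # 125.0
```

So even in the best case the routed path tops out around 125MB/s, no matter how fast the backup-servers and the SANs are.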
I asked the customer :
What do you think about the current performance of the backups?
The answer of the customer :
We don’t think it’s that good, that is the reason why we asked an expert, you, to look into this matter.
My answer :
My suggestion is to build a completely new backup setup with no extra infrastructure investments and with the following criteria :
- only 3 days of my consultancy
- a guaranteed much simpler and more logical setup
- a much better performing setup : at least 5 to 10 times the current performance
- following the current Veeam best practices, with the servers set up from scratch and fully up-to-date
The customer agreed and so recently I implemented the new backup-setup.
A short summary :
- firmware up-to-date
- drivers up-to-date
- OS up-to-date
- only 1 backup-server is used as the Veeam server, with only 3 primary backup jobs
- using VMware folders : 1 job per folder
- using the Direct SAN transport mode : I used the 10Gbit interfaces with iSCSI to connect to the SANs, because the customer only uses thick-provisioned VMDKs
- using copy jobs to the secondary backup-server, added as a managed server with a ReFS 64K repository instead of SMB
- Veeam installed as it should be using the current best practices
- Implemented MFA
- using a copy job to my company’s immutable BaaS offering
- ...
→ I tested the performance of the backups : instead of an average of 30MB/s before, I now get 500MB/s to 800MB/s : what a speed !!!!
So a performance increase of a factor of +/- 20 !!!
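For those who like to check the numbers, a minimal Python sketch of that factor. The throughput values are the ones measured above; the helper name `speedup` is just something I made up for the example.

```python
def speedup(before_mb_per_s: float, after_mb_per_s: float) -> float:
    """Return how many times faster the new throughput is than the old one."""
    return after_mb_per_s / before_mb_per_s


OLD_AVERAGE = 30               # MB/s, average before the redesign
NEW_LOW, NEW_HIGH = 500, 800   # MB/s, range measured after the redesign

low_factor = speedup(OLD_AVERAGE, NEW_LOW)
high_factor = speedup(OLD_AVERAGE, NEW_HIGH)

print(round(low_factor, 1), round(high_factor, 1))  # 16.7 26.7
```

The range runs from roughly 17 to 27 times faster, so "+/- 20" is a fair summary, and well above the promised factor of 5 to 10.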
The customer asked me to perform a restore of a certain VM with a size of several hundred GB.
OK, I said, and started the restore.
They had needed to restore that particular VM some time ago, and it took more than 6 hours !!!
That was when they decided this was not what they expected and asked me to perform the audit.
The result of that restore now : only 20 minutes !!!
The customer said to me :
You’re a true wizard!
When we change our current infrastructure, you’re the only one who will set up the new backup infrastructure. You truly know what you are doing...
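The restore times can be put into numbers as well. A rough Python sketch: the 6 hours and 20 minutes come from the restores described above, but the 500 GB VM size is only an assumed example value to show what throughput those times would imply, and the function names are made up for illustration.

```python
def restore_speedup(old_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new restore finished."""
    return old_minutes / new_minutes


def implied_throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput needed to move size_gb in the given time (decimal units)."""
    return size_gb * 1000 / (minutes * 60)


OLD_MINUTES = 6 * 60      # the old restore took more than 6 hours
NEW_MINUTES = 20          # the new restore took 20 minutes
ASSUMED_SIZE_GB = 500     # ASSUMPTION: example size, not the real VM size

print(restore_speedup(OLD_MINUTES, NEW_MINUTES))                            # 18.0
print(round(implied_throughput_mb_per_s(ASSUMED_SIZE_GB, OLD_MINUTES), 1))  # 23.1
print(round(implied_throughput_mb_per_s(ASSUMED_SIZE_GB, NEW_MINUTES), 1))  # 416.7
```

Since the old restore took more than 6 hours, the factor of 18 is a lower bound.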
I often see setups that are not configured ideally.
Why is that?
That is, in my opinion, the most dangerous thing about Veeam.
- It’s easy to set up, it’s very accessible to start with, and it will work!
- But will it be set up as it should?
- Only when you know what you are doing and have a lot of experience with Veeam...