Bad Practice vs Good Practice in a real use case



This post is related to a real use-case at a customer.

Some time ago, this customer asked me to perform a backup audit on their infrastructure because they had doubts about the current setup 🤔

So of course I did; I do this often for existing and new customers…

If you want to know more about it, just keep reading 😁

 

Some info about the customer and their infrastructure : 

  • they are one of the biggest companies in Belgium in their sector
  • they have a large VMware environment : 1 cluster with HA
  • that VMware cluster is spread over 2 locations on the same site, about 500 meters apart with no buildings in between, so ideal for some DR scenarios
  • between the 2 locations they have a 40Gbit fiber connection
  • the current backup-setup was done by another IT company in Belgium
  • they have 2 backup-servers, 1 at each location
  • at each location they use a physical SAN as shared storage, with synchronous mirroring between the 2 SANs; the SANs use hybrid storage with auto-tiering across NVMe, SSD and SAS HDD
  • 2x 10Gbit is used for the storage connection and 2x 10Gbit for the active synchronous mirroring between the SANs

 

So, what was my verdict about the current backup-setup?

 

In short : very bad !!!

In more detail : 

  • they were using 2 physical backup-servers with Veeam installed on both of them : no problem so far
  • but jobs for the same HA cluster were created on both of them, even though VMs can migrate from host 1 at location 1 to host 6 at location 2 : so why ???
  • they created a separate job per VM : what ???
  • the primary backup jobs were stored on the local repository of the backup-server : OK
  • the secondary backup copy job was stored on the other backup-server : OK, but
  • they were using SMB : OMG, can it be worse ???
  • firmware had not been updated in 3 years : for real ???
  • drivers were not even installed on the backup-servers, with a lot of unknown devices in Device Manager : never seen that ???
  • the proxy being used was the physical backup-server itself, limited to 2 concurrent tasks while each backup-server has 16 cores : forgotten to adjust ???
  • they were using 10Gbit LACP from the backup-servers to the core switches : OK, but
  • they were using a separate backup VLAN per location whose default gateway was the firewall at 1Gbit, while the VMware cluster was running in the management VLAN : Oh no !!!
  • the transport mode being used was NBD : I’m not the biggest fan of that. In my opinion it should be the last resort; it is the easy choice, mostly made by people who don’t know Veeam well, and performance is not always good
  • the performance was bad indeed : an average of 30MB/s for a backup on that kind of infrastructure !!!
  • they were not using air-gapped/immutable/offline backups
  • not all VMs were being backed up : really ???
  • ...
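
To put the routed backup VLAN in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units and ignores protocol overhead, so real ceilings are lower, and it cannot tell which item in the list (the 1Gbit gateway, NBD, or the 2 task slots) actually limited the jobs.

```python
def link_ceiling_mb_s(gbit_per_s: float) -> float:
    """Theoretical ceiling of a link in MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8


# Figures taken from the audit list above.
firewall_gbit = 1      # default gateway of the backup VLAN
lacp_member_gbit = 10  # a single 10Gbit link of the LACP bundle
observed_mb_s = 30     # average backup speed that was measured

firewall_ceiling = link_ceiling_mb_s(firewall_gbit)
lacp_ceiling = link_ceiling_mb_s(lacp_member_gbit)

print(f"1Gbit firewall ceiling : {firewall_ceiling:.0f} MB/s")
print(f"10Gbit link ceiling    : {lacp_ceiling:.0f} MB/s")
print(f"observed               : {observed_mb_s} MB/s "
      f"({observed_mb_s / firewall_ceiling:.0%} of the 1Gbit ceiling)")
```

Even in the best case, any traffic that has to cross the 1Gbit firewall tops out at about a tenth of what a single 10Gbit link could carry.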

 

I asked the customer :

What do you think about the current performance of the backups?

The answer of the customer :

We don’t think it’s that good, that is the reason why we asked an expert, you, to look into this matter.

My answer :

My suggestion is to build a completely new backup-setup with no extra infrastructure investment, under the following criteria : 

  • only 3 days of my consultancy
  • a guaranteed much simpler and more logical setup
  • a much better performing setup : at least 5 to 10 times the current performance
  • following the current Veeam best practices, with the servers set up from scratch and fully up-to-date

 

The customer agreed, so I recently implemented the new backup-setup.

A short summary : 

  • firmware up-to-date
  • drivers up-to-date
  • OS up-to-date
  • only 1 backup-server is now used as the Veeam backup server, with only 3 primary backup-jobs
  • using VMware folders : 1 job per folder
  • using the Direct SAN transport mode : I used the 10Gbit interfaces with iSCSI to connect to the SANs, because the customer only uses thick-provisioned VMDKs
  • using copy-jobs to the secondary backup-server, which is added as a managed server with ReFS 64K instead of SMB
  • Veeam installed as it should be, following the current best practices
  • MFA implemented
  • using a copy-job to my company’s immutable BaaS offering
  • ...

 

→ I tested the performance of the backups : instead of an average of 30MB/s before, I was now getting 500MB/s to 800MB/s : what a speed !!!!

So a performance increase by a factor of roughly 20 !!!
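
For anyone who wants to check the math, here is a small sketch of that factor. The 30, 500 and 800 MB/s figures are the ones measured above; the 1000 GB backup size is purely a hypothetical example to show what the difference means for a backup window.

```python
before_mb_s = 30                # measured average before the rebuild
after_low_mb_s = 500            # measured lower bound after the rebuild
after_high_mb_s = 800           # measured upper bound after the rebuild

factor_low = after_low_mb_s / before_mb_s
factor_high = after_high_mb_s / before_mb_s
print(f"speed-up: {factor_low:.1f}x to {factor_high:.1f}x")


def hours_for(size_gb: float, mb_s: float) -> float:
    """Hours needed to move size_gb at a constant mb_s (decimal units)."""
    return size_gb * 1000 / mb_s / 3600


example_size_gb = 1000  # hypothetical amount of data, not from the customer
for label, speed in [("before", before_mb_s),
                     ("after (low)", after_low_mb_s),
                     ("after (high)", after_high_mb_s)]:
    print(f"{label:13}: {hours_for(example_size_gb, speed):.2f} h")
```

The measured range works out to roughly 17x to 27x, so "around 20" sits in the middle of it.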

 

The customer then asked me to perform a restore of a certain VM with a size of several hundred GB.

OK, I said, and started that restore.

They had needed to restore that particular VM some time ago, and it had taken more than 6 hours !!!

That was when they decided this was not what they expected and asked me to perform an audit.

 

The result of that restore now : only 20 minutes !!!
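
The same kind of quick check works for the restore. The 6 hours and 20 minutes come from the story above; the VM size is only described as several hundred GB, so the 500 GB used here is a hypothetical value to get a feeling for the implied throughput.

```python
old_restore_minutes = 6 * 60  # "more than 6 hours", so this is a lower bound
new_restore_minutes = 20

restore_factor = old_restore_minutes / new_restore_minutes
print(f"restore is at least {restore_factor:.0f}x faster")


def implied_mb_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s needed to restore size_gb in the given minutes."""
    return size_gb * 1000 / (minutes * 60)


example_vm_gb = 500  # hypothetical size, the post only says "several hundred GB"
print(f"old restore: about {implied_mb_s(example_vm_gb, old_restore_minutes):.0f} MB/s")
print(f"new restore: about {implied_mb_s(example_vm_gb, new_restore_minutes):.0f} MB/s")
```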

 

The customer said to me :

You’re a true wizard!

When we change our current infrastructure, you’re the only one who will be setting up the new backup-infrastructure 🤗, you truly know what you are doing...

 

I often see setups that are not set up ideally.

Why is that?

 

That is in my opinion the most dangerous thing about Veeam.

  • It’s easy to set up, it’s very accessible to get started with, and it will work!
  • But will it be set up as it should be? 
  • Only when you know what you are doing and have a lot of experience with Veeam...

 

9 comments

Userlevel 7
Badge +4

As a support engineer at Veeam, I see many, many customers like this.

Congrats on making your customer’s life easier, Nico :)

Userlevel 7
Badge +20

Great story and ending Nico. Glad to see things on the right track after you checked things.

Userlevel 7
Badge +17

Great job revamping their environment @Nico Losschaert 🙌🏻

Userlevel 6
Badge +8

Thanks for sharing, Nico.

 

We often see this. What did you do with the previous and historical backup points? Did they have enough space to start new chains?

Userlevel 7
Badge +8

What a change. Sounds a lot like my environment when I started.  

That is a ton of issues. I try to re-read the best practices often to keep them fresh in my mind for each upgrade. 

Userlevel 7
Badge +11

No probs @kristofpoppe. Luckily the customer had enough space to start new chains. With customers that don’t have enough, we have to be creative 😉. Normally the way to do it is to delete the oldest regular chain, then the next oldest, and so on until everything from before has been erased. Not ideal, which is why running at the limit of available storage is not a good idea.
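
As a rough illustration of that "oldest chain first" approach, here is a minimal planning sketch. Everything in it is hypothetical (the `Chain` class, the chain names and the sizes); it does not talk to Veeam at all, it only works out which old chains would have to go before the new chains have the room they need.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chain:
    """A hypothetical old backup chain on the repository."""
    name: str
    created: date
    size_gb: float


def chains_to_delete(old_chains, free_gb, needed_gb):
    """Return the old chains to remove, oldest first, until needed_gb is free."""
    to_delete = []
    for chain in sorted(old_chains, key=lambda c: c.created):
        if free_gb >= needed_gb:
            break  # enough room already, keep the remaining old chains for now
        to_delete.append(chain)
        free_gb += chain.size_gb
    return to_delete


# Hypothetical example data.
old_chains = [
    Chain("chain-feb", date(2023, 2, 1), 300),
    Chain("chain-jan", date(2023, 1, 1), 200),
    Chain("chain-mar", date(2023, 3, 1), 400),
]
plan = chains_to_delete(old_chains, free_gb=100, needed_gb=500)
print([c.name for c in plan])
```

Repeating this each time the new chains grow ends up removing all the old chains, one round at a time.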

Userlevel 7
Badge +7

The “problem” is that Veeam is very easy to install and configure for backups with a simple setup.

Many customers/IT companies don’t understand the importance of real knowledge of the product.

Userlevel 7
Badge +10

Reading all of this I would say... you like to win easy! 🤣

My cousin's friend did better in the beginning.

You had to turn the screw in the right direction.

 

Bravo @Nico Losschaert 

Userlevel 5
Badge +2

The main challenge is… Veeam “simply works” (since the good old times...). You can run the installation and configuration by just clicking next, next, next, finish. 

We have struggled with this challenge for many years. We brought out the best practice analyzer as a first rough step in the right direction. 

Another thing is that some customers grow with their environment, or started with Veeam in a small division, then figured out that it works like a charm and moved more workloads into the Veeam environment over time… but never touched or expanded the base installation of VBR.
