Question

Backup strategies for a large VM

  • 11 February 2021

Userlevel 6
Badge +2

Hello everyone,

I’m wondering what strategies or tricks you use to back up a large VM (many TB)? How long do your backups take? Do you run an active full, or a synthetic full one day a week?

I look forward to reading your replies 😎

 


8 comments

Userlevel 1
Badge

I don’t think it’s really about the size of the data. It’s about what your company and users expect regarding protection, restore points, time to restore, etc. That will drive the decisions you make about how and when you back up the data.

To try and answer your question though: for production VMs I run daily incrementals using Changed Block Tracking and a weekly (Saturday) active full. The servers are 10-30 TB.
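
As a rough illustration of what a schedule like this costs in repository space, here is a minimal Python sketch. The function name, the 5% daily change rate, the 2:1 data reduction and the two-week retention are all assumptions made up for the example; they are not figures from this thread or from any vendor sizing guide.

```python
def chain_size_tb(vm_size_tb, daily_change_rate, incrementals_per_week=6,
                  weeks_retained=2, reduction_ratio=0.5):
    """Estimate repository space for weekly active fulls plus daily incrementals.

    All parameters are illustrative assumptions:
    - daily_change_rate: fraction of the VM that changes per day (0.05 = 5%)
    - reduction_ratio: stored size / source size after compression and dedupe
    """
    full_tb = vm_size_tb * reduction_ratio
    incremental_tb = vm_size_tb * daily_change_rate * reduction_ratio
    # One week of restore points = one full + the incrementals that follow it.
    weekly_tb = full_tb + incrementals_per_week * incremental_tb
    return weekly_tb * weeks_retained


if __name__ == "__main__":
    # Hypothetical 20 TB server, 5% daily change, two weeks kept on disk.
    print(f"{chain_size_tb(20, 0.05):.1f} TB")  # prints "26.0 TB"
```

With these assumed inputs the weekly fulls dominate the total, which is why synthetic fulls on a block-cloning file system come up later in the thread.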

Userlevel 7
Badge +3

Hello @BertrandFR ,

I have been discussing large virtual disks with my vSphere and Windows/Linux colleagues for quite some time.

We finally agreed to split large virtual disks (the biggest is about 14 TB) into several VMDKs, each no larger than 2 TB. These VMDKs are then combined into one big volume on the Windows or Linux side.

This way we can back up and restore with several sessions in parallel, which is much faster than one big VMDK with a single backup/restore session.
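
To make the effect of splitting concrete, here is a small Python sketch. It assumes every session gets a fixed throughput (200 MB/s here, an invented figure) and that sessions do not slow each other down, which is optimistic because storage, network or repository limits usually cap the aggregate. The function names are hypothetical.

```python
import math


def split_disk(total_tb, max_vmdk_tb=2.0):
    """Split one large disk into equally sized VMDKs no larger than max_vmdk_tb."""
    count = math.ceil(total_tb / max_vmdk_tb)
    return [total_tb / count] * count


def backup_hours(disk_sizes_tb, per_session_mb_s, parallel_sessions):
    """Estimate wall-clock hours when disks are spread over parallel sessions.

    Greedy scheduling: the largest remaining disk goes to the least loaded
    session. Assumes each session sustains per_session_mb_s independently.
    """
    load_tb = [0.0] * parallel_sessions
    for size in sorted(disk_sizes_tb, reverse=True):
        load_tb[load_tb.index(min(load_tb))] += size
    longest_mb = max(load_tb) * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    return longest_mb / per_session_mb_s / 3600


if __name__ == "__main__":
    single = backup_hours([14.0], per_session_mb_s=200, parallel_sessions=4)
    split = backup_hours(split_disk(14.0), per_session_mb_s=200, parallel_sessions=4)
    print(f"one 14 TB VMDK: {single:.1f} h, seven 2 TB VMDKs: {split:.1f} h")
```

Under those assumptions a full pass drops from about 19.4 hours to about 5.6 hours, because the single large disk can only ever use one of the four sessions.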

 

As a backup strategy, we run daily incremental backups with a weekly synthetic full. Because we use ReFS and XFS repositories, the synthetic fulls are created quickly and without needing much extra space.

Userlevel 6
Badge +2

It all boils down to your agreed BCP/DR policy and the RPO/RTO you have defined. I agree with Jochen’s recommendation to split the VMDKs so that none is larger than 2 TB: backup processing is per VMDK rather than per VM, so this limits the amount of data per session to 2 TB at most.

Userlevel 5
Badge +2

Hi BertrandFR,

For monster VMs such as file servers or huge databases, I don’t use the classic vSphere VM backup job.
I use Veeam Agent for Windows or Linux instead, to avoid taking snapshots on servers where the change rate is high. Snapshotting a monster file server is risky, because the snapshot can grow enough to saturate the datastore. Using the agent also avoids keeping a snapshot open for too long while waiting for the backup to finish.
With the Veeam Agent I am sure to avoid this kind of problem.
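
The datastore risk can be estimated with simple arithmetic. The sketch below is a worst-case approximation that assumes every write during the backup lands on a block not yet present in the snapshot delta, so the delta grows at the full write rate; rewrites of the same blocks would make real growth smaller. The function names, the 20% safety margin and the sample figures are all assumptions for illustration.

```python
def snapshot_growth_gb(write_rate_mb_s, backup_hours):
    """Worst-case snapshot delta size in GB after backup_hours of writes."""
    return write_rate_mb_s * backup_hours * 3600 / 1000


def snapshot_fits(write_rate_mb_s, backup_hours, datastore_free_gb,
                  safety_margin=0.2):
    """True if the worst-case delta fits in free space minus a safety margin."""
    usable_gb = datastore_free_gb * (1 - safety_margin)
    return snapshot_growth_gb(write_rate_mb_s, backup_hours) <= usable_gb


if __name__ == "__main__":
    # Hypothetical file server: 50 MB/s sustained writes, 10 hour backup.
    print(snapshot_growth_gb(50, 10))   # 1800.0 (GB)
    print(snapshot_fits(50, 10, 1500))  # False: 1800 GB exceeds the usable space
```

If the check fails for a realistic write rate and backup duration, that is a sign that an in-guest agent backup, or a shorter snapshot window, is worth considering.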

I recommend a synthetic full on a repository with Fast Clone enabled (ReFS or XFS).

If you don’t have much space on the backup repository and you need to keep many restore points, I recommend reverse incremental.

Cheers

Userlevel 6
Badge +2

Thank you all for your answers, they will guide me to the right solution :)

Userlevel 6
Badge +1

A good approach could be a synthetic full once a week and an active full once a month.

Userlevel 3
Badge

I back up a handful of file server VMs, from 5 TB to 14 TB, each with 2 virtual drives. I take advantage of a VMPROXY server on each host the VMs live on: each file server is on its own host, and each host has its own VMPROXY server. These hosts have 10Gb NICs, and backups are sent to dedicated NAS repositories that also have 10Gb NICs. I run daily incrementals and a weekend active full. Having each file server on its own host, with its own VMPROXY, backing up to its own NAS repository streamlines the process and minimises delays in completing backups.
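
For a feel of what a 10Gb path means for a weekend active full, here is a back-of-the-envelope Python sketch. The 70% link efficiency is an assumed figure covering protocol overhead and other bottlenecks, not a measured one, and the sketch ignores compression, which would reduce the data actually sent.

```python
def transfer_hours(data_tb, link_gbit_s=10.0, efficiency=0.7):
    """Hours to move data_tb terabytes over a link of link_gbit_s gigabits/s.

    efficiency is an assumed fraction of line rate that is actually achieved.
    Decimal units are used: 1 TB = 1e12 bytes, 1 Gbit = 1e9 bits.
    """
    total_bytes = data_tb * 1e12
    bytes_per_second = link_gbit_s * 1e9 / 8 * efficiency
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    for size_tb in (5, 14):
        print(f"{size_tb} TB: {transfer_hours(size_tb):.1f} h at 10 Gbit/s")
```

With these assumptions a 14 TB full takes roughly 4.4 hours on the wire, so on a dedicated 10Gb path the network alone leaves plenty of room in a weekend window; the proxy, source storage or repository is more likely to be the limit.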

Userlevel 7
Badge +3

Awesome question.

 

I’d start by categorising the workload. Is it:

  • Highly transactional
  • High amount of data change
  • Low amount of data change / low amount of transactions

Once we know this, we need to know the connection options available. Can we use storage snapshots? What access modes can we use? Direct SAN Access? Hot-Add? What speed will our access mode run at?

Finally I’d look at the speed of the repository and the connection path between proxy and repository. Is the primary backup on-site or offsite? What connectivity speed is there between them?

 

To provide some examples of what I’d be recommending:

 

If it were highly transactional, I’d be looking to leverage storage snapshots for backups where possible, as this is the best way to prevent longer stuns.

If there’s high data change, I’d first ask whether the data change is necessary. I’ve seen too many large CBT deltas caused by, for example, Windows Deduplication being enabled on a file server, or someone writing a SQL backup file every hour for a database that rarely changes.

 

When it’s a large VM and the storage snapshot option is available I’d tend to use it for the initial backup if a maintenance window isn’t available to accept subpar performance. I prefer not to have to keep amending jobs.

 

Large VMs with little data change tend to be the ones on slower storage unless they’re transactional. Their snapshots won’t grow as large, so it becomes more a conversation about ensuring the repository can commit data at a reasonable pace to keep snapshot time down.

 

If I hit a fringe case where most of the above isn’t applicable, then I get out the bag of tricks. To keep backup job times lower when there are many disks, I can run the job with just one more disk added on each subsequent run, so the host can clear down the logs before they’ve grown too large. Another trick is setting a database log file drive to independent, which prevents it from being snapshotted and cuts down on redos.
