
Here’s one to find out who has the largest backup in the forums.

 

Currently, the largest server I back up is 45.1TB.

The largest one I had backing up was 115TB.  It took a while but worked fine.  I’ve been working on splitting it up into multiple file servers to allow more concurrent streams to our backup/tape jobs.
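As a rough sketch of why splitting helps (throughput figures below are purely illustrative assumptions, not measurements from that 115TB job): the wall-clock window shrinks roughly with the number of concurrent streams, until some shared bottleneck caps it.

```python
# Back-of-the-envelope: backup window vs. number of concurrent streams.
# All throughput figures are illustrative assumptions.

def backup_window_hours(total_tb, streams,
                        mb_per_s_per_stream=300, aggregate_cap_mb_s=2000):
    """Hours to move total_tb with N parallel streams, capped by a shared
    bottleneck (proxy/repository/network) at aggregate_cap_mb_s."""
    effective_mb_s = min(streams * mb_per_s_per_stream, aggregate_cap_mb_s)
    return total_tb * 1024 * 1024 / effective_mb_s / 3600

for streams in (1, 2, 4, 8):
    print(f"{streams} stream(s): ~{backup_window_hours(115, streams):.0f} h for 115 TB")
```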

 

@ejfarrar Thanks for that very interesting writeup and response.   I agree, management vs. performance vs. requirements is always a balancing act when sizing.  I’ve been messing with queue depth and performance a lot lately, tuning some of our analytics VMs and databases, but there is also something nice about having a larger volume from a management perspective. 
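For context on the queue depth part, the mental model I reason from is Little's Law: achievable IOPS is roughly the number of outstanding I/Os divided by average latency. A tiny sketch with made-up latencies (not numbers from any VM mentioned here):

```python
# Little's Law sketch: IOPS ~= queue depth / average service time.
# Latency values are assumptions purely for illustration.

def iops(queue_depth, avg_latency_ms):
    return queue_depth / (avg_latency_ms / 1000.0)

for qd in (1, 8, 32, 64):
    latency_ms = 0.5 + 0.05 * qd   # assume latency rises as the array gets busier
    print(f"QD {qd:>2}: ~{iops(qd, latency_ms):,.0f} IOPS at {latency_ms:.2f} ms")
```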

As a SAN guy, if you run dedupe too, having large volumes on the back end is beneficial as well. 

There really is no right way to do it and no single right answer; there is, however, a wrong way, when things become unmanageable or the performance is bad. 😄


These impressive numbers gave me a headache but it is very interesting.

I'm curious how you back up the R&D file server? Agent? File share jobs?
On what type of repository, and with what retention? Am I right in reading that object storage works well for this?

How long does the active full that initiates the backup chain take?

That one uses the agent on one of the two nodes of the cluster.  My wallet cringes at the new licensing model for file share jobs when I think of my servers (even though it technically isn’t my money, keeping costs down enhances profits, which enhances bonuses and salaries).  I have moved to purely scale-out repositories for our block systems, but we are about to re-assess since our on-prem S3 object storage is about to double in size (it’s already in the double-digit PB range)…  We are exploring various scenarios where block and object storage are used for all types of workloads…

I have about 800TB of scale-out repository capacity plus about the same on our dedupe appliance where our backup copies go…

All except the >130TB systems are now running on reverse-incremental backups with once a month active full backups.  Retention for on-disk restore points is 7 days, backup copy retention is 5x weekly, 3x monthly, 4x quarterly.  The exception to that is basically all of the big *nix file servers.  They have a 4 day retention policy on disk (offset by 7 day retention + 1 weekly on storage snapshots) and 2x weekly on backup copy.  A couple large *nix file servers are under legal retention policy for over a decade, so those ones have some longer backup copy restore points kept.
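To put a rough ballpark on what a reverse-incremental chain like that costs in repository space (the most recent point is a full, older points are rollback files): here is a sketch where the change rate and data reduction figures are assumptions for illustration, not measurements.

```python
# Ballpark on-disk footprint of a reverse-incremental chain.
# daily_change and data_reduction are assumptions for illustration only.

def reverse_incremental_footprint_tb(source_tb, retention_days=7,
                                     daily_change=0.03, data_reduction=0.5):
    """Most recent full plus one rollback file per retained day."""
    full = source_tb * data_reduction
    rollbacks = retention_days * source_tb * daily_change * data_reduction
    return full + rollbacks

for size_tb in (10, 45, 130):
    print(f"{size_tb:>3} TB source -> ~{reverse_incremental_footprint_tb(size_tb):.1f} TB on disk")
```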

For big systems that don’t have active full backups frequently enough to trim restore points down to our policy, a synthetic full backup runs (usually once a week) so we aren’t locked into old full backups that don’t line up with the retention policies.
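In simple terms, the reason the fulls matter: with forward incrementals, a restore point can only be removed once the full it depends on is no longer needed, so the on-disk chain length is roughly "retention plus days since the previous full". A toy illustration:

```python
# Toy illustration: worst-case restore points kept on disk for a forward
# incremental chain, given a retention policy and the interval between fulls.

def worst_case_points_on_disk(retention_days, days_between_fulls):
    # The whole chain back to a full must stay until every point in it expires.
    return retention_days + (days_between_fulls - 1)

print("7-day retention, weekly fulls :", worst_case_points_on_disk(7, 7), "points worst case")
print("7-day retention, monthly fulls:", worst_case_points_on_disk(7, 30), "points worst case")
```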


Good day all, I found this thread via a search engine because a customer asked me about backing up a SAS Viya (or similar app, not a NAS) server that is over 100 TB.
They already tried a regular image-based (snapshot) backup, but they report problems such as errors during snapshot removal and so on.

IMO, proper storage snapshot integration could solve the issue, but it's probably not viable here.

They are thinking about protecting the system disk with a regular image-based backup and the other disks with the Agent, maybe in snapshot-less mode, because they are more concerned about the chance of saturating a filesystem than about consistency.

Can I ask your opinion about this “mixed mode” to process the VM? Thanks



Usually with VMs this size you definitely want SAN integration to make things faster and more consistent.  Regular snapshots can be a pain unless the underlying storage is fast, and even then they can still be problematic.  The Agent would work for this too, moving the VSS snapshot into the guest, but it needs enough working room for the snapshot there.  It is an “it depends” answer.  😂 
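On the working-room point, a very rough way to size the VSS shadow storage: it has to absorb copy-on-write data for everything that changes while the backup runs. Numbers below are assumptions, not from any system in this thread:

```python
# Rough sizing for VSS shadow storage during an agent backup: the shadow
# copy area must hold copy-on-write blocks for all writes made while the
# backup is running. Change rates and durations below are assumptions.

def shadow_space_needed_gb(change_rate_mb_per_s, backup_hours):
    return change_rate_mb_per_s * backup_hours * 3600 / 1024

for rate, hours in ((20, 8), (50, 12), (100, 24)):
    print(f"{rate} MB/s of writes for {hours} h -> ~{shadow_space_needed_gb(rate, hours):,.0f} GB of shadow space")
```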


Yeah, you are right. And the safest path would be applying the best practices from the vendor (it’s a SAS Viya server): use their proprietary tools to achieve an application-consistent backup.

And, maybe, avoid creating VMs that big…

It’s sad, but eventually the backup admin or Veeam consultant will just be asked to create a backup job that finishes successfully, to be compliant with policies, ISO and so on, and never even attempt restore testing… 🙄  



Agree with you there.  I have seen it and lived it, so it is a fine line you need to walk between keeping the VM running 100%, SLA/RTO/RPO, and compliance.  Best of luck with that one.


My largest was about 155TB.

Storage snapshots are pretty much a requirement, especially if the VM is busy. That being said, multiple disks are going to help solve this problem, and the more disks, the more parallel processing you will get as well.    
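To put the parallelism point into numbers: when disks are processed in parallel, the window is roughly bounded by the largest single disk, so splitting capacity across several evenly sized disks helps even though the total stays the same. The throughput figure is an assumption for illustration:

```python
# With per-disk parallel processing, the backup window is roughly bounded
# by the largest single virtual disk. The per-disk rate is an assumption.

def window_hours(disk_sizes_tb, mb_per_s_per_disk=300):
    largest_tb = max(disk_sizes_tb)
    return largest_tb * 1024 * 1024 / mb_per_s_per_disk / 3600

print(f"one 155 TB disk    : ~{window_hours([155]):.0f} h")
print(f"10 x 15.5 TB disks : ~{window_hours([15.5] * 10):.0f} h")
```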

It was a while ago that I had to set it up, but there were also a few advanced VMware settings I had to configure for that VM so that other things like vMotion would work, as on previous versions it would fail after 24 hours. There is a timeout setting in there, and something else was modified to allow me to work with it more often. 

 

If you are dealing with monster VMs, you may want to reach out to Broadcom to ask about any specific back-end settings that will help you, or, if you are having failures, to look into the logs.