Largest VM you back up?


Userlevel 7
Badge +8

Here’s one to find out who has the largest backup in the forums.

 

Currently, the largest server I back up is 45.1 TB.

The largest one I had backing up was 115 TB. It took a while but worked fine. I’ve been working on splitting it up into multiple file servers to allow more concurrent streams to our backup/tape jobs.

 



Userlevel 2

These impressive numbers gave me a headache, but it is very interesting.

I'm curious how you back up the R&D file server. Agent? File share jobs?
On what type of repo, and with what retention? Am I right in reading that it's object storage?

How long does the active full that initiates the backup chain take?
That one uses the agent on one of the two nodes of the cluster. My wallet cringes at the new licensing model for file share jobs when I think of my servers (even though it technically isn’t my money, keeping costs down enhances profits, which enhances bonuses and salaries). I have moved to purely scale-out repositories for our block systems, but we are about to re-assess, since our on-prem S3 object storage is about to double in size (it’s already in the double-digit PB range)… We are exploring various scenarios where block and object storage are used for all types of workloads…

I have about 800TB of scale-out repository capacity plus about the same on our dedupe appliance where our backup copies go…

All except the >130 TB systems are now running on reverse-incremental backups with once-a-month active full backups. Retention for on-disk restore points is 7 days; backup copy retention is 5x weekly, 3x monthly, 4x quarterly. The exception to that is basically all of the big *nix file servers. They have a 4-day retention policy on disk (offset by 7-day retention + 1 weekly on storage snapshots) and 2x weekly on backup copy. A couple of large *nix file servers are under legal retention policy for over a decade, so those keep some longer backup copy restore points.

For big systems whose active full backups aren’t frequent enough to trim restore points down to our policy, a synthetic full backup runs (usually once a week) so we aren’t locked into old full backups that don’t line up with the retention policies.
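To make those retention mechanics concrete, here is a toy sketch of day-based retention with weekly synthetic fulls. This is plain Python, not Veeam's actual retention engine; the 7-day figure comes from the post, while everything else (the Sunday synthetic full, the helper name) is illustrative:

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # on-disk restore point retention from the post

def on_disk_points(today, history_days=30):
    """Toy model: daily incrementals, a synthetic full every Sunday,
    and only the newest RETENTION_DAYS restore points kept on disk.
    Real backup software also has to respect chain dependencies."""
    points = []
    for back in range(history_days, -1, -1):
        d = today - timedelta(days=back)
        kind = "synthetic full" if d.weekday() == 6 else "incremental"
        points.append((d, kind))
    return points[-RETENTION_DAYS:]

for d, kind in on_disk_points(date(2022, 11, 18)):
    print(d.isoformat(), kind)
```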

Userlevel 7
Badge +8

@ejfarrar Thanks for that very interesting writeup and response. I agree, management vs. performance vs. requirements is always a balancing act for sizing. I’ve been messing with queue depth and performance a lot lately, tuning some of our analytics VMs and databases, but there is also something nice about having a larger volume from a management perspective.

As a SAN guy, if you run dedupe too, having large volumes on the back end is beneficial as well. 

There really is no right way to do it or single right answer; there is, however, a wrong way, when things become unmanageable or the performance is bad. 😄

Userlevel 2

@ejfarrar That is impressive.

On that 522 TB server with the SINGLE volume, how big are your PVs?

Sure, make me go look! On the 552 TB one (522 was a typo), the VG for that LV has 40x 16 TB PVs (640 TB) for that particular volume, plus there is a second VG of 4x 64 TB PVs for a 256 TB LV… So my numbers were off on that physical server… I didn’t build that server personally, so I try not to mess with it outside of backups...

What do you size your Windows file servers’ volumes/disks to as well?

These Windows file servers have various disk sizes; each share is its own VMDK, based on approved project requirements for each of those teams. This is an AD-integrated set of servers with strict retention policies… Most volumes are at most 2 TB, but a couple of the broad-audience volumes are 8 TB. I do guest filesystem indexing on these (most of my systems do, except for the beasts).

Those are some VERY large servers to back up. Are they accessed pretty heavily, or is it mostly archive data?

If it is “big” it is heavily used in my world.  The test archive server for R&D gets hit heavy during automated testing overnight and on weekends with writes, and mostly reads during the day.  It’s kind of like a yo-yo; anywhere from 6 to 40TB gets written on any given night, and every week 40-200TB gets cleaned up.

Now that I am hitting the 100 TB range for a few, I’m finding smaller VMDKs at least allow me some concurrency in Veeam for the backups. My coworker would rather just size the VMDKs to 64 TB and forget about it lol :)

There are advantages and disadvantages to each method. Active full backups really take a performance hit on the larger VMDKs (regardless of storage-integrated or network-based transfers). The arrays have a hard time cleaning up and compacting them, but at the same time an array using dedupe/compression can take advantage of large volumes in those respects.

As far as VEEAM is concerned, my experience is that smaller VMDKs are going to give you the best performance during backups and recovery. I’m not an array or SAN expert, but the arrays behind my protected systems as well as my repositories do just fine with whatever size volumes are presented; cleanup/reclaim/defrag/compacting is obviously quicker on smaller volumes, though. I’ve been pushing VEEAM B&R to find breaking points since version 7… One thing that has always made me reconsider using single large VMDKs is that if you ever need to do a recovery from storage snapshots, you’re only going to get the VMDK of the first hard drive on a VM easily; anything more than that and you are in for a lengthy manual process… If you have compute/storage capacity, sandbox your scenarios and test them out. VEEAM can do what you need it to; you just have to iron out the details of your one-offs (hopefully not under pressure)...
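A rough way to see why smaller VMDKs help concurrency: one huge disk is read by a single stream, while several smaller disks can be processed in parallel. A back-of-the-envelope sketch in plain Python; the throughput and stream-limit numbers are made up for illustration, not measured:

```python
import math

def full_backup_hours(total_tb, vmdk_count, per_stream_tbph=1.0, max_streams=8):
    """Estimate wall-clock hours for a full backup, assuming one stream per
    VMDK and up to max_streams running concurrently. Purely illustrative;
    real throughput depends on proxy, repository, and array limits."""
    streams = min(vmdk_count, max_streams)
    per_vmdk_tb = total_tb / vmdk_count
    waves = math.ceil(vmdk_count / streams)  # batches of concurrent disks
    return waves * (per_vmdk_tb / per_stream_tbph)

print(full_backup_hours(100, 1))  # one 100 TB VMDK: ~100 h, single stream
print(full_backup_hours(100, 8))  # eight 12.5 TB VMDKs: ~12.5 h in parallel
```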

 


Userlevel 2

Ok, let’s turn this around ….. smallest VM?

64 Bytes?  x17,000 copies…  1.3 GB Thin Provisioned (1 GB of that is the ISO used by all 17,000 VMs):

 

Stress-testing some network services (DHCP, ARP tables, etc.), using PowerCLI to clone a single VM 1,000 times per run of the script (this is where I found that I could actually hit the maximum number of VMs per ESXi host AND per cluster, not to mention per VM folder, virtual distributed switch, etc.). I started this off with a “TinyCore Linux” live-CD VM with just a couple of tweaks to the boot ISO to make the VM generate a hostname on bootup based on its MAC address and the date/time it booted; a sketch of that trick follows the report excerpt below. This 2 MB thin-provisioned VM with 1 vCPU, 128 MB RAM, and 2 MB video RAM was converted to a template (leaving the datastore ISO disk as part of the VM), which was then used to deploy these VMs. Top couple rows of the report:

Name                     Status    Start time    End time      Size    Read  Transferred  Duration
cxo-nimnetadm-vbr-A618   Success   12:14:02 AM   12:15:29 AM   1.3 GB  0 B   64 B         0:01:27
cxo-nimnetadm-vbr-A617   Success   12:14:02 AM   12:15:33 AM   1.3 GB  0 B   64 B         0:01:31
cxo-nimnetadm-vbr-A616   Success   12:14:02 AM   12:15:18 AM   1.3 GB  0 B   64 B         0:01:16
cxo-nimnetadm-vbr-A615   Success   12:14:02 AM   12:15:35 AM   1.3 GB  0 B   64 B         0:01:33
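The hostname trick mentioned above might look something like this. This is a hedged sketch in Python rather than the actual TinyCore boot-ISO change (which would live in a boot script); the prefix and name format are hypothetical:

```python
import re
import uuid
from datetime import datetime

def boot_hostname(mac=None, prefix="tc"):
    """Derive a unique hostname from the NIC's MAC address plus the boot
    timestamp, so every freshly cloned VM names itself on first boot.
    Illustrative only; the post did this inside the TinyCore boot ISO."""
    if mac is None:
        mac = f"{uuid.getnode():012x}"  # this machine's MAC as 12 hex digits
    mac = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    stamp = datetime.now().strftime("%m%d%H%M%S")
    return f"{prefix}-{mac[-6:]}-{stamp}"

print(boot_hostname("00:50:56:9a:bc:de"))  # e.g. tc-9abcde-1118121402
```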

 

Userlevel 2

R&D Test Archive file server: 720 TB allocated, 526 TB consumed, but running 45x 16 TB ZFS compressed volumes… du shows 985 TB used when all volumes are added up… Cleanup scripts blow away 100 TB of data per week based on various retention policies. If I ever have to start this one from scratch again, I’ll be rather upset… I’ve been babysitting this since it ran on an HP-UX ServiceGuard cluster and was 8x 2 TB volumes, through migrations to a RHEL 6.x VM, physical Solaris 10, a site relocation (without downtime - thanks, VEEAM Replication!!!!), and finally to a physical RHEL 7 Veritas cluster… It just keeps growing and won’t die!
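Those numbers imply ZFS compression is squeezing roughly 985 TB of logical data onto 526 TB of pool space. The arithmetic, as a trivial Python check (numbers taken from the post):

```python
logical_tb = 985   # what du reports across all volumes
physical_tb = 526  # space actually consumed on the compressed ZFS volumes
print(f"effective compression ratio ≈ {logical_tb / physical_tb:.2f}x")  # ≈ 1.87x
```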

Runner-Up:  Level 4 support/developer NFS server: 625TB Allocated, 552TB used - SINGLE logical volume spanning dozens of PVs!

Current beasts of my environment (this is just one of five major sites for our team):

  • 61 TB (80 TB Allocated) CentOS 6.10 NFS Server
  • 5x 9-14 TB MS SQL Server clusters (2x servers each with independent SAN storage - these are physical workloads)
  • 24 TB (40 TB Allocated) CentOS 6.10 NFS Server
  • 26 TB (32 TB Allocated) CentOS 6.10 NFS Server
  • 74 TB (120 TB Allocated) Alma Linux 9 NFS Server (just rebuilt the OS disk and moved the data disks over from the former CentOS 6.10 build)
  • Single job with over two dozen build servers, each with 6-20 TB of workspaces (81 TB used of 200 TB Allocated)
  • 30+ MySQL and related Linux database servers totaling 31 TB
  • Single job with 5 Windows file servers; three of those are 20-24 TB (48 TB Allocated each)
  • And the “beasts” are only 60% of this site’s protected capacity… Not to mention double-digit PBs of object storage!

I have found that with these large file servers, I have to start the jobs out with just one or two of their mount points, then un-exclude another mount point for each backup until I get the whole system backing up and CBT nice and happy (see the sketch below).
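That staged onboarding could be scripted along these lines. Everything here is a hypothetical sketch in Python; the mount point names are stand-ins, and the actual include/exclude update would go through whatever interface your backup jobs expose:

```python
# Stage mount points into a job one at a time, mirroring the
# "un-exclude one more each backup" approach described above.
MOUNT_POINTS = ["/data01", "/data02", "/data03", "/data04"]  # stand-in names

def includes_for_run(run_number):
    """Run 1 backs up one mount point, run 2 two, and so on,
    until every mount point is included and CBT has settled."""
    return MOUNT_POINTS[:min(run_number, len(MOUNT_POINTS))]

for run in range(1, len(MOUNT_POINTS) + 1):
    print(f"run {run}: include {includes_for_run(run)}")
    # here you would update the job's include/exclude list via your tool's API
```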

Userlevel 7
Badge +6

Yep... it’s always going to be Linux. I think my Pi-hole server is pretty darned small… but still a bit oversized, I’m pretty sure, at 16 GB. I have seen VMs down in the 2 GB or less range, though.

Userlevel 7
Badge +8

I created a VM with no OS the other day, so with data reduction it’s essentially zero! I just wanted to test mounting a VMFS datastore to see if the VMDK file was still there. 😂

 

As a Windows shop, I’ve been giving a bit more to the system disks lately, so I don’t really have “tiny” VMs. I like to increase logging a fair bit for audits, so the extra space is nice; I don’t have to shuffle stuff around.

Userlevel 7
Badge +10

Ok, let’s turn this around ….. smallest VM?

Well, in Rickatron Labbin, I do many powered-off “empty disk” VMs, but I have also backed up the SureBackup network appliance, which has no disks.

Userlevel 7
Badge +17

Ok, let’s turn this around ….. smallest VM?

Some tiny Linux VM at something between 3 and 6 GB… It was some very small Linux variant; I don’t remember which one, though…


Userlevel 7
Badge +10

Well, I just got told I need to add another 50 TB to this server, and I have another 300 TB incoming. It’s quite critical too.

 

I’m most likely going to have to create a few servers to spread the load, but maybe I’ll do a test in Veeam afterward to see how long it takes.

 

*Sigh* I’m going to need a few more SANs just for backup data, and a few in production. lol.   

Whoa, Scott! Pushing it!


Userlevel 7
Badge +8

Sometimes in support I get customers with VMs around 30-40 TB.

It's always a pain to troubleshoot problems with VMs like that; I hate it. :(

Haha, “Will you stay on the phone with me while I restore this?” 🤣😋

 

 


Userlevel 7
Badge +17

My biggest VM being backed up was a “giant” 2 TB SQL Server, plus a 1.5 TB Oracle server.

They were not so big, but they were very critical, and both ran on mechanical disks, so the backups took a long time to perform and sometimes the machines jammed. So they were backed up at night, outside working hours, and every time we made a change, fingers crossed we didn’t mess up the full backup. 😂

 

I was made aware of an 18 TB Oracle backup (with the RMAN plugin).

OK, my Oracle databases are not that big. The biggest is around 5-6 TB. But it is growing 😎

I think my biggest DB server is 14 TB. I’m not doing app-aware processing, though, as the SQL admins get scared when I say I should be taking over their backups...

 

Even after a demo...

I have convinced them; it took me five years… 😂😂😂

They have seen that it is much easier for them to do the DB backups via the plugins or the application-aware backup.

I am looking forward to the MS SQL plugin in V12. 😎


Userlevel 7
Badge +10

My largest is about a 120 TB file server… which is about to grow, as the client is starting to put a lot of 4K video on it. It’s been rough getting it through error checking and backup defrags. Unfortunately, we don’t have enough space on the repo to set up incremental fulls due to the size. It’s a process… and we’re still trying to find a better way to handle it.

Is that 120 TB file server a VM, a NAS, or a Linux/Windows system?


Userlevel 7
Badge +8

I’ll add that I’m not anti-VUL or anti-NAS backup. 😀

I’ll probably get some VUL licenses for NAS backup going forward; I’m just going to be picky about where I use them. NAS backup is awesome and I plan to use it, but more for backing up a NAS or a share rather than a Windows Server VM.

 

Each workload should be treated its own way when it comes to requirements and pricing. If you have a NAS, it’s the perfect product.

Userlevel 7
Badge +8

What kind of workload is it for you that “needs” those monster VMs?

For our customers with VMs of ~30 TB max, it’s mostly file servers that have gone nuts during decades of lacking governance… 😉

There we usually have the discussion about NAS backup as an alternative, with much better parallelism throughout the whole job run, especially once V12 brings us NAS-to-tape.

Monster VMs tend to be the ones rolling in via a single thread at the end of a VM backup job.

The challenge with NAS backup for Windows/Linux servers is that it gets waaaaaay more expensive: 1 VM = 1 VUL vs. 30 TB = 60 VUL…

I looked into NAS backup. Can’t afford it.

 

Let’s say I have 17 file servers at 50 TB each.

I currently have socket licensing on 2-CPU hosts, and I could handle all those servers on 1-2 hosts, so 4 sockets max.

Even at a 6:1 conversion ratio, that is 24 VUL.

 

Now, if I switched to VUL licensing, full VM backups = 17 VUL, and I could probably handle another 50 VMs on these servers, so I’m still ahead with sockets; but since VULs are paid upfront, they could potentially work here at a slight increase.

 

17 × 50 = 850 TB. I’m going to let you guess how much that comes out to in NAS backup VUL licenses.

At 500 GB a VUL, that is something like 1,700 licenses instead of 17.
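The licensing arithmetic behind that, spelled out in plain Python (the 6:1 socket conversion and 500 GB-per-VUL figures are the ones quoted in this thread; nothing here is an official price calculator):

```python
servers = 17
tb_per_server = 50
sockets = 4           # two 2-CPU hosts
socket_to_vul = 6     # 6:1 socket-to-VUL conversion ratio quoted above
gb_per_nas_vul = 500  # NAS backup licensed per 500 GB of source data

vul_from_sockets = sockets * socket_to_vul                  # 24 VUL
vul_per_vm = servers                                        # 17 VUL (1 per VM)
vul_nas = servers * tb_per_server * 1000 // gb_per_nas_vul  # 850 TB -> 1700 VUL

print(vul_from_sockets, vul_per_vm, vul_nas)  # 24 17 1700
```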

 

Our file servers have crazy growth, mostly from video and from legal requirements for how long we have to keep data. Some of it is 60+ years; for some I have been told “forever”.

 

The DB, Application, Web, and other servers are all reasonable in my environment.

 

With our DR and backup requirements too, it means keeping multiple copies of these on multiple SANs and tape. My vendors all love me and probably owe me a few more lunches lol.


Userlevel 7
Badge +8

I was made aware of a 98 TB VM backed up by a customer in South Africa. And some Windows Servers in the ½ PB range as well.

½ PB is pretty good. I know that VMware, Windows, etc. all support these monsters, but the manageability, portability, and backup times are crazy. I guess it’s like the previous generation of techs never allowing volumes over 2 TB, lol. That doesn’t scale today, but either I’m getting older and out of touch, or that is a ton of data in one spot. 🤣
