Largest VM you back up?


Userlevel 7
Badge +8

Here’s one to find out who has the largest backup in the forums.

 

Currently the largest server I back up is 45.1TB.

The largest one I ever had backing up was 115TB. It took a while but worked fine. I’ve been working on splitting it up into multiple file servers to allow more concurrent streams to our backup/tape jobs.

 


52 comments

Userlevel 7
Badge +8

My biggest vm being backed up was a “Giant” 2TB SQL Server, and a 1.5TB Oracle Server.

They were not that big, but very critical, and both ran on mechanical disks, so the backups took a long time and sometimes the machines hung. They were backed up at night, outside working hours, and every time we made a change we crossed our fingers that we would not mess up the full backup. 😂

 

I’m a huge fan of redundancy. I used to have some fiber / SAN infrastructure I inherited from the guy before me that would cause outages every upgrade, change, reboot etc.  Backups would cause systems to halt too.

 

I basically started fresh and rebuilt it all from the ground up. It works fine now but I still get nervous. lol

Userlevel 7
Badge +10

I was made aware of a 98 TB VM backed up by a customer in South Africa. And some Windows Servers in the ½ PB range as well.

Userlevel 7
Badge +8

½ PB is pretty good. I know that VMware, Windows etc. all support these monsters, but the manageability, portability, and time for backups are crazy. I guess it’s like the previous generation of techs never allowing volumes over 2TB, lol. That doesn’t scale today, but either I’m getting older and out of touch, or that is a ton of data in one spot. 🤣

Userlevel 7
Badge +8

What kind of workload is it for you, that “needs” those monster VMs? 

For our customers with VMs of ~30TB max. it’s mostly fileservers that have gone nuts during decades of lacking governance… 😉

There we usually have the discussion about NAS backup being an alternative with much better parallelism throughout the whole job run. Especially when V12 brings us NAS2tape.

Monster VMs tend to be the ones rolling in via a single thread at the end of a VM backup job.

The challenge with NAS backup for Windows/Linux servers is that it gets waaaaaay more expensive: 1 VM = 1 VUL vs. 30TB = 60 VUL…

Userlevel 7
Badge +8

I looked into NAS backup. Can’t afford it.

Let’s say I have 17 file servers at 50 TB each.

I’m currently on socket licensing with 2-CPU hosts. I could handle all those servers on 1-2 hosts, so 4 sockets max.

Even at a 6-to-1 ratio for conversion, that is 24 VUL.

Now, if I switched to VUL licensing, full VM backups = 17 VUL, but I can probably handle another 50 VMs on these servers, so I’m still ahead with sockets, but pay upfront, so VULs could potentially work here at a slight increase.

 

17*50=850TB.  I’m going to let you guess how much that comes out to in NAS backup / VUL licenses.

At 500GB a VUL, that is like 1700 instead of 17.          
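For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above. It only uses the ratios quoted in this thread (500 GB of NAS data per VUL, 6 VUL per socket on conversion, 1 VUL per VM) and decimal terabytes; those are assumptions taken from the posts, not official licensing terms, so check your own agreement.

```python
import math

# Ratios as quoted in this thread (assumptions, not official licensing terms).
GB_PER_NAS_VUL = 500   # NAS backup: one VUL per 500 GB of source data
VUL_PER_SOCKET = 6     # socket-to-VUL conversion ratio mentioned above
VUL_PER_VM = 1         # image-level VM backup: one VUL per VM


def nas_vuls(total_tb: float) -> int:
    """VULs needed to protect total_tb (decimal TB) via NAS backup."""
    return math.ceil(total_tb * 1000 / GB_PER_NAS_VUL)


def vm_vuls(vm_count: int) -> int:
    """VULs needed to protect vm_count VMs with image-level backup."""
    return vm_count * VUL_PER_VM


def socket_equivalent_vuls(sockets: int) -> int:
    """VULs you would get when converting socket licenses."""
    return sockets * VUL_PER_SOCKET


servers, tb_each = 17, 50
print("NAS backup VULs:", nas_vuls(servers * tb_each))
print("VM backup VULs:", vm_vuls(servers))
print("4 sockets converted:", socket_equivalent_vuls(4))
```

The same function reproduces the 30TB = 60 VUL figure from the earlier post, which is a quick sanity check that the 500 GB ratio is being applied consistently.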

 

Our file servers have crazy growth, mostly from video and legal requirements of how much data we have to keep. Some of it is 60+ years, some I have been told “forever”

 

The DB, Application, Web, and other servers are all reasonable in my environment.

 

With our DR and backup requirements too, it means keeping multiple copies of these on multiple SANs and tape. My vendors all love me and probably owe me a few more lunches lol.

Userlevel 7
Badge +8

I’ll add, I’m not anti VUL or NAS backup 😀

I’ll probably get some VUL licenses for NAS backup going forward, I’m just going to be picky about where I use them.  NAS backup is awesome and I plan to use it, but more for backing up a NAS or share rather than a Windows Server VM.

 

Each backup should be treated in its own way when it comes to requirements and pricing. If you have a NAS, it’s the perfect product.

Userlevel 7
Badge +10

I was made aware of an 18TB Oracle backup (with the RMAN plugin)

Userlevel 7
Badge +17

OK, my Oracle databases are not that big. The biggest is around 5-6 TB. But it is growing 😎

Userlevel 7
Badge +10

My largest is about a 120TB file server… which is about to grow, as the client is starting to put a lot of 4K video on it. It’s been rough getting it through error checking and backup defrags. Unfortunately, we don’t have enough space on the repo to set up incremental fulls due to its size. It’s a process… and we’re still trying to find a better way for it.

Is the 120 TB file server a VM, a NAS, or a Linux/Windows system?

Userlevel 7
Badge +8

I think my biggest DB server is 14TB. I’m not doing app aware though as the SQL admins get scared when I say I should be taking over their backups..

 

Even after a demo...

Userlevel 7
Badge +17

I have convinced them, took me five years… 😂😂😂

They have seen that it is much easier for them to do the db backups via the plugins or the application aware backup.

I am looking forward to the MS SQL plugin in V12. 😎

Userlevel 7
Badge +4

Sometimes in support I get customers with VMs around 30-40TB.

It's always a pain to troubleshoot problems with VMs like that. I hate it. :(

Userlevel 7
Badge +8

Haha, “Will you stay on the phone with me while I restore this?” 🤣😋

 

 

Userlevel 7
Badge +8

Well, I just got told I need to add another 50TB to this server, and I have another 300TB incoming. It’s quite critical too.

 

I’m going to have to create a few servers most likely to spread the load, but maybe I’ll do a test in Veeam after to see how long it takes.

 

*Sigh* I’m going to need a few more SANs just for backup data, and a few in production. lol.   

Userlevel 7
Badge +10

Whoa, Scott! Pushing it!

Userlevel 7
Badge +9

Ok, let’s turn this around ….. smallest VM?

Userlevel 7
Badge +17

Some tiny Linux VM with something between 3 and 6 GB… Was some very small Linux variant, don’t remember which one though….

Userlevel 7
Badge +10

Well in Rickatron Labbin, I do many powered off "empty disk" VMs but I have backed up the SureBackup network appliance that has no disks.

Userlevel 7
Badge +8

I created a VM with No OS the other day so with data reduction essentially zero! I just wanted to test mounting a VMFS datastore to see if the VMDK file was still there. 😂

 

As a Windows shop I’ve been giving a bit more space to the system disks lately, so I don’t really have “tiny” VMs. I like to increase logging a fair bit for audits, so the extra space is nice and saves me from shuffling stuff around.

Userlevel 7
Badge +6

Yep... always going to be Linux. I think my Pi-hole server is pretty darned small… but I’m pretty sure it’s still a bit oversized at 16GB. But I have seen VMs down in the 2GB or less range.

Userlevel 2

R&D Test Archive file server:  720 TB Allocated; 526 TB consumed but running 45x 16TB ZFS compressed volumes…   “DU” shows 985 TB used when all volumes are added up…  Cleanup scripts blow away 100TB of data per week based on various retention policies.   If I ever have to start this one from scratch again, I’ll be rather upset…  I’ve been baby-sitting this since it ran on an HP-UX ServiceGuard cluster and was 8x 2TB volumes, through migrations to RHEL 6.x VM, Solaris 10 Physical, a site relocation (without downtime - thanks VEEAM Replication!!!!), and finally migrating to RHEL 7 Veritas cluster (Physical)…  It just keeps growing and won’t die!

Runner-Up:  Level 4 support/developer NFS server: 625TB Allocated, 552TB used - SINGLE logical volume spanning dozens of PVs!

Current beasts of my environment (this is just one of five major sites for our team):

  • 61 TB (80 TB Allocated) CentOS 6.10 NFS Server
  • 5x 9-14 TB MS SQL Server clusters (2x servers each with independent SAN storage - these are physical workloads)
  • 24 TB (40 TB Allocated) CentOS 6.10 NFS Server
  • 26 TB (32 TB Allocated) CentOS 6.10 NFS Server
  • 74 TB (120 TB Allocated) Alma Linux 9 NFS Server (just rebuilt the OS disk and moved data disks from former CentOS 6.10)
  • Single job with over two dozen build servers each with 6-20 TB of workspaces (81 TB used of 200 TB Allocated)
  • 30+ MySQL and related Linux database servers totaling 31 TB
  • Single job with 5 Windows file servers; three of those are 20-24TB (48TB Allocated each)
  • And the “beasts” are only 60% of this site’s protected capacity…   Not to mention double-digits PB of object storage!

I have found that with these large file servers, I have to start the jobs out with just one or two of their mount points, then un-“exclude” another mount point for each backup run until I get the whole system backing up and get CBT nice and happy.
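The staged approach described above can be sketched as a simple plan: start with a couple of mount points, then add one more per run. This is only an illustration of the ordering; the function name and the example mount points are hypothetical, and it does not call any Veeam API (the exclusions themselves would still be changed in the job settings).

```python
def staged_include_plan(mount_points, initial=2):
    """Return, for each backup run, the list of mount points included so far.

    Run 1 covers the first `initial` mount points; every later run
    un-excludes exactly one more until everything is included.
    """
    start = max(1, min(initial, len(mount_points)))
    return [mount_points[:n] for n in range(start, len(mount_points) + 1)]


# Hypothetical mount points, for illustration only.
plan = staged_include_plan(["/data01", "/data02", "/data03", "/data04"])
for run_number, included in enumerate(plan, start=1):
    print(f"run {run_number}: include {included}")
```

With four mount points and the default of two to start, the full system is covered on the third run.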

Userlevel 2

Ok, let’s turn this around ….. smallest VM?

64 Bytes?  x17,000 copies…  1.3 GB Thin Provisioned (1 GB of that is the ISO used by all 17,000 VMs):

 

Stress-testing some network services (DHCP, ARP Tables, etc) - using PowerCLI to clone a single VM 1000 times per run of the script (this is where I found that I could actually hit the maximum number of VMs per ESXi host AND per Cluster (not to mention VM Folders, virtual distributed switches, etc)).  Started this off with “TinyCore Linux” Live CD VM with just a couple tweaks to the boot ISO to make the VM generate a hostname on bootup based on its MAC address and the date/time it booted up…  This 2MB Thin-Provisioned VM with 1vCPU, 128MB RAM, 2MB Video RAM was converted to a template (leave the datastore ISO disk as part of the VM) and it was used to deploy these VMs.  Top couple rows of report:

Name                    Status   Start time   End time     Size    Read  Transferred  Duration  Details
cxo-nimnetadm-vbr-A618  Success  12:14:02 AM  12:15:29 AM  1.3 GB  0 B   64 B         0:01:27
cxo-nimnetadm-vbr-A617  Success  12:14:02 AM  12:15:33 AM  1.3 GB  0 B   64 B         0:01:31
cxo-nimnetadm-vbr-A616  Success  12:14:02 AM  12:15:18 AM  1.3 GB  0 B   64 B         0:01:16
cxo-nimnetadm-vbr-A615  Success  12:14:02 AM  12:15:35 AM  1.3 GB  0 B   64 B         0:01:33

 

 

Userlevel 7
Badge +8

@ejfarrar That is impressive.

On that 522TB server, with the SINGLE volume. how big are your PV’s? 

What do you size your windows file servers Volumes\disks to as well? 

 

Those are some VERY large servers to back up. Are they accessed pretty heavy or mostly archive data?

 

Now that I am hitting the 100TB range for a few I’m finding smaller vmdk’s at least allows me some concurrency in Veeam for the backups.  My coworker would rather just size the VMDK’s to 64 TB and forget about it lol :) 

Userlevel 7
Badge +8

These impressive numbers gave me a headache but it is very interesting.

I'm curious how you back up the R&D file server. Agent? File share jobs?
On what type of repo, and with what retention? Object storage, if I read that right?

How long does the active full that starts the backup chain take?

Userlevel 2

@ejfarrar That is impressive.

On that 522TB server, with the SINGLE volume. how big are your PV’s? 

Sure - make me go look!   - The 552TB (522 typo?), the VG for that LV has 40x 16TB (640TB) for that particular volume plus a second VG of 4x 64TB PVs for a 256TB LV…   So my numbers were off on that physical server….  I didn’t build that server personally so I try not to mess with it outside of backups...

What do you size your windows file servers Volumes\disks to as well? 

These Windows file servers have various disk sizes - each share is its own VMDK based on approved project requirements for each of those teams. This is an AD-integrated set of servers with strict retention policies… Most volumes are at most 2TB but a couple of the broad-audience volumes are 8TB. I do guest filesystem indexing on these (most of my systems do, except for the beasts).

Those are some VERY large servers to back up. Are they accessed pretty heavy or mostly archive data?

If it is “big” it is heavily used in my world.  The test archive server for R&D gets hit heavy during automated testing overnight and on weekends with writes, and mostly reads during the day.  It’s kind of like a yo-yo; anywhere from 6 to 40TB gets written on any given night, and every week 40-200TB gets cleaned up.

Now that I am hitting the 100TB range for a few I’m finding smaller vmdk’s at least allows me some concurrency in Veeam for the backups.  My coworker would rather just size the VMDK’s to 64 TB and forget about it lol :) 

There are advantages and disadvantages to each method.  Active Full backups really take hits on performance (regardless of storage-integrated or network-based transfers) on the larger VMDKs.  The arrays have a hard time cleaning up and compacting them, but at the same time an array using dedupe/compression can make use of large volumes in those aspects.

As far as VEEAM is concerned, my experience is that smaller VMDKs are going to give you the best performance during backups and recovery. I’m not an array or SAN expert, but the arrays behind my protected systems as well as my repositories do just fine with whatever size volumes are presented, though cleanup/reclaim/defrag/compacting is obviously quicker on smaller volumes. I’ve been pushing VEEAM B&R to find breaking points since version 7… One thing that has always made me reconsider using single large VMDKs is that if you ever need to do a recovery from storage snapshots, you’re only going to get the VMDK of the first hard drive on a VM easily - anything more than that and you are in for a lengthy manual process… If you have compute/storage capacity, sandbox your scenarios and test it out - VEEAM can do what you need it to; you just have to iron out the details or your one-offs (hopefully not under pressure)...
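To put a rough number on the "smaller VMDKs give more concurrency" point, here is a back-of-the-envelope model. It assumes each disk is read by a single stream at a fixed rate and that a fixed number of disks can be processed at once; the 1 TB/hour per stream and 4 concurrent tasks below are made-up figures for illustration, and the model ignores shared bottlenecks (network, repository, source array), so treat it as a sketch rather than a prediction.

```python
import heapq


def job_hours(disk_sizes_tb, concurrent_tasks, tb_per_hour_per_stream):
    """Estimate job duration when each disk is handled by one stream.

    Disks are assigned largest-first to whichever task slot frees up
    earliest; the job ends when the last slot finishes.
    """
    slots = [0.0] * concurrent_tasks  # finish time of each task slot
    heapq.heapify(slots)
    for size in sorted(disk_sizes_tb, reverse=True):
        finish = heapq.heappop(slots) + size / tb_per_hour_per_stream
        heapq.heappush(slots, finish)
    return max(slots)


# Same 64 TB of data, laid out two ways (hypothetical rates).
one_big_disk = job_hours([64], concurrent_tasks=4, tb_per_hour_per_stream=1.0)
eight_small = job_hours([8] * 8, concurrent_tasks=4, tb_per_hour_per_stream=1.0)
print(f"1 x 64 TB: {one_big_disk:.0f} h, 8 x 8 TB: {eight_small:.0f} h")
```

Under these assumptions the single 64 TB disk is stuck on one stream for the whole run, while the eight 8 TB disks keep all four slots busy, which matches the "monster VM rolling in on a single thread at the end of the job" pattern mentioned earlier in the thread.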

 
