@ejfarrar Thanks for that very interesting writeup and response. I agree, management vs performance vs requirements is always a balancing act when sizing. I’ve been doing a lot of queue-depth and performance tuning lately on some of our analytics VMs and databases, but there is also something nice about having a larger volume from a management perspective.
As a SAN guy, I’ll add that if you run dedupe, having large volumes on the back end is beneficial as well.
There really is no right way to do it and no single right answer; there is, however, a wrong way, which you find out about when things become unmanageable or performance suffers.
These impressive numbers gave me a headache but it is very interesting.
I'm curious how you back up the R&D file server? Agent? File share jobs?
On what type of repo, and with what retention? My read is that an object store works well for this?
How long does the active full that initiates the backup chain take?
That one uses the agent on one of the two nodes of the cluster. My wallet cringes at the new licensing model for file share jobs when I think of my servers (even though it technically isn’t my money, keeping costs down enhances profits, which enhances bonuses and salaries). I have moved to purely scale-out repositories for our block systems, but we are about to re-assess since our on-prem S3 object storage is about to double in size (it’s already in double-digit PB)… We are exploring various scenarios where block and object storage are used for all types of workloads…
I have about 800TB of scale-out repository capacity plus about the same on our dedupe appliance where our backup copies go…
All except the >130TB systems are now running reverse-incremental backups with a monthly active full. Retention for on-disk restore points is 7 days; backup copy retention is 5x weekly, 3x monthly, 4x quarterly. The exception is basically all of the big *nix file servers: they have a 4-day retention policy on disk (offset by 7-day retention + 1 weekly on storage snapshots) and 2x weekly on backup copy. A couple of large *nix file servers are under a legal retention policy of over a decade, so those keep longer backup copy restore points.
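To put that backup copy retention in concrete terms, here's a rough sketch of how many points it keeps and how far back it reaches. The point counts (5/3/4) are from my policy above; the interval lengths are simplified approximations, and in practice GFS points can overlap, so treat this as back-of-napkin math only:

```python
# Back-of-napkin math for the GFS backup copy retention above.
# Point counts are from the policy; month/quarter lengths are approximate.
weekly, monthly, quarterly = 5, 3, 4

total_points = weekly + monthly + quarterly

# Oldest point is roughly: 5 weeks of weeklies, then 3 months of monthlies,
# then 4 quarters of quarterlies (using 30-day months, 91-day quarters).
reach_days = weekly * 7 + monthly * 30 + quarterly * 91

print(total_points)  # 12 restore points on the copy target
print(reach_days)    # ~489 days, i.e. a bit over a year of reach
```

So each protected system carries about a dozen backup copy points spanning a bit over a year, which is a manageable footprint even across a large estate.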
For big systems whose active fulls don’t run often enough to trim restore points down to our policy, a synthetic full runs (usually once a week) so we aren’t locked into old full backups that don’t line up with the retention policies.
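The reason the synthetic full interval matters so much for on-disk footprint: with a forward-incremental chain, an old full can't be deleted until every increment depending on it has aged past retention, so the worst-case history on disk is roughly retention plus the interval between fulls. A tiny illustrative sketch (7-day retention from my policy above; the rest is simplified):

```python
# Why weekly synthetic fulls help: a full backup can only be deleted once
# no retained increment depends on it, so worst-case on-disk history is
# roughly retention + interval-between-fulls. Numbers are illustrative.
def worst_case_days_on_disk(retention_days: int, days_between_fulls: int) -> int:
    return retention_days + days_between_fulls

print(worst_case_days_on_disk(7, 31))  # monthly fulls only: up to ~38 days kept
print(worst_case_days_on_disk(7, 7))   # weekly synthetic fulls: up to ~14 days
```

On a >130TB system, the difference between carrying ~38 days and ~14 days of history is substantial, which is why the weekly synthetic full is worth its I/O cost for us.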