The Word of Gostev - Hardware for Primary Backup Repositories

13 September 2021

JMeixner
Veeam Vanguard

This week Anton Gostev talks about hardware for primary backup repositories. He discusses which kinds of servers customers are deploying and what advantages they offer.

Really a must-read.

I've recently talked to a number of our field folks regarding primary backup repositories they see our customers deploying these days and collected quite a bit of factual data that I think is worth sharing, as it should be helpful for your planning purposes. I also included some anecdotal information about the pricing just to give you an idea, but keep in mind this will be the price to channel without reseller margin (so you will pay slightly more) and this data came from one particular cost-sensitive region – so YMMV depending on where your company is located. Also, while I will be talking about HPE and Cisco offerings simply because these are "safe choices" used by the majority of our customers (both companies are our resell partners, which explains this phenomenon), this does not mean other vendors should be avoided. For example, some of you may be using DELL hardware exclusively in your data center, while others may like the no-frills approach of Supermicro to save some $$$. Just be sure your build includes a proper enterprise-grade RAID controller, as this is really not something you should skimp on.

As you know, for primary repositories Veeam recommends general-purpose servers to customers of all sizes, including the largest. This is because with storage-oriented servers these days you can get upwards of 1PB of capacity in a single box. Smaller customers typically go with 2U servers containing 12 or 24 LFF (large form factor) drives, while larger customers go with 4U 56-60 LFF configurations. So, let's look closely at both of these configurations.

Some examples of 2U small/medium capacity servers include Cisco C240 and HPE DL380. For these, a RAID6 configuration is typically used, which gets you up to 180TB of usable capacity depending on the hard drive size you choose. And if you need a bit more capacity in the same form factor, you can also look at ultra-dense 24 LFF systems like HPE Apollo 4100 for up to a 360TB config at double the price. Remember that you always want to have all drive bays populated, as the number of spindles is where the IOPS capacity will come from, which is just as important as storage capacity. So basically you need to select the drive size according to your total capacity needs. For example, 4TB drives in a 12 LFF server will get you 40TB of usable capacity, since RAID6 sets aside two drives' worth of space for parity.
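
To make the drive-size selection concrete, here is a minimal Python sketch of the RAID6 arithmetic behind the figures above. It assumes a single RAID6 group spanning all 12 bays with no hot spares, and it ignores filesystem overhead and the TB vs. TiB difference. The function name raid6_usable_tb is purely illustrative.

```python
def raid6_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of one RAID6 group (assumption: no hot spares).

    RAID6 keeps two drives' worth of parity, so only
    (drive_count - 2) drives contribute usable space.
    """
    if drive_count < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drive_count - 2) * drive_size_tb


# Fully populated 12 LFF server with a few common drive sizes.
for size_tb in (4, 8, 12, 18):
    usable = raid6_usable_tb(12, size_tb)
    print(f"12 x {size_tb}TB in RAID6 -> {usable:g}TB usable")
```

With 4TB drives this gives the 40TB from the example, and with 18TB drives the 180TB upper bound. Real-world usable capacity will come out somewhat lower once formatting overhead is counted.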

The 180TB usable configuration costs around USD 20K to channel. Assuming you deploy a ReFS or XFS based repository to be able to leverage block cloning, this config directly competes with, for example, a DELL EMC Data Domain DD4200 with 120TB usable, which costs around USD 250K to channel - or over 10x more! Not to mention a major performance impact (up to a few times slower depending on the operation) and a huge difference in the required rack space (2U vs. 9U). This basically explains why, in my opinion, inline deduplicating storage appliances are rarely a good candidate for the primary backup repository and should only be considered for secondary repositories (backup copies) when a very long-term retention policy is in play.
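
The "over 10x" figure compares sticker prices only. Normalizing by usable capacity makes the gap wider still. The sketch below uses nothing but the anecdotal channel prices quoted above and assumes no data reduction on either side (dedupe ratios and block cloning savings vary by workload), so treat it as back-of-the-envelope math. The helper name is hypothetical.

```python
def cost_per_usable_tb(price_usd: float, usable_tb: float) -> float:
    """Channel price divided by usable capacity (no data reduction assumed)."""
    return price_usd / usable_tb


# Anecdotal channel prices and usable capacities from the text above.
server_price, server_tb = 20_000, 180         # 2U server, RAID6
appliance_price, appliance_tb = 250_000, 120  # dedupe appliance

server = cost_per_usable_tb(server_price, server_tb)
appliance = cost_per_usable_tb(appliance_price, appliance_tb)

print(f"2U server:        ~USD {server:.0f} per usable TB")
print(f"Dedupe appliance: ~USD {appliance:.0f} per usable TB")
print(f"Sticker price ratio: {appliance_price / server_price:.1f}x")
print(f"Per-TB price ratio:  {appliance / server:.2f}x")
```

On these numbers the server lands at roughly USD 111 per usable TB against roughly USD 2,083 for the appliance, so the per-TB gap (about 18.75x) is larger than the 12.5x sticker price gap.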

Next, let's look at bigger servers. Some examples of 4U large capacity servers are Cisco S3260 and HPE Apollo 4510. Most of our customers deploy these in RAID60 with a few hot spares. Fully stuffed with 18TB NL-SAS drives, these beasts will get you 864TB of usable capacity and some insane performance numbers (so long as your primary storage and SAN fabric can keep up), all for just around USD 100K. These work exceptionally well and we've been confidently recommending them as a storage solution to our largest customers, because even many years ago we already had Cloud Connect service providers hosting PBs of their clients' data on previous generations of these same boxes.
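
The text does not spell out the exact RAID60 layout behind the 864TB figure, so the sketch below is only one way to arrive at it. The two layouts shown (60 bays with 4 hot spares and four RAID6 groups, or 56 bays with 2 hot spares and three RAID6 groups) are assumptions chosen because they reproduce that number, not configurations confirmed by the source. The function name is hypothetical, and filesystem overhead is ignored.

```python
def raid60_usable_tb(total_drives: int, hot_spares: int,
                     raid6_groups: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID60 set: several equal RAID6 groups, striped.

    Each RAID6 group gives up two drives' worth of space to parity;
    hot spares contribute no capacity at all.
    """
    in_array = total_drives - hot_spares
    if raid6_groups < 2 or in_array % raid6_groups != 0:
        raise ValueError("drives must split evenly into 2+ RAID6 groups")
    per_group = in_array // raid6_groups
    if per_group < 4:
        raise ValueError("each RAID6 group needs at least 4 drives")
    return raid6_groups * (per_group - 2) * drive_size_tb


# Assumed layouts with 18TB drives (not taken from the source):
print(raid60_usable_tb(60, 4, 4, 18))  # 4 groups of 14 drives
print(raid60_usable_tb(56, 2, 3, 18))  # 3 groups of 18 drives
```

Both assumed layouts leave 48 data drives, which at 18TB each matches the 864TB quoted above. Fewer, wider RAID6 groups lose less capacity to parity, but each group then has more drives sharing the same two-drive failure tolerance.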

But taking into consideration the customer sizes we're talking about with these servers, it would be unfair not to mention one possible objection to this approach: the potential of a storage controller failure. Now, no one I've talked to has ever seen a failed controller in a storage server, and all concur they don't really know anybody who worries too much about this happening. However, ironically, one of our product managers has actually observed this concern once! He was recently giving some advice to a friend shopping for new backup storage with 780TB of usable capacity. The quotes from the partner for a bunch of different HPE storage – namely Apollo, Nimble and StoreOnce – were all within USD 100-130K, so not a huge spread in cost. And while our guy strongly recommended Apollo, in the end this customer still picked Nimble specifically because of controller redundancy – even though going the Apollo route would have saved them USD 30K. Apparently, sleeping a bit better at night may cost quite a bit!

Here I do have to note that I heard Nimble mentioned as a backup target one other time in my discussions. A person in our EMEA field told me, quoting: "Nimble is the new StoreOnce in some [EMEA] countries. Dedupe + Speed = Happy Customer!" which I thought was an interesting observation. I don't believe HPE ever intended to position Nimble as secondary storage, but it looks like the combination of its price/performance/capacity has made the market see this use case too! Although I don't know if I personally would have been able to justify the price difference vs., say, the above-mentioned 4510, if only because Apollo gives me everything I need for an all-in-one backup appliance, while going the SAN route still requires a backup server in addition, which makes the price and rack footprint difference even more noticeable.