Solved

Fast Recovery storage options - recommendations


Userlevel 3

I have a v11a server backing up about 80 servers to an on-site all-flash array connected at 40Gbps. It runs a nightly job without any issues, but it has a 7-day retention policy, after which we copy out to a DROBO appliance connected at 1Gbps. Since that box is only doing the copy job work, that part is no problem.

We recently needed to do some file restores and found that restore speeds off that device were very slow, around 10Mbps, which is pretty much unusable. So my question to the community is this: what would be a good device / configuration for keeping my older GFS backups on-site with a reasonable balance of cost, capacity, and data protection? I looked into using S3 cloud storage as well, and the transfer speeds were terrible for me, roughly 8Mbps upload and 75Mbps download. If I have to restore 500GB of files, that would take way too long. Almost all of my servers are virtual, running VMware in a Cisco UCS chassis connected at 10Gbps to the switch. What are the success-story configurations? I look forward to your recommendations and experiences.
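For a rough sense of scale, here is a quick Python sketch of what those speeds mean for a 500GB restore; it simply treats each quoted link speed as sustained throughput, which is optimistic:

```python
# Rough restore-time estimate for a 500 GB file restore at the speeds quoted above.
# Treats each link speed as sustained throughput, which is optimistic.

restore_gb = 500

def hours_to_restore(size_gb: float, link_mbit: float) -> float:
    """Hours to move `size_gb` gigabytes over a `link_mbit` Mbit/s link."""
    size_mbit = size_gb * 1024 * 8      # GB -> Mbit (1 GB = 1024 MB = 8192 Mbit)
    return size_mbit / link_mbit / 3600

for label, mbit in [("DROBO restore at 10 Mbit/s", 10),
                    ("S3 download at 75 Mbit/s", 75),
                    ("1 Gbit/s LAN for comparison", 1000)]:
    print(f"{label}: ~{hours_to_restore(restore_gb, mbit):.1f} hours")
```

At 10Mbps that works out to well over 100 hours for 500GB, versus roughly an hour over a saturated 1Gbps link.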


Best answer by MicoolPaul 11 April 2022, 16:57


11 comments

Userlevel 7
Badge +20

Hey, so your DROBO looks to be a NAS; are you connecting to it via iSCSI, or CIFS/NFS?

 

I’d suggest using a Veeam hardened repository to create an immutable backup set, using storage directly attached to the server. If you don’t need a huge number of disks, something like the Dell NX storage servers could be handy; they’re based on the Dell R740XD, IIRC.

 

Alternatively, if your uploads are slow, you could look at S3-compatible storage that isn’t cloud based, something like a Cloudian. Since your uploads are bad, what’s your off-site backup strategy? Have you got any tape or USB etc. to ship data off-site?

 

There are countless solutions and it will all depend on budget, of course, but a lower-end server with large-capacity spinning disks and a decent RAID controller could still easily have a 10Gbps+ NIC for rapid connectivity.

 

A couple of years ago I had a customer that wanted to buy a 10Gbps Synology and fill it with WD Red Pro disks. That would have been accessed via NFS/CIFS or iSCSI, and the underlying network was 1Gbps, so it would have been heavily under-performing. It also couldn’t host the Veeam roles itself, so it would have needed either a gateway server or an iSCSI VM to talk to the storage. A beefy Dell server with storage came in cheaper, with the added benefit of a 4-hour response for parts if a hard drive failed!

 

Hope this gives some food for thought to start!

Userlevel 7
Badge +17

I would go for:

  • a disk repository on an on-premises server, preferably with a Linux OS so you can build a hardened repository.
  • on-premises object storage.
Userlevel 3

Yes, the DROBO is a CIFS device. It has hardware data protection built in, so HDD failure is not an issue, but it is SLOW. I like the idea of a hardened Linux S3 box. Can I build it on a Dell server? Run it on Red Hat or some other distro? Is the S3-compatible storage piece an open-source plug-in? I like the idea of a 10G direct connection, iSCSI? I’ll have to look up Cloudian to understand more about that.

*Dave

Userlevel 7
Badge +20

There are a few things to unpack from your last post, so let’s break them out:
 

There’s the “Hardened Linux Backup Repository”, which isn’t object storage / S3-compatible; instead it’s block storage, similar to your CIFS share. Separately, you can look to deploy object storage on a generic bit of tin, Dell or otherwise; there are a few options for this.

With object storage, Veeam maintain an official supported storage list, available here: https://www.veeam.com/alliance-partner-integrations-qualifications.html?type=object-storage-target

 

I’m not aware of any solutions on there that don’t cost money one way or another, either via licensing/support contracts or in the sale of a hardware appliance. If you wanted to reuse hardware, you could look at MinIO, Red Hat Ceph or SUSE Enterprise Storage. IIRC they’re all pretty hardware agnostic.

 

Focusing on your DROBO, CIFS isn’t the best idea. It’s the protocol most likely to end up with corruption due to write caching and poor implementation with a lot of vendors. Generally when you’d deploy enterprise grade storage over a network such as a SAN, you’d have a dedicated network switch stack for your connectivity, to avoid noisy neighbours amongst other benefits, with the alternative being the use of QoS in a shared network switching environment. People tend not to do this with CIFS/NFS in my experience and this can further your problems when working with these protocols.

iSCSI is a bit harder to set up the first time, but you can leverage features such as MPIO: if your DROBO has multiple 1Gbps NICs, you could utilise them all for greater throughput. You could also then use a “proper” file system that your host OS can take advantage of, such as ReFS, for better data efficiency via Fast Clone. But none of this is a substitute for a proper server with direct-attached storage for your block storage.

 

If you’re new to object storage and want to understand what’s going on a bit better, my blog series on this might be worth a read: 

With all this said and done, it sounds like CIFS is your bottleneck, so check the available settings for multiple streams to see if that can help improve performance, whilst you plan to upgrade to a higher performance solution, whether block or object based.

Userlevel 3

Michael, thanks for your reply. I am not looking for CHEAP or FREE storage, but for the best reasonable alternatives. I don’t understand how there are people out there using cloud storage in the enterprise. I have 30TB in Exchange alone and another 40TB in assorted data and systems. I don’t think there is a pipe big enough to get me into cloud storage that could keep up with the number of hours in the day and our data change rate. That is why I am looking at an on-site, direct-connected solution.

We currently have a Pure all-flash array connected at 40Gbps directly into our switch fabric, which connects to our UCS chassis and our HPE array, all iSCSI. We use the Pure as the target for our SOBR in Veeam v11a. Our performance is pretty good, and it has an immutable safe-copy feature as well that isolates two copies of our backup volumes in a special safe place to keep them from being corrupted. The issue I am trying to solve is that I don’t have enough capacity on the Pure to store more backup days. Right now I have 7-day retention, and the Pure keeps the latest two in a safe region. After that I scale out to the DROBO (I get it - bad idea). One option would be to expand the Pure array, which is doable; I have room in the chassis to double its capacity, but it would probably cost me another $100,000. It is installed as a SaaS contract for three years.

I also have a Nimble array that does nightly block replication of this block storage to another site, so that is my other-site copy. But it is NOT Veeam backups; it is a replica of my production array’s volumes. So my FAST restore option is only good for 7 days due to capacity. After that, Veeam goes to my DROBO to get older data, which is painfully slow at data rates around 10Mbps average. We do have two 1Gb NICs in the DROBO to try to increase the bandwidth; it helps some, but it’s still not a solution.

So maybe I’m looking for a UNICORN, but I can’t be the only one trying to get reasonable performance at a reasonable price. Or maybe $100,000 is a reasonable price and I just haven’t accepted that yet? The SLOW restore issue is on FILE restores. I was thinking it might be better to do a full-machine Instant Recovery into the sandbox and then copy the files we need out of there, instead of waiting for Veeam to pull them from the repository. Any thoughts on this idea? Or maybe I should just ask: what is the best/fastest way to restore FILES from a Veeam backup job?

 

Userlevel 7
Badge +20

Hi,

 

I think crucially it comes down to what bandwidth you have available for your budget. For example, your 30TB of Exchange and 40TB of other data, assuming a 2:1 reduction ratio, become 15TB and 20TB respectively, which is quite achievable. It’s worth remembering this is only for the initial seed / full restore. It’s also worth remembering that when recovering from object storage, only data not already present within your performance tier datastores is downloaded, not everything for the sake of it.

 

So in your scenario:

15TB Exchange & 20TB other data = 35TB / 36,700,160MB initial seed.

 

First let’s calculate the incremental, as this is your day-to-day utilisation metric and lets us know whether it’s sustainable for you as a backup solution:

Let’s assume a 10% daily change to your data that then needs uploading: 3.5TB / 3,670,016MB

Now assume your backups are daily and you have an 8-hour backup window, and that the entire 8-hour window is saturated by your backups (in practice it’s likely to be far less than this, but we’re considering the worst case). That leaves 16 hours in the day to offload the data.

If we want to complete within this window, we need a minimum throughput of 63.71MBps (3,670,016MB divided by 57,600 seconds = minimum consistent throughput). That’s a connection speed of just over 500Mbps (approx. 509.68Mbps).

We’d then need to consider day-to-day traffic layered over the top if it’s a shared connection, which falls outside the scope of what I’m detailing here. But on a 1Gbps circuit (realistic throughput is typically 850-950Mbps), you’d be looking at roughly 8-9 hours to upload the daily change, certainly achievable.

 

This still leaves the elephant in the room: what about the initial upload? If we consider a 1Gbps connection uploading 24x7, with an effective throughput of 900Mbps / 112.5MBps, uploading 36,700,160MB will take approximately 90.6 hours / 3.8 days.
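For anyone wanting to rerun these figures against their own change rate and circuit speed, here is a rough Python sketch of the same arithmetic. The 2:1 reduction, 10% daily change, 8-hour backup window and 900Mbps effective throughput are the assumptions stated above, not measured values:

```python
# Back-of-the-envelope backup offload sizing (assumptions, not measurements):
# 2:1 data reduction, 10% daily change rate, 8-hour backup window,
# 900 Mbit/s effective throughput on a nominal 1 Gbps circuit.

TB_IN_MB = 1024 * 1024                    # 1 TB = 1,048,576 MB

protected_tb = 30 + 40                    # Exchange + other data, before reduction
seed_mb = (protected_tb / 2) * TB_IN_MB   # 2:1 reduction -> 35 TB = 36,700,160 MB

daily_change_mb = seed_mb * 0.10          # 10% change rate -> 3,670,016 MB
offload_window_s = (24 - 8) * 3600        # 16 hours left after the backup window

min_throughput_mb_s = daily_change_mb / offload_window_s   # ~63.7 MB/s
min_link_mbit = min_throughput_mb_s * 8                    # ~510 Mbit/s

effective_mb_s = 900 / 8                  # 900 Mbit/s -> 112.5 MB/s
seed_hours = seed_mb / effective_mb_s / 3600               # ~90.6 hours

print(f"Incremental offload needs ~{min_throughput_mb_s:.1f} MB/s (~{min_link_mbit:.0f} Mbit/s)")
print(f"Initial seed at 900 Mbit/s takes ~{seed_hours:.1f} hours (~{seed_hours/24:.1f} days)")
```

Swap in your own data set size, change rate and circuit speed to see whether the offload fits your window.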

 

There are options to ship data on a physical drive to these public cloud vendors so they can ingest it that way when it’s faster, so if only a connection slower than 1Gbps is available, that can work around the bottleneck for the initial seed. But that still leaves the recovery side of this: what RTO are you trying to achieve when recovering your data?

 

I don’t know the size of your organisation, but if you have 30TB of Exchange data that is all active use / short-term retention, you’re a massive organisation that can hopefully afford 1/10Gbps connectivity for rapid upload and retrieval of data. If that data includes archives, and you’re licensed for it, I’d suggest looking at utilising archive databases. It may then become a scenario your org is happy with from a DR perspective: a much smaller (and far faster to recover) Exchange database of, say, 3TB focused on live mailboxes and current mail, with the other 27TB of data on one or more Exchange servers that take a few days to recover.

 

Without knowing your topology, I can’t recommend specifics for your environment, but initially it does look like you’ve got a lot of data, either by necessity or because it needs a clean-up, and a WAN connection with low throughput.

 

Finally, onto the point about enterprises: I’ve certainly seen plenty with 1/10Gbps+ connections uploading far larger quantities of data, with RTOs that suit. I’ve also seen sites connected with dark fibre at far higher speeds, delivering off-site backups that way.

 

With your data set size, you could get away with a Dell R740XD2 stacked full of 18TB disks (IIRC it supports 18TB now, though it might be 16TB). It can house up to 26 3.5” drives, so assuming a single RAID6 with no hot spare, that’s roughly 432TB of usable capacity, and it definitely costs FAR less than the 100k figures you’re thinking of for Nimble/Pure. You’d configure it as a Linux hardened repository for immutability, give it 10/25/40Gbps network connectivity, and you’d certainly have fantastic read performance; or you could slice the disks into multiple RAID6 spans and create a RAID60 for even better write performance.
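As a quick check of the capacity maths under those assumptions (26 x 18TB drives, RAID6 losing two drives’ worth of capacity per span), a small sketch:

```python
# Rough usable-capacity check for the R740XD2 example above.
# Assumes 26 x 18 TB drives and RAID6 (two parity drives per span), no hot spares.

drives = 26
drive_tb = 18

def raid6_usable_tb(total_drives: int, spans: int) -> int:
    """Usable TB when the drives are split evenly into `spans` RAID6 groups."""
    per_span = total_drives // spans
    return spans * (per_span - 2) * drive_tb

print(raid6_usable_tb(drives, spans=1))  # single RAID6:     (26-2) * 18 = 432 TB usable
print(raid6_usable_tb(drives, spans=2))  # RAID60, 2 spans:  2 * (13-2) * 18 = 396 TB usable
```

The RAID60 layout trades a little capacity for better write performance and rebuild behaviour.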

 

In summary, what you want isn’t a unicorn; it’s what a lot of people are achieving right now, in a variety of ways. Take a step back, look at what’s out there, and ask any follow-up questions you may have 😊

Userlevel 3

Michael, thanks for the reply, the details, and your patience. Let me digest it for a bit. I figured that many other organizations were doing it somehow. We only have a 50Mb internet pipe, as most of our company work is on our internal WAN. We don’t use many public services like Salesforce, etc., so we don’t have a fat pipe out to or back from the public internet.

Userlevel 7
Badge +20


You’re very welcome. Please come back with any further questions you may have. Hopefully we’ll get a way forward for you out of this.

Userlevel 3

Michael, I got on the web and tried to BUILD that Dell server with the specs that MinIO lists: 2 Gold processors, 128GB of system RAM, and 8 SSD drives (the suggested minimum), and it comes out around $80K. That is still pretty $$$$. Am I doing it wrong, or is that about right?

The other thing I have to deal with is how to do a TEST of this to make sure it does what I hope it will. If I can’t prove to management that it will be a doozy, I’ll have a hard time getting them to pay for it.

Is there a way I could “RENT” a system like it for a test, or what does the community suggest?

 

Userlevel 7
Badge +17

Which SSDs have you configured?

Userlevel 7
Badge +20

@NaplesDave is that the RRP that nobody pays? 😉 

 

Honest answer time though:

Yep, there are plenty of companies that let you lease servers. If you’re a Dell partner, they should be able to help with sizing; otherwise, whoever your reseller is should be able to help, to mitigate some of the risk.

 

If you’re happy to go hardened Linux but not object storage, you could spec a Dell server with far less grunt; object storage requires a fair amount of resources for the database tracking the objects, versus using a Dell server as just block storage. Then you could have just a BOSS card (RAID 1 NVMe, 256/512GB) for your OS, with a decent RAID controller with cache plus multiple stripes of RAID6 for some high-performance IO.

 

With SSDs you’d likely be fine with read-optimised drives, which are a lot cheaper, as you’re not working with an active production data set but with periodic IO, assuming you need SSDs at all. That said, you may still want a smaller high-performance tier, of course.

 

Using just block storage, you could easily drop those processors to Intel Xeon Silvers in most scenarios.
