Let's talk storage.

  • 20 September 2023

Userlevel 7
Badge +8

I’m in the market for new storage at 2 sites. I’m hoping to get some input from the community here, especially from any of you with a lot of data or fast storage.

 

Requirements: you need to be able to hit a minimum of 2GB/s on your Veeam jobs to post your solution. Please detail your proxy, repo, and storage setup as well. (This is not a competition, though.)

 

Currently I have 2 V7000s. I can hit about 2.2GB/s sending data to 6 LTO-8 drives. I am hitting about 6k IOPS, but the latency starts to increase at that point. I’ve been happy with this storage (7,200 RPM disks), but it’s time to upgrade, and I am looking at a few options.
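As a back-of-envelope check on those numbers (a quick sketch; it assumes LTO-8’s native, uncompressed rate of roughly 360MB/s per drive and that every drive streams at full speed):

```python
# Back-of-envelope tape throughput check. Assumes LTO-8 native (uncompressed)
# speed of ~360 MB/s per drive and that every drive streams at full speed.
LTO8_NATIVE_MBPS = 360  # MB/s per drive, uncompressed

def aggregate_gbps(drives, per_drive_mbps=LTO8_NATIVE_MBPS):
    """Aggregate tape throughput in GB/s."""
    return drives * per_drive_mbps / 1000

print(f"6 drives: ~{aggregate_gbps(6):.1f} GB/s")  # ~2.2 GB/s, matches the current job rate
print(f"8 drives: ~{aggregate_gbps(8):.1f} GB/s")  # ~2.9 GB/s once a couple more drives are added
```

So 2.2GB/s is basically six drives streaming at native speed, and a couple more drives puts the requirement around 3GB/s.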

 

Option 1) ExaGrid at both sites, 10+ EX84 per site.

   - Does anyone here own ExaGrid storage? What is your performance like, and how about long-term restores? The price is decent; I just have concerns about large-scale instant restores, synthetic operations, merges, etc. (Those are my current pain points with my current V7000s: it’s fast until I run too many at once.)

 

Option 2) Pure Storage at one site, ExaGrid at the second.

  - More money, but it should outperform the ExaGrid. Looking for people who own Pure or ExaGrid to jump in here.

  - Can’t use the ExaGrid replication; I would have to use a copy job between sites (currently running this way anyway).

  - The second site is for DR and rarely used. This solution allows for long-term retention at the DR site and fast storage at the main site.

  - I still need to get faster speeds, as we will start adding a few more tape drives soon, and dropping down seems counterintuitive.

  - The Pure FlashBlade S200 is something I am looking at. Also, since we are a Fibre Channel storage shop, I am looking at a FlashArray C, with an IBM array as a fallback option.


8 comments

Userlevel 7
Badge +20

We only have a few Pure arrays - never used ExaGrid but have heard good things. @falkob should be able to chime in here with some great advice now that he is at Pure. 😎

Userlevel 7
Badge +20

Hey @Scott, you haven’t mentioned the data size or retention you’re trying to keep here, which would be good to include, but here goes:

2x HPE Apollo 4510 Gen 10

Acting as AIO for proxy & repo.

Boot is RAID 1 SSD, used for OS, Veeam Components

58x SAS 7.2k RPM disks per server. This was to enable 1 year of on-prem retention, delivered via 2x RAID 60 per server (HPE Recommendation)

RAID Controllers: 2x P408i-P, 29 disks per controller

Network: 2x 40GbE

FC: 2x 32Gb

 

We’re leveraging storage snapshots, primarily against NetApp, and can consume up to 4GB/s (2GB/s per node when the jobs balance perfectly).

The NetApp all-flash storage is on 16Gb FC ports, so you can get a sense of how hard these servers push it; but since the Apollos aren’t using flash, your IO is naturally more sensitive to dips.

 

If you don’t need full GFS retention in your primary backup, I’d definitely go for flash storage, but you can’t cut corners on networking or it’ll quickly bottleneck you; the jobs have to scale out well to consume multiple nodes or communicate with multiple hosts, etc.
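For a rough feel of where those bottlenecks sit, here’s a minimal sketch converting nominal link speeds into ballpark usable throughput. The ~90% efficiency factor is an assumption, and it treats FC and Ethernet identically, which real protocol overheads don’t:

```python
# Rough conversion of nominal link speeds to usable throughput. Assumes ~90%
# efficiency after protocol overhead and treats FC and Ethernet the same --
# a ballpark, not a spec.
EFFICIENCY = 0.9

def usable_gb_per_s(nominal_gbit, links=1):
    """Approximate usable GB/s for one or more multipathed/teamed links."""
    return nominal_gbit * links / 8 * EFFICIENCY

for label, gbit, links in [("1x 25GbE", 25, 1),
                           ("2x 40GbE", 40, 2),
                           ("1x 32Gb FC", 32, 1),
                           ("2x 32Gb FC", 32, 2)]:
    print(f"{label}: ~{usable_gb_per_s(gbit, links):.1f} GB/s")
```

On those rough numbers, 2GB/s per node fits within a single 25GbE or 32Gb FC path, but without a lot of headroom once restores, copy jobs, and tape run at the same time.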

Userlevel 7
Badge +8

Our retention isn’t too bad, but our data size is pretty large… looking to back up about 700TB right now, but it’s always growing… I have Windows boxes as proxy/repo/tape combos, and that is going to get split. I need dedicated tape servers, and will maybe split the others too.

4GB/s is decent. I have 32Gb Fibre Channel, but my networking is limited to 25GbE at the moment.

 

Our IBM FS7200 is 32Gb on the back end, and the other V7000s are getting replaced with 32Gb as well.

 

I think the ExaGrid should be fine; I just want to make sure it’s not going to be painfully slow. If I add a few tape drives, I know it’s going to end up around 3GB/s. Plus, I’d like to have larger jobs and not have to hold back on the number of tasks before things get slow.

 

Either way, for production I’ll be leaning toward Pure in the future for some of our arrays. They seem like great products.

Userlevel 7
Badge +20

Yeah, Pure Storage is really nice; you’ll certainly get a lot of performance out of them. But as you’re splitting out proxy, repository, and tape, the cross-talk between components becomes more and more crucial to monitor, to ensure you’ve got sufficient bandwidth to keep all components working as hard as necessary to achieve your backup windows. Thankfully, with 25GbE and multipathing/NIC teaming, what you’re trying to reach is easily achievable.

 

I spoke to a company recently using FlashBlades, and they were very happy with the performance. Pure even have a whitepaper backing up via proxy to FlashBlade over NFS, and they were still hitting 3-4GB/s:

 

https://www.purestorage.com/content/dam/pdf/en/white-papers/wp-rapid-restore-flashblade-veeam.pdf#page7

Userlevel 7
Badge +8

Source storage: 2-3PB (SAN, vSAN, huge standalone physical servers)

SAN Proxies on Linux RH 7/8 at the moment.

Many HPE Apollo 4510 Gen 10

Acting as VHR repos, on hardened RHEL 8.

Boot is RAID 1 SSD, used for OS, Veeam Components

60x 16TB SAS 7.2k RPM disks per server (could be 18TB now). 2x RAID 60 per server (HPE recommendation), so 2 extents as members of the SOBR.

RAID Controllers: 2x P408i-P, 30 disks per controller

Network: 4x 25Gb

RAID cache: 90% write, 10% read
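For a sense of what those extents hold, here’s a rough usable-capacity estimate. The post only states 30 disks per controller in a single RAID 60, so the span layout below is an assumption:

```python
# Rough usable capacity for one extent: a RAID 60 made of RAID 6 spans.
# The 2x (15-disk) span layout is an assumption; the post only states
# 30 disks per controller in a single RAID 60.
DISK_TB = 16          # 16 TB drives, as stated
SPANS = 2             # assumed RAID 6 spans per RAID 60 set
DISKS_PER_SPAN = 15   # assumed: 30 disks split into 2 spans

def raid60_usable_tb(disk_tb=DISK_TB, spans=SPANS, disks_per_span=DISKS_PER_SPAN):
    """Each RAID 6 span loses two disks to parity."""
    return spans * (disks_per_span - 2) * disk_tb

extent_tb = raid60_usable_tb()
print(f"Per extent: ~{extent_tb} TB; per server (2 extents): ~{2 * extent_tb} TB before filesystem overhead")
```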

Test: FIO and diskspd (per the Veeam configuration guidance), around 2GB/s per RAID set on read/write.
Average everyday processing rate is 3GB/s on SAN-source backup jobs of 1k VMs. The first active full after the v12 migration and backup-format upgrade was around 6GB/s on a SOBR with 2 HPE Apollo 4510 Gen 10s (4 extents).
Restore performance is around 800MB/s per VMDK.
I’m sending terabytes to tape every day, and with v12 I’m a happy guy, with a processing rate of around 2GB/s sustained per tape server. Long live the v12 tape engine; true per-VM chains were not the same in the previous version.
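If you want to reproduce that kind of per-extent test, here’s a minimal sketch along the same lines, driving fio and reading the aggregate bandwidth from its JSON output. The mount point and job sizing are placeholders, and it writes real data to the target, so don’t point it at a busy extent:

```python
# Minimal sketch: run a sequential-write fio test against one repository extent
# and report aggregate bandwidth. Paths and sizing are placeholders -- adjust
# for your environment, and be aware this writes real data to the target.
import json
import subprocess

TARGET_DIR = "/mnt/repo-extent1"   # hypothetical mount point of one RAID 60 extent

cmd = [
    "fio",
    "--name=seq-write",
    f"--directory={TARGET_DIR}",
    "--rw=write",            # sequential writes, like an active full
    "--bs=1M",
    "--size=20G",            # per job; keep it well above controller cache
    "--numjobs=4",
    "--iodepth=16",
    "--ioengine=libaio",
    "--direct=1",            # bypass the page cache
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
report = json.loads(result.stdout)
bw_kib = report["jobs"][0]["write"]["bw"]   # aggregate bandwidth in KiB/s
print(f"Aggregate write bandwidth: {bw_kib / 1024 / 1024:.2f} GiB/s")
```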

I discussed this subject a lot with @Eric Machabert; all-in-one (SAN proxy and repo in the same box) can be great, even on Linux. As we talked, we noticed that performance wasn’t that different whether or not the roles were separated, and personally I prefer to have the VHR as my performance tier.

If you have the time and the team to do it, I would suggest using dense servers (HPE, Dell, Cisco, etc.). If you want huge performance and have the money for it, you can get high-density servers with SSDs (why not?!). You control everything about your repos, from the hardware to the OS. Once you’ve automated the provisioning (using Foreman and Ansible, for example), adding storage becomes easy; you’re managing cattle, not pets.

I had to study the market to replace my old dedup appliances, and I can say my colleagues and I are very satisfied with the choice we made, especially since the price per TB was 4-6 times cheaper than the other competitors. As a disclaimer, I’m a customer and have nothing to gain from promoting one product over another.

For your use case it could be different; if I remember correctly, you have lots of videos and filers. A dedup appliance could make more sense there, although I would be curious how Veeam’s inline dedup handles it; maybe it’s enough.
Block cloning on XFS is fast, reliable, and stable.
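As an illustration of the “automate it once” point above, here’s a minimal provisioning sketch for an XFS extent with reflink enabled, which is what block cloning / fast clone relies on. The device path and mount point are placeholders, and in practice this would live in your Foreman/Ansible tooling rather than a one-off script:

```python
# Minimal sketch: format a data volume as XFS with reflink enabled and mount it.
# Device and mount point are placeholders; mkfs is destructive to the device.
import os
import subprocess

DEVICE = "/dev/sdb1"             # hypothetical data volume -- mkfs will wipe it
MOUNT_POINT = "/mnt/veeam-repo"  # hypothetical mount point

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 4K blocks with CRC and reflink enabled -- the commonly cited settings for
# XFS repositories that use Veeam fast clone (block cloning).
run(["mkfs.xfs", "-b", "size=4096", "-m", "reflink=1,crc=1", DEVICE])
os.makedirs(MOUNT_POINT, exist_ok=True)
run(["mount", DEVICE, MOUNT_POINT])
```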

From my POV, v12 has opened a new storage paradigm with direct-to-object storage. In your case I would study this, especially if you don’t have any workloads that aren’t supported on it. I don’t have much experience with it yet, but I really love the possibility of the storage solution handling failover and load balancing for the backups, transparently from the Veeam side.

Userlevel 3
Badge +3

As @BertrandFR said in the previous comment, we talk a lot about backup performance, as we suffer from PTSD about low performance.

On my side (HPE shop here):

Source storage:

  • Multiple Primera SSD (650, 670), multiple Alletra NVMe (9060 or 9080), dual-site Peer Persistence.
  • Level 1 backup (30 days retention, optimized for performance, on site in each DC): multiple Apollo 4200s (224TB usable each), doing proxy + repository as per Veeam best practices. Some using Windows 2019, others using RHEL 9. Everything uses Backup from Storage Snapshots.
  • Level 2 backup (14 days, immutable, offsite, backup copy job): multiple Apollo 4200s, Ethernet only.
  • Level 3 backup (5 to 10 years, offsite, backup copy job): StoreOnce, Catalyst over Ethernet.
  • Level 4 backup (very long term, offsite): LTO 8 & 9

 

SAN Connectivity: 32Gb, dual fabric, ISL 4x32Gb AES encrypted

Ethernet Connectivity: Apollo 2x25Gb, ESXi 2x50Gb, ISL 4x40Gb/s, MACSEC AES

 

Backup performance: 46Gbit/s at level 1 per proxy/repo (out of 64Gbit/s of FC)

Restore performance: 500 to 950MB/s per VMDK via Direct SAN restore (eager zeroed thick to avoid any performance impact from write acknowledgements through vCenter). Ethernet-based restores using a hot-add appliance get roughly the same speed as Direct SAN.

 

As per Veeam best practices, you should go with flat storage at level 1 (dense servers) and a deduplicating appliance for mid/long-term retention.
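To put rough numbers behind that split, here’s an illustrative sizing sketch. Apart from the ~700TB of source data mentioned earlier in the thread, the change rate and reduction ratios are assumptions, not figures from anyone’s environment:

```python
# Illustrative sizing: short retention on flat storage vs long-term GFS fulls.
# Only the ~700 TB source comes from this thread; the change rate and the
# reduction/dedupe ratios are assumptions for the sake of the example.
SOURCE_TB = 700
DAILY_CHANGE = 0.05      # assumed daily change rate
FLAT_REDUCTION = 2.0     # assumed compression/dedupe ratio on flat storage
APPLIANCE_RATIO = 8.0    # assumed effective ratio on a dedupe appliance

short_term = (SOURCE_TB + 30 * SOURCE_TB * DAILY_CHANGE) / FLAT_REDUCTION
gfs_on_flat = 60 * SOURCE_TB / FLAT_REDUCTION       # 60 monthly fulls, no appliance
gfs_on_appliance = 60 * SOURCE_TB / APPLIANCE_RATIO

print(f"30 days (full + incrementals) on flat storage: ~{short_term:.0f} TB")
print(f"60 monthly GFS fulls on flat storage: ~{gfs_on_flat:.0f} TB")
print(f"60 monthly GFS fulls on a dedupe appliance: ~{gfs_on_appliance:.0f} TB")
```

On those assumptions, short retention stays well within a couple of dense servers, while long-term GFS is where the dedupe appliance earns its keep.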

 

 

 

 

Userlevel 7
Badge +20

As an alternative to deduplication appliances, you could look at object storage. I was having a really good chat with VAST Data last week, as they offer object storage that performs deduplication on all-flash. If you’re already using, or expecting to use, a deduplication appliance, you won’t be encrypting your backups, as encrypted backups won’t deduplicate; but if you instead leverage object storage, you’ll get far better scale for your backup & restore performance.

 

Could be worth looking into something like this as a ‘next-gen’ dedupe. They even offer using their object storage as an NFS datastore for VMware; one topic we discussed was replicating your VMs to that datastore, so their deduplication could then match identical blocks between your replicas and your backups and effectively increase its storage efficiency, whilst offering a low RTO on recovery. I want to try this final scenario in practice to see how well that dedupe works between replication & backup, but I’m certainly interested!

Userlevel 7
Badge +8

Very interesting comments and thanks for the responses.

I need about 1-1.2PB at the main site, and perhaps a bit more at the DR site for retention. The ExaGrid at the DR site might be a good opportunity to use a dedupe appliance.

For faster restores, having non-deduped data will be ideal, preferably on more performance-oriented storage.

The FlashBlade X was priced right, but it looks like the FlashBlade S would be more in line with the performance I need, or the FlashArray C.

 

I suppose having my proxies do storage snapshots from the production SANs, and using the network to the FlashBlade S, could be a half-decent solution. From there, a copy job to an ExaGrid at the second site to dedupe.

 

The other option is to get a 1.2PB FC flash/SSD SAN, add about 3 servers and assign them about 400TB each, and add the proxy/repo roles to them with proxy affinity; that way I can use storage snapshots and have zero data travel over the network.

 

That last option might get me some pretty good performance; then a copy job would send the data to a dedupe appliance at our DR site.

 

The only issue I think will be the price. 
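A rough sanity check on that three-server split, using the ~700TB of source data and the ~3GB/s target mentioned earlier; everything here is back-of-envelope:

```python
# Back-of-envelope numbers for the "3 servers x ~400 TB" option. The ~700 TB
# source and ~3 GB/s aggregate target come from earlier in the thread.
SOURCE_TB = 700
SERVERS = 3
AGGREGATE_GBPS = 3.0   # GB/s across the whole SOBR

per_server_gbps = AGGREGATE_GBPS / SERVERS
per_server_source_tb = SOURCE_TB / SERVERS
active_full_hours = SOURCE_TB * 1000 / AGGREGATE_GBPS / 3600

print(f"Per server: ~{per_server_gbps:.1f} GB/s and ~{per_server_source_tb:.0f} TB of source to cover")
print(f"Active full of {SOURCE_TB} TB at {AGGREGATE_GBPS} GB/s: ~{active_full_hours:.0f} hours")
# ~1 GB/s per server sits comfortably within one 32Gb FC path, and the
# multi-day active full is why incrementals plus synthetic fulls (XFS block
# clone) matter so much here.
```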

 

 
