Solved

Deduplication appliance as primary backup repository: yes or no? 🤔 What do you think?


Userlevel 7
Badge +6
  • On the path to Greatness
  • 151 comments

Here in South America it is very common to find deduplication appliances, with or without integration, used as the primary backup repository...

I would like to know what you think… 💡


Best answer by haslund 3 November 2020, 08:17


17 comments

Userlevel 7
Badge +14

In my experience, customers rarely fully understand the implications of this. The first conversation should be about their RPO and RTO requirements, and then let's perform some proofs of concept.

Userlevel 7
Badge +14

Restore speeds are definitely my primary concern, but depending on the vendor and model there can also be some unexpected restrictions on backup chain length (meaning how many restore points you can have without performing a new full), e.g. if you are someone who performs multiple backups per day.
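To put some numbers on the chain-length concern, here is a rough sketch of the arithmetic. All values are purely hypothetical; the real limit depends on your appliance and its vendor documentation:

```python
# Restore points accumulated per backup chain; all numbers hypothetical.
backups_per_day = 4        # e.g. a backup every 6 hours
days_between_fulls = 30    # monthly active or synthetic full
chain_limit = 60           # example vendor restriction on chain length

points_per_chain = backups_per_day * days_between_fulls
print(f"restore points per chain: {points_per_chain}")

if points_per_chain > chain_limit:
    # Ceiling division: how many chains the schedule must be split into.
    chains_needed = -(-points_per_chain // chain_limit)
    print(f"exceeds the limit of {chain_limit}: a new full would be needed "
          f"roughly every {days_between_fulls // chains_needed} days")
```

With four backups per day, a monthly full already produces 120 points per chain, so a 60-point restriction would force fulls twice as often.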

When performing daily backups, what are we really backing up? Incrementals = new data from yesterday. In reality, how much new data is being produced on a file server that will be identical to new data on the Oracle database server?

The main benefit of deduplication is eliminating similar blocks of data, so storing many full backups on it, typically for long-term retention, will work great.
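To illustrate why identical fulls dedupe well while unrelated incrementals do not, here is a toy chunk-hashing sketch. The fixed-size chunks and byte strings are made up for illustration; real appliances typically use variable-size chunking:

```python
import hashlib

def unique_chunks(streams, chunk_size=4):
    """Split each stream into chunks and count total vs. unique chunks."""
    seen, total = set(), 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            total += 1
            seen.add(hashlib.sha256(data[i:i + chunk_size]).hexdigest())
    return total, len(seen)

# Four identical weekly fulls: every chunk after the first full is a duplicate.
full = b"the same full backup image, week after week...."
total_f, uniq_f = unique_chunks([full] * 4)
print(f"4 identical fulls: {total_f} chunks, {uniq_f} unique "
      f"({total_f / uniq_f:.0f}:1)")

# Incrementals from unrelated servers share almost nothing.
file_srv = b"user documents, spreadsheets, mail archives......"
oracle = b"redo logs, datafile blocks, control file dumps..."
total_m, uniq_m = unique_chunks([file_srv, oracle])
print(f"2 unrelated incrementals: {total_m} chunks, {uniq_m} unique")
```

The repeated fulls collapse to a 4:1 ratio, while the two dissimilar streams barely dedupe at all.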

Userlevel 7
Badge +6


Totally agree. The customer thinks about the deduplication factor regardless of the restore speed;
the customer needs to consider that Instant VM Recovery might not be as fast as expected, among other things…

In case I had to choose a dedupe appliance as the primary backup repository, I would choose ExaGrid… as a SOBR extent, no gateway server is needed, and the landing zone is very useful for all the transformation operations...

 

Userlevel 7
Badge +13

In my opinion, a commodity server with local/remote disks and ReFS/XFS-formatted volumes is a very good and solid configuration… if not the best one. You get quite good deduplication from Veeam and good restore/instant recovery performance. Yes, deduplication appliances have better dedup rates, but mainly if you do just full backups.

Userlevel 7
Badge +14

It depends on the customer's requirements (RPO/RTO), but I personally would never use a dedupe appliance as the primary repository. Restore performance can/will decrease over time and the restore time becomes unpredictable; doing a DR wouldn't be much fun. Thinking about the cost of a dedupe appliance, there should be enough budget for a smaller and faster primary backup storage.

On the other hand, if you have a landing zone or something similar, things could be different. There has been a discussion recently in the forums about this topic: https://forums.veeam.com/veeam-backup-replication-f2/primary-backup-on-dedup-without-landing-zone-really-t68501.html

Userlevel 7
Badge +20

I would say no, and my reason in one word would be: SureBackup.

Unless your deduplicating appliance has a “landing zone”, reading data requires the appliance to rehydrate it, which has a massive impact on the performance of any recovery operation, far more than reading directly from a traditional DAS with Veeam's normal compression and deduplication settings. And don't forget the noisy VMs, such as those that were backed up pending a reboot for Windows Updates and start trying to finalise a patch installation…
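A crude way to see the rehydration penalty described here is to model a restore as a sequence of chunk references scattered across the dedup store's containers, and count how often consecutive reads jump to a different container. All figures below are invented for illustration:

```python
import random

random.seed(1)
restore_chunks = 10_000   # chunks needed to rebuild one VM disk (hypothetical)
containers = 2_000        # containers the store scattered them across

# Which container each chunk of the restored stream lives in.
layout = [random.randrange(containers) for _ in range(restore_chunks)]

# A contiguous backup file restores as (roughly) one big sequential read.
sequential_ios = 1

# A deduplicated restore pays a scattered read whenever the next chunk
# lives in a different container than the previous one.
scattered_ios = 1 + sum(1 for a, b in zip(layout, layout[1:]) if a != b)

print(f"contiguous restore:  ~{sequential_ios} large sequential read")
print(f"rehydrated restore: ~{scattered_ios} scattered container reads")
```

With randomly placed chunks almost every read lands in a different container, which is exactly why SureBackup and Instant Recovery suffer on rehydrating stores.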

Veeam themselves recommend deduplication appliances mainly as secondary devices though they do say they still support it:

https://bp.veeam.com/vbr/VBP/3_Build_structures/B_Veeam_Components/B_backup_repositories/deduplication.html

Userlevel 4
Badge

I always get the best results from Veeam's own deduplication, and there was no need for an appliance or for a storage hardware deduplication feature. I agree with MicoolPaul: you can use an appliance to keep backups for long-term retention.

If you want to keep backup files on an appliance, or on a proxy server with an OS deduplication feature, you need to change the default/optimal compression and deduplication settings for the repositories or backup jobs.

Userlevel 3
Badge +3

Yes for backup copies, but not as the primary repo, mostly because you lose the ability to do instant recovery and the other features based on it.

Userlevel 1

I think it depends on the use case. If it's purely a SOBR backup repository, then you can see the appeal. If you are designing and considering complete backup and recovery, then it's a totally different consideration. The cornerstone of Veeam's success is the ability to recover quickly, and this could be the difference as to whether a dedup appliance is the right fit. Another factor to consider is the overall data volume in question: for larger environments, the greater dedup ratios will potentially be attractive.

Userlevel 1
Badge

We have clients that use a dedupe appliance as the primary repository and others that use ReFS-formatted disks, and I must say that I do not see any difference in recovery performance. There are restrictions in either scenario, in that the storage used for ReFS should sit outside the storage holding the originating data, so that the data is protected. In both cases you have to have storage outside your data storage solution. So, when your customer has lots of data, the difference in cost becomes smaller, as the processing needed to manage the deduplication filesystem grows. The deduplication device then has the advantage that it is singular in purpose: it does deduplication better and therefore becomes more and more cost-effective as the amount of data gets bigger. In large organizations I do not see that you will get away with a Windows server with a ReFS repository.

Userlevel 7
Badge +14


Very interesting read, thank you for posting your insights. You should never use your production storage for storing your backups, so I don't really agree that this is a restriction of ReFS (or XFS or any other storage system).

When customers have a lot of data, that does not necessarily mean they also have a lot of duplicated data. For example, how much data would really be duplicated across your Microsoft Exchange server and your Oracle database server? Some operating system blocks, perhaps?

Personally, I think deduplication appliances make the most sense when you are going down the path of having GFS enabled for long-term storage of identical full backups; if you keep just one full and its incrementals, what would the real benefit be?
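A back-of-envelope illustration of that GFS point, with hypothetical figures (a 10 TB full, 12 monthly GFS restore points, ~5% month-over-month block change):

```python
full_size_tb = 10.0
monthly_fulls = 12
change_rate = 0.05   # fraction of blocks that differ month to month (assumed)

# Without dedupe, every GFS full is stored whole.
raw_tb = full_size_tb * monthly_fulls

# With dedupe: one baseline full plus only the changed blocks of each
# later full (a simplification; real ratios vary by data and appliance).
deduped_tb = full_size_tb + full_size_tb * change_rate * (monthly_fulls - 1)

print(f"raw: {raw_tb:.0f} TB, deduped: {deduped_tb:.1f} TB "
      f"({raw_tb / deduped_tb:.1f}:1)")
```

Nearly an 8:1 saving on identical fulls; with a single full plus incrementals there are simply no redundant fulls left to eliminate.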

Userlevel 7
Badge +8

I agree with @haslund; I think it depends on the scenario, but from my point of view, having done the math recently, the average cost per TB is cheaper with a standard server, even if you take SSDs for better performance… I prefer to be storage-agnostic, like my software, so I can industrialize at a large scale and scope.

It's faster to power on and provision a new server automatically than to add a storage array with a dedup appliance, etc...

Userlevel 7
Badge +7

A deduplication appliance is all nice as long as you only write the backups and the deduplication ratio allows you to store a large number of backups. Unfortunately, when you need to perform a quick restore from a deduplication appliance, it's a pain. IMHO, a dedup appliance is good only for long retention.

edit:

P.S. Obviously, it depends on the scenario and on the RTOs and RPOs.

Userlevel 1

 


Landing zones aren’t necessarily faster than reading direct from dedupe.

Quantum's DXi meets the restore performance requirements, and also supports Fast Clone for faster synthetic full creation.

Reads probably won't be as fast as from a similarly sized primary storage partition (YMMV), but they are likely to be as fast as or faster than a “landing zone” tier, with the added advantages of Fast Clone support, no performance drop-off once the backup in question has left the landing zone, and all of the space being available for dedupe.

Userlevel 7
Badge +20

 


You make a good point. Having not used Quantum, I can't comment on its performance specifically, but like everything it depends on customer budget and vendor preference. There is a huge variety of features between dedupe appliances, so my recommendation wouldn't be to default to a dedupe appliance as my primary storage. Are you using Quantum? How do you find it, if so?

Userlevel 7
Badge +13

 


I agree, it depends 😀

In my experience, dedup appliances are considerably more expensive than general-purpose hardware, and there are not that many more benefits. But this also depends: there could absolutely be appliances out there with a comparable price and more benefits. Is Quantum one of these?

Userlevel 7
Badge +8

I have Quantum DXi appliances and i6000 and i3 libraries.

I can be critical, I think. It depends on your budget, your knowledge, your team.

Like @vNote42 said, it's much more expensive…

From my point of view, I prefer to have total control of my costs and delivery. If I had to choose now, I would go with a server with large storage capacity.

If you don't have the people or the knowledge, a dedup appliance could be a great choice, but you should have enough budget.
