Solved

Deduplication appliance as primary backup repository: yes or no? 🤔 What do you think?


eprieto
  • On the path to Greatness
  • 159 comments

Here in South America it is very common to find deduplication appliances, with or without integration, used as the primary backup repository...

I would like to know what you think… 💡

Best answer by haslund

Restore speeds are definitely my primary concern, but depending on the vendor and model there can also be some unexpected restrictions on backup chain length (meaning how many restore points you can have without performing a new full), for example if you are someone who performs multiple backups per day.

When performing daily backups, what are we really backing up? Incrementals = new data from yesterday. In reality, how much new data is being produced on a file server that will be identical to new data on the Oracle database server?

The main benefit of deduplication is eliminating similar blocks of data, so storing many full backups on the appliance, typically for long-term retention, works great.


17 comments

haslund
  • Mr. VMCE
  • 391 comments
  • November 3, 2020

In my experience, customers rarely fully understand the implications of this. The first conversation should be: what are your RPO and RTO requirements? Then let's run some proofs of concept.


haslund
  • Mr. VMCE
  • 391 comments
  • Answer
  • November 3, 2020

Restore speeds are definitely my primary concern, but depending on the vendor and model there can also be some unexpected restrictions on backup chain length (meaning how many restore points you can have without performing a new full), for example if you are someone who performs multiple backups per day.
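To make the chain-length point concrete, here is a minimal sketch (hypothetical block maps, not Veeam's actual file format): restoring the newest point in a forever-incremental chain means applying every incremental on top of the full, so the longer the chain, the more the repository has to read, which is exactly where appliance-imposed chain limits bite.

```python
def restore_latest(full, incrementals):
    """Rebuild the newest restore point: the full backup plus every incremental."""
    state = dict(full)            # block address -> block contents
    for inc in incrementals:      # each incremental stores changed blocks only
        state.update(inc)
    return state

full = {0: "A", 1: "B", 2: "C"}
incs = [{1: "B2"}, {2: "C2"}, {1: "B3"}]   # three days of changes
print(restore_latest(full, incs))  # {0: 'A', 1: 'B3', 2: 'C2'}
```

With multiple backups per day the `incs` list grows that much faster, which is why a cap on restore points per chain forces more frequent fulls.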

When performing daily backups, what are we really backing up? Incrementals = new data from yesterday. In reality, how much new data is being produced on a file server that will be identical to new data on the Oracle database server?

The main benefit of deduplication is eliminating similar blocks of data, so storing many full backups on the appliance, typically for long-term retention, works great.
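As a toy illustration of that point (a fixed-block, hash-based sketch, not any vendor's actual algorithm): identical full backups dedupe almost perfectly, while incrementals from unrelated servers share next to nothing.

```python
import hashlib
import os

BLOCK = 4096  # toy fixed block size; real appliances use variable-length chunking

def dedup_ratio(backups):
    """Ratio of logical bytes written to unique bytes actually stored."""
    seen = set()
    logical = 0
    for data in backups:
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            logical += len(block)
            seen.add(hashlib.sha256(block).hexdigest())
    return logical / (len(seen) * BLOCK)

# Four weekly fulls of the same (unchanged) server image: ~4x reduction
full = os.urandom(256 * BLOCK)
print(round(dedup_ratio([full] * 4), 1))    # 4.0

# Four daily incrementals from unrelated servers: essentially no reduction
incrementals = [os.urandom(64 * BLOCK) for _ in range(4)]
print(round(dedup_ratio(incrementals), 1))  # 1.0
```

This is the GFS case in miniature: keep many fulls and the appliance shines; keep one full plus incrementals and it has little left to eliminate.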


eprieto
  • Author
  • On the path to Greatness
  • 159 comments
  • November 3, 2020
haslund wrote:

In my experience, customers rarely fully understand the implications of this. The first conversation should be: what are your RPO and RTO requirements? Then let's run some proofs of concept.

haslund wrote:

Restore speeds are definitely my primary concern, but depending on the vendor and model there can also be some unexpected restrictions on backup chain length (meaning how many restore points you can have without performing a new full), for example if you are someone who performs multiple backups per day.

When performing daily backups, what are we really backing up? Incrementals = new data from yesterday. In reality, how much new data is being produced on a file server that will be identical to new data on the Oracle database server?

The main benefit of deduplication is eliminating similar blocks of data, so storing many full backups on the appliance, typically for long-term retention, works great.

 

Totally agree. Customers think about the deduplication factor regardless of restore speed;
they need to consider that Instant VM Recovery might not be as fast as expected, among other things…

If I had to choose a dedupe appliance as the primary backup repository, I would pick ExaGrid… used as a SOBR extent, it needs no gateway server, and the landing zone is very useful for all transformation operations...

 


vNote42
  • On the path to Greatness
  • 1246 comments
  • November 4, 2020

In my opinion, a commodity server with local/remote disks and ReFS/XFS-formatted volumes is a very good and solid configuration… if not the best one. You get quite good deduplication from Veeam itself and good restore/instant-recovery performance. Yes, deduplication appliances have better dedupe rates, but mainly if you do full backups only.


regnor
  • Veeam MVP
  • 1352 comments
  • November 4, 2020

It depends on the customer’s requirements (RPO/RTO) but I personally would never use a dedupe appliance as the primary repository. Restore performance can/will decrease over time and the restore time will get unpredictable; doing a DR wouldn’t be much fun. Thinking about the costs of a dedupe appliance, there should be enough budget for a smaller and faster primary backup storage.

On the other hand, if you have a landing zone or something similar, things could be different. There was a recent discussion in the forums about this topic: https://forums.veeam.com/veeam-backup-replication-f2/primary-backup-on-dedup-without-landing-zone-really-t68501.html


MicoolPaul
  • 2360 comments
  • November 4, 2020

I would say no, and my reason in one word would be: SureBackup.

Unless your deduplicating appliance has a “landing zone”, reading data requires the appliance to rehydrate it, which has a massive performance impact on any recovery operation, far more than reading directly from traditional DAS with Veeam's normal compression and deduplication settings. And don't forget noisy VMs, such as ones backed up while pending a reboot for Windows Updates, which then start trying to finalise a patch installation…
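A toy model of why rehydration hurts (a hypothetical chunk store, not any vendor's implementation): the appliance keeps each unique chunk once plus a per-file "recipe", so every restore turns into one chunk lookup per chunk, scattered reads on real hardware, whereas a landing zone would serve the file sequentially.

```python
import hashlib
import os

CHUNK = 4096

class DedupStore:
    """Toy dedup store: each file becomes a 'recipe' of chunk hashes."""
    def __init__(self):
        self.chunks = {}    # chunk hash -> chunk bytes (stored once)
        self.recipes = {}   # file name -> ordered list of chunk hashes

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # dedupe on write
            recipe.append(digest)
        self.recipes[name] = recipe

    def restore(self, name):
        # Rehydration: one chunk-store lookup per chunk. On disk these
        # lookups land all over the place, which is what makes restores
        # (and SureBackup boots) slow without a landing zone.
        return b"".join(self.chunks[h] for h in self.recipes[name])

store = DedupStore()
image = os.urandom(32 * CHUNK)
store.write("vm1.vbk", image)
assert store.restore("vm1.vbk") == image
print(len(store.recipes["vm1.vbk"]), "chunk lookups to rehydrate one backup")
```

Writes look great (sequential, heavily deduped); it is the read path that pays the price, which is exactly what SureBackup, Instant Recovery, and DR exercise.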

Veeam themselves recommend deduplication appliances mainly as secondary devices, though they do note that using them as primary is still supported:

https://bp.veeam.com/vbr/VBP/3_Build_structures/B_Veeam_Components/B_backup_repositories/deduplication.html


Davoud Teimouri

I always get the best results from Veeam alone; there was no need for an appliance or for the storage hardware's deduplication feature. I agree with MicoolPaul: you can use an appliance to keep backups for long-term retention.

If you want to keep backup files on an appliance, or on a proxy server with the OS deduplication feature, you need to change the default/optimal compression and deduplication settings for the repositories or backup jobs.


TimK
  • Not a newbie anymore
  • 7 comments
  • November 6, 2020

Yes for backup copies, but not as the primary repo, mostly because you lose Instant Recovery and the other features built on it.


Gurdip Sohal

I think it depends on the use case. If it's purely a SOBR backup repository, then you can see the appeal. If you are designing for complete backup and recovery, then it's a totally different consideration. The cornerstone of Veeam's success is the ability to recover quickly, and this could be the difference as to whether a dedupe appliance is the right fit. Another factor to consider is the overall data volume in question; for larger environments, the greater dedupe ratios will potentially be attractive.


Thinux
  • New Here
  • 2 comments
  • December 8, 2020

We have clients that use a dedupe appliance as the primary repository and others that use disks formatted with ReFS, and I must say I do not see any difference in recovery performance. There are restrictions in either scenario, in that the storage used for ReFS should sit outside the storage holding the originating data, so that the data is protected. In both cases you need storage outside your production storage solution.

So, when your customer has lots of data, the difference in cost becomes smaller, because the processing needed to manage the deduplication filesystem grows. The deduplication device then has the advantage of being singular in purpose: it does deduplication better and therefore becomes more and more cost-effective as the amount of data grows. In large organizations I do not think you will get away with a Windows ReFS repository.


haslund
  • Mr. VMCE
  • 391 comments
  • December 8, 2020
Thinux wrote:

We have clients that use a dedupe appliance as the primary repository and others that use disks formatted with ReFS, and I must say I do not see any difference in recovery performance. There are restrictions in either scenario, in that the storage used for ReFS should sit outside the storage holding the originating data, so that the data is protected. In both cases you need storage outside your production storage solution. So, when your customer has lots of data, the difference in cost becomes smaller, because the processing needed to manage the deduplication filesystem grows. The deduplication device then has the advantage of being singular in purpose: it does deduplication better and therefore becomes more and more cost-effective as the amount of data grows. In large organizations I do not think you will get away with a Windows ReFS repository.

Very interesting read, thank you for posting your insights. You should never use your production storage for storing your backups, so I don't really agree that this is a restriction specific to ReFS (or XFS or any other storage system).

When customers have a lot of data, that does not necessarily mean they also have a lot of duplicated data. For example, how much data would really be duplicated across your Microsoft Exchange server and your Oracle database server? Some operating system blocks, perhaps?

Personally, I think deduplication appliances make the most sense when you go down the path of GFS-enabled long-term retention of largely identical full backups; if you keep just one full plus incrementals, what would the real benefit be?


BertrandFR
  • Influencer
  • 528 comments
  • December 8, 2020

I agree with @haslund; I think it depends on the scenario, but from my point of view, I found recently that the average cost per TB is cheaper with a standard server, even if you use SSDs for better performance… I prefer to be storage-agnostic, like my software, so I can industrialize at large scale and scope.

It's faster to power on and automatically provision a new server than to add a storage array with a dedupe appliance, etc...


Link State
  • Veeam Legend
  • 605 comments
  • December 10, 2020

A deduplication appliance is all very nice as long as you are only writing backups and the deduplication ratio lets you store a large number of them. Unfortunately, when you need to perform a quick restore from a deduplication appliance, it's a pain. IMHO, a dedupe appliance is good only for long retention.

edit:

p.s. Obviously it depends on the scenario and the RTOs and RPOs


softflame
  • New Here
  • 2 comments
  • December 16, 2020

 

MicoolPaul wrote:

I would say no, and my reason in one word would be: SureBackup. Unless your deduplicating appliance has a “landing zone”, reading data requires the appliance to rehydrate it, which has a massive performance impact on any recovery operation, far more than reading directly from traditional DAS with Veeam's normal compression and deduplication

Landing zones aren't necessarily faster than reading directly from dedupe.

Quantum's DXi meets the restore performance requirements, and also supports Fast Clone for faster synthetic full creation.

Reads probably won't be as fast as from a similarly sized primary storage partition (YMMV), but are likely to be as fast as or faster than a “landing zone” tier, with the added advantages of Fast Clone support, no performance drop-off once the backup in question has left the landing zone, and all of the space available for dedupe.


MicoolPaul
  • 2360 comments
  • December 16, 2020
softflame wrote:

 

MicoolPaul wrote:

I would say no, and my reason in one word would be: SureBackup. Unless your deduplicating appliance has a “landing zone”, reading data requires the appliance to rehydrate it, which has a massive performance impact on any recovery operation, far more than reading directly from traditional DAS with Veeam's normal compression and deduplication

Landing zones aren’t necessarily faster than reading direct from dedupe.

Quantum's DXi meets the restore performance requirements, and also supports Fast Clone for faster synthetic full creation.

Reads probably won't be as fast as from a similarly sized primary storage partition (YMMV), but are likely to be as fast as or faster than a “landing zone” tier, with the added advantages of Fast Clone support, no performance drop-off once the backup in question has left the landing zone, and all of the space available for dedupe.

You make a good point. Having not used Quantum, I can't comment on its performance specifically, but like everything it depends on customer budget and vendor preference; there is such a huge variety of features between dedupe appliances that my recommendation still wouldn't be to default to one as primary storage. Are you using Quantum? How do you find it, if so?


vNote42
  • On the path to Greatness
  • 1246 comments
  • December 16, 2020
MicoolPaul wrote:
You make a good point. Having not used Quantum, I can't comment on its performance specifically, but like everything it depends on customer budget and vendor preference; there is such a huge variety of features between dedupe appliances that my recommendation still wouldn't be to default to one as primary storage. Are you using Quantum? How do you find it, if so?

I agree, it depends 😁

In my experience, dedup appliances are considerably more expensive than general-purpose hardware, and there are not that many extra benefits. But this also depends: there could absolutely be appliances out there with a comparable price and more benefits. Is Quantum one of those?


BertrandFR
  • Influencer
  • 528 comments
  • December 16, 2020

I have a Quantum DXi and the i6000 and i3 libraries.

So I can be critical, I think. It depends on your budget, your knowledge, your team.

Like @vNote42 said, it's much more expensive…

From my POV, I prefer to have total control of my cost and delivery. If I had to choose now, I would go for a server with large storage capacity.

If you don't have the people or the knowledge, a dedupe appliance could be a great choice, but you need to have enough budget.

