Deduplication Appliances Best Practices



Deduplicating storage appliances play a significant role in organizations' data protection strategies: they reduce consumed storage by eliminating duplicate data blocks and provide secure long-term data storage using mechanisms such as immutability.

However, when designing the architecture and solution for any project, it is important to understand the business requirements in order to implement and use these solutions correctly.

For any deduplication appliance, it all comes down to understanding and choosing an appropriate trade-off between space savings, restore speed, and cost so that the implementation delivers the expected result.

It is important to highlight that Veeam Backup & Replication (VBR) is storage-agnostic, gives customers freedom of choice, and supports deduplication appliances as backup repositories. Briefly, the VBR integration possibilities are the following:

Supported non-deduplicating storage systems:

  • Direct-attached storage:
    • Microsoft Windows server (ReFS): Fast Clone, no immutability.
    • Linux server (XFS): Fast Clone (reflink), with immutability.
  • Network-attached storage:
    • SMB (CIFS) share: Fast Clone, no immutability.
    • NFS share: no Fast Clone, no immutability.
  • Object storage:
    • Direct to Object: immutability.
    • Capacity/Archive Tier of a SOBR: immutability.
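During design it can help to capture the list above as a small lookup table, for example to shortlist repository types that meet a given requirement. This is a minimal Python sketch: the field names are hypothetical, and the values only restate the list above (None where the list does not mention Fast Clone); it is not an official Veeam compatibility matrix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepoCapabilities:
    # None means the capability is not stated in the list above.
    fast_clone: Optional[bool]
    immutable: bool


# Hypothetical table restating the non-deduplicating storage list.
CAPABILITIES = {
    "Windows server (ReFS)": RepoCapabilities(fast_clone=True, immutable=False),
    "Linux server (XFS)": RepoCapabilities(fast_clone=True, immutable=True),
    "SMB (CIFS) share": RepoCapabilities(fast_clone=True, immutable=False),
    "NFS share": RepoCapabilities(fast_clone=False, immutable=False),
    "Direct to Object": RepoCapabilities(fast_clone=None, immutable=True),
    "SOBR Capacity/Archive Tier": RepoCapabilities(fast_clone=None, immutable=True),
}


def shortlist(need_fast_clone: bool, need_immutability: bool) -> list[str]:
    """Return the repository types that satisfy the requested capabilities."""
    result = []
    for name, caps in CAPABILITIES.items():
        # Only an explicit True counts; "not stated" does not qualify.
        if need_fast_clone and caps.fast_clone is not True:
            continue
        if need_immutability and not caps.immutable:
            continue
        result.append(name)
    return result


print(shortlist(need_fast_clone=True, need_immutability=True))
# ['Linux server (XFS)']
```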

Supported deduplicating storage appliances:

  • Dell Data Domain: DD Boost, with immutability/retention lock from v12.1 (build 12.1.0.2131).
  • ExaGrid: Veeam Data Mover, with data protection from its tiered air-gapped architecture.
  • HPE StoreOnce: Catalyst, with immutability.
  • Quantum DXi: immutability from secure snapshots.

Veeam describes architecture and integration recommendations for deduplication appliances in detail in three documents:

  1. Best Practices Guide:

https://bp.veeam.com/vbr/2_Design_Structures/D_Veeam_Components/D_backup_repositories/deduplication.html

  2. Architectural guidelines for deduplication systems:

https://www.veeam.com/kb2660

  3. Deduplication Appliance Best Practices:

https://www.veeam.com/kb1745

The basic questions that must be answered to implement an adequate architecture using deduplication appliances are:

  • What are the RPO/RTO requirements?
  • What is the demand for high-priority and low-priority restores?
  • Will it be necessary to implement backup encryption?
  • How will the 3-2-1 architecture be implemented?
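The 3-2-1 question can be made concrete by checking a planned design against the rule: at least three copies of the data, on at least two different media types, with at least one copy off-site. The sketch below is hypothetical (the `DataCopy` type and media labels are illustrative assumptions) and counts the production data as one of the three copies, which is the common reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str     # illustrative labels, e.g. "disk", "dedupe appliance", "tape"
    offsite: bool


def meets_3_2_1(copies: list[DataCopy]) -> bool:
    """Check the 3-2-1 rule: >= 3 copies, >= 2 media types, >= 1 off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical plan: production data, a fast primary repository,
# and a backup copy on an off-site deduplicating appliance.
plan = [
    DataCopy("production", media="production storage", offsite=False),
    DataCopy("primary backup repository", media="disk", offsite=False),
    DataCopy("backup copy", media="dedupe appliance", offsite=True),
]
print(meets_3_2_1(plan))  # True
```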

Characteristics of deduplicating storage appliances:

In general, deduplicating storage systems were originally created to replace tape. These systems are often optimized for sequential writes and can offer high ingest rates.

However, random read performance can suffer: restores may see higher latency because the data must be rehydrated as it is read. Devices with a fast, non-deduplicated area for the most recent restore points, such as the ExaGrid solution, mitigate this characteristic.

In general, though, this penalty should be taken into account for recovery operations that require speed.
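To see where the rehydration cost comes from, here is a toy content-addressed block store. On write, each fixed-size block is hashed and stored only once, and the backup file becomes a list of block references (a "recipe"); on read, the file has to be reassembled block by block from that index. This is a simplified illustration of fixed-block deduplication with a tiny 4-byte block size, not how any particular appliance works; real appliances typically add variable-size chunking, compression, and on-disk layouts where referenced blocks end up scattered, which is what turns restores into random reads.

```python
import hashlib


class DedupStore:
    """Toy fixed-block deduplicating store (illustration only)."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}        # hash -> unique block
        self.recipes: dict[str, list[str]] = {}   # file name -> ordered block hashes

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        # "Rehydration": rebuild the file by looking up every referenced block.
        return b"".join(self.blocks[h] for h in self.recipes[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = DedupStore(block_size=4)
store.write("full_week1", b"AAAABBBBCCCCAAAA")
store.write("full_week2", b"AAAABBBBCCCCDDDD")
print(store.stored_bytes())      # 16: 32 logical bytes, 4 unique blocks
print(store.read("full_week2"))  # b'AAAABBBBCCCCDDDD'
```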

Veeam General Recommendation:

From KB2660:

"When using Veeam with a deduplicating storage system, a best practice can be to have a non-deduplicating storage system as the primary backup target for the most recent restore points and then use backup copy jobs to store long-term retention on a dedupe storage system.

It is important to note that while this is Veeam's general recommendation, there is a wide array of different hardware deduplication options, some of which have Veeam-specific features enabled or are built with solid-state drives to improve random read performance. Because of this, Veeam encourages an in-depth recovery time and recovery point requirements discussion with a value-added reseller or the hardware manufacturer to determine how best to leverage deduplicating storage".

This recommendation is consistent with Rick Vanover's article "What is the ultimate VM backup architecture?" and with Anton Gostev's Ultimate VM Backup Architecture presented there, at the following link:

https://www.veeam.com/blog/what-is-the-ultimate-vm-backup-architecture.html

Anton Gostev's Ultimate VM Backup Architecture

Using a high-performance primary backup repository close to the production environment, holding only short-term retention, speeds up backup jobs and recovery procedures, which results in a lower RTO.

Allocating a lower-performing secondary backup repository for long-term retention can save space, optimize cost, and help meet the requirements of a 3-2-1 architecture. This secondary repository can be a deduplicating storage appliance, which meets these requirements very well.
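The split between a short-term primary repository and a long-term deduplicating repository can be expressed as a simple rule: which repository a restore point of a given age is read from. This is a minimal sketch with hypothetical retention values (14 days on the primary repository, 365 days on the appliance); it does not model how Veeam backup copy jobs actually create or retain the secondary copies.

```python
from typing import Optional

# Hypothetical retention values, for illustration only.
PRIMARY_RETENTION_DAYS = 14     # fast, non-deduplicating repository
SECONDARY_RETENTION_DAYS = 365  # deduplicating appliance (backup copy target)


def restore_source(age_days: int) -> Optional[str]:
    """Pick the repository a restore point of the given age would be read from.

    Recent points come from the fast primary repository (lower RTO);
    older points exist only on the deduplicating appliance.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < PRIMARY_RETENTION_DAYS:
        return "primary repository"
    if age_days < SECONDARY_RETENTION_DAYS:
        return "dedupe appliance"
    return None  # outside all retention


for age in (1, 30, 400):
    print(age, restore_source(age))
```

Because most operational restores target recent restore points, this layout keeps the rehydration penalty away from the restores where RTO matters most.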

KB1745 offers general configuration advice. As shown below, some of these recommendations should be considered when backing up directly to deduplicating storage.

Let's highlight some of the recommended job-level settings for primary backup jobs:

Backup Job Settings:

  • Backup Mode: Forward Incremental
  • Active Full Backup (*): enabled and set to weekly. Weekly full restore points ensure that as few restore points as possible must be read during a restore.

(*) Some deduplication appliances that integrate with Veeam Backup & Replication can perform synthetic full operations.
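The reason for a weekly active full is chain length: with forward incremental, restoring a point means reading the most recent full backup plus every incremental after it, up to that point. A small sketch, assuming one restore point per day and an active full on day 0 of each cycle (the schedule is an illustrative assumption):

```python
def files_to_read(day: int, full_interval: int = 7) -> int:
    """Number of backup files read to restore the point created on `day`.

    Assumes forward incremental with one restore point per day and an
    active full every `full_interval` days, starting with a full on day 0.
    """
    if day < 0 or full_interval < 1:
        raise ValueError("invalid arguments")
    days_since_full = day % full_interval
    return 1 + days_since_full  # the full (.vbk) plus each incremental (.vib)


print([files_to_read(d) for d in range(8)])          # [1, 2, 3, 4, 5, 6, 7, 1]
print(max(files_to_read(d, 30) for d in range(30)))  # 30
```

With a weekly full, no restore reads more than 7 files; with a monthly full, a restore may read up to 30, each of which may need rehydration on a deduplicating appliance.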

Storage tab:

  • Inline data deduplication: disabled.
  • Compression level: optimal or dedupe-friendly.
  • Block Size: 4 MB.
  • Encryption: disabled.
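The job settings listed above can double as a checklist. This sketch compares a job's settings, held in a plain dictionary with hypothetical keys, against the values listed in this post and reports deviations. It does not read anything from Veeam or call any Veeam API; the keys are illustrative, not Veeam setting names.

```python
# Values as listed in this post; keys are hypothetical, not Veeam setting names.
RECOMMENDED = {
    "backup_mode": {"forward incremental"},
    "active_full_weekly": {True},
    "inline_dedup": {False},
    "compression": {"optimal", "dedupe-friendly"},
    "block_size_mb": {4},
    "encryption": {False},
}


def check_job(settings: dict) -> list[str]:
    """Return one message per setting that differs from RECOMMENDED."""
    issues = []
    for key, allowed in RECOMMENDED.items():
        value = settings.get(key)  # None if the key is missing
        if value not in allowed:
            expected = sorted(allowed, key=str)
            issues.append(f"{key}: got {value!r}, expected one of {expected}")
    return issues


job = {
    "backup_mode": "forward incremental",
    "active_full_weekly": True,
    "inline_dedup": True,   # deviates: should be disabled
    "compression": "optimal",
    "block_size_mb": 1,     # deviates: 4 MB listed above
    "encryption": False,
}
for issue in check_job(job):
    print(issue)
```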

Conclusion:

As discussed previously, there are various integration possibilities with deduplication appliances, and it is necessary to choose the architecture and configuration that best match the organization's business expectations. The need for rapid operational recovery is often a latent demand and must be studied carefully when deduplication appliances are used.

Additional References:

  • Dell Data Domain > Limitations and Recommendations
  • ExaGrid > Requirements and Recommendations
  • Fujitsu ETERNUS CS800 > Backup Job Configuration
  • HPE StoreOnce > Limitations and Recommendations
  • Infinidat InfiniGuard > Backup Job Configuration
  • Quantum DXi > Backup Job Configuration


4 comments

Userlevel 7
Badge +6

Good info here. Folks with deduplicating appliances such as Data Domain tend to be well aware that they shouldn't use the arrays as their primary backup storage, but it seems there's still a fair number who do. Some of them still seem pretty surprised when performance suffers. I appreciate the callout for ExaGrid and their tiered model as well, for performance and then archival with deduplication. I believe I had a conversation with them last year at VeeamON about this capability. It might be overkill for some folks though, especially my smaller clients.

Userlevel 7
Badge +21

Great information Luiz.  These appliances need special attention.

Userlevel 7
Badge +19

Good writeup Luiz!

I think my main issue, and the reason I've never even looked at a dedup appliance for my backup environment, is the rehydration cost during restores. I want my restores to be as quick as possible, and dedup appliances, generally speaking, aren't great at that. Although I think Data Domain does ok with DD Boost (or whatever their tech for that is called 😁)?

Userlevel 7
Badge +6

Thanks, @dloseke, @Chris.Childerhose, and @coolsport00! It is an important subject to discuss and analyze in any project involving deduplicating appliances! Veeam provides a lot of information on this topic, and it helps! 😀
