Veeam: inline deduplication


Userlevel 7
Badge +13

While I was studying for the VMCE, a detail on deduplication caught my attention.

I’m talking about inline deduplication.

 

What's it about?

Inline deduplication is a feature offered by Veeam B&R that reduces the amount of storage space required for backup data by detecting duplicate data and storing it only once. One of its main benefits is that it can significantly decrease the storage space, time and resources required for backups, which is especially useful for organizations that generate large amounts of data on a regular basis. 
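To picture what "storing duplicate data only once" means, here is a minimal Python sketch of generic fixed-size block deduplication. It is not Veeam's actual implementation: the SHA-256 hashing, the tiny 4-byte block size and the function names are all assumptions made for illustration.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks and keep each distinct block once.

    Returns (store, layout): store maps a block's SHA-256 digest to its bytes,
    layout lists the digests in original order so the data can be rebuilt.
    """
    store: dict[str, bytes] = {}
    layout: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # the block is kept only the first time it is seen
        layout.append(digest)
    return store, layout


def restore(store: dict[str, bytes], layout: list[str]) -> bytes:
    """Rebuild the original data from the unique-block store."""
    return b"".join(store[d] for d in layout)


if __name__ == "__main__":
    # Hypothetical source: the same 4-block pattern repeated 3 times, plus one unique block.
    data = b"AAAABBBBCCCCDDDD" * 3 + b"EEEE"
    store, layout = dedupe_blocks(data, 4)
    stored = sum(len(b) for b in store.values())
    print(f"source: {len(data)} bytes, stored: {stored} bytes")
    # prints: source: 52 bytes, stored: 20 bytes
    assert restore(store, layout) == data
```

The layout list is what makes the "removed" blocks recoverable: each position just points back to the single stored copy.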


Another advantage of inline deduplication is that it makes backups more efficient: because less data needs to be transferred and stored, backups can complete faster. 

 

An example?

An imaginary company named CatsFood has a file server that contains 10 terabytes of data, and the organization wants to back up this file server using, proudly, Veeam. Without inline deduplication (and leaving compression aside), the backup would require 10 terabytes of storage space. With inline deduplication enabled, however, Veeam identifies duplicate data and removes it before it is written to the backup: if the file server contains multiple copies of the same file, or multiple versions of the same file that have been only slightly modified, Veeam's inline deduplication detects the identical blocks and stores them only once (it “removes them”), reducing the amount of data that needs to be stored in the backup. Let's say that inline deduplication removes 2 terabytes of duplicate data. The backup would then require only 8 terabytes of storage space, as opposed to 10 terabytes without deduplication.
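The CatsFood numbers can be turned into the usual "percent saved" and "dedup ratio" figures with a few lines of arithmetic. The function name is hypothetical, and the 2 TB of duplicates is the example's own assumption, not a measured result.

```python
def dedup_savings(source_tb: float, duplicate_tb: float) -> tuple[float, float, float]:
    """Return (stored TB, percent saved, dedup ratio) once duplicate_tb has been removed."""
    stored_tb = source_tb - duplicate_tb
    percent_saved = duplicate_tb * 100 / source_tb
    ratio = source_tb / stored_tb  # 1.25 reads as "1.25:1"
    return stored_tb, percent_saved, ratio


if __name__ == "__main__":
    # Figures from the CatsFood example: 10 TB of source data, 2 TB of duplicates removed.
    stored, saved, ratio = dedup_savings(10, 2)
    print(f"stored: {stored} TB, saved: {saved:.0f}%, ratio: {ratio:.2f}:1")
    # prints: stored: 8 TB, saved: 20%, ratio: 1.25:1
```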

 

This is a significant reduction in storage requirements and can help organizations save on storage costs. It's worth noting that inline deduplication works together with other job storage settings, which you can tune to balance performance and storage space usage: for instance, you can enable or disable inline deduplication, choose the storage optimization (block size) and set the compression level.
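Why does block size matter? A rough way to see it is to deduplicate the same data with different block sizes. This Python sketch reuses the generic fixed-size block approach (an assumption, not Veeam's engine) on hypothetical data: four versions of a 64-byte file, each with one byte changed in a different place. Smaller blocks isolate the change and let more of the unchanged data match, at the cost of more blocks to process.

```python
import hashlib


def stored_bytes(data: bytes, block_size: int) -> int:
    """Bytes kept after fixed-size block deduplication (each distinct block stored once)."""
    unique: dict[bytes, int] = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


def slightly_modified_versions() -> bytes:
    """Hypothetical data: 4 versions of a 64-byte file, each with one byte changed in a different spot."""
    base = bytes(range(64))
    out = bytearray()
    for i in range(4):
        version = bytearray(base)
        version[i * 16] = 255  # the "slight modification" (offsets 0, 16, 32, 48)
        out += version
    return bytes(out)


if __name__ == "__main__":
    data = slightly_modified_versions()
    for size in (8, 16, 32, 64):
        print(f"block size {size:>2}: {stored_bytes(data, size)} of {len(data)} bytes stored")
    # prints 96, 128, 192 and 256 of 256 bytes stored, respectively
```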

 

And what if the repository is a deduplicating storage appliance?

Best practices say to disable the inline deduplication setting when writing to deduplicating storage.

 

Find more here:

Deduplication Appliance Best Practices: https://www.veeam.com/kb1745

Help center: https://helpcenter.veeam.com/docs/backup/hyperv/compression_deduplication.html?ver=110


10 comments

Userlevel 7
Badge +7

Great, thank you @marcofabbri! This will come in really handy for me when studying. 

Userlevel 7
Badge +8

Hey @marcofabbri, great article. I would add that inline deduplication behaves differently depending on whether you use per-VM backup files or per-job backup files: per-job obviously gives better results, but at lower speed!
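A simplified model of why per-job gives better results: in a per-job backup file, blocks can be matched across all VMs in the job, while with per-VM files each VM's blocks are only matched within its own file. The VMs, block contents and 4-byte block size below are hypothetical, and the model ignores the speed side of the trade-off entirely.

```python
import hashlib


def unique_block_count(disks: list[bytes], block_size: int) -> int:
    """Distinct blocks across all given disks, i.e. blocks kept if they share ONE backup file."""
    seen: set[bytes] = set()
    for disk in disks:
        for offset in range(0, len(disk), block_size):
            seen.add(hashlib.sha256(disk[offset:offset + block_size]).digest())
    return len(seen)


if __name__ == "__main__":
    block = 4
    os_image = b"WIN1WIN2WIN3"  # hypothetical blocks shared by VMs cloned from one template
    vms = [os_image + b"APP1", os_image + b"DB01", os_image + b"WEB1"]  # one unique block each

    per_job = unique_block_count(vms, block)                     # one file for the whole job
    per_vm = sum(unique_block_count([vm], block) for vm in vms)  # one file per VM
    print(f"per-job: {per_job} blocks, per-VM: {per_vm} blocks")
    # prints: per-job: 6 blocks, per-VM: 12 blocks
```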

Userlevel 7
Badge +17

Yes, inline deduplication is a great feature.

Please be aware that deduplicating storage is not recommended for primary repositories. Your restores will be very slow with this configuration.

Better use a “normal” block storage (or object storage with V12) and Veeam's inline deduplication. And use a deduplicating storage for backup copy repositories only.

Userlevel 7
Badge +10

Thanks @marcofabbri. I have two small questions to clarify my thinking:
     1. With compression, the backup size is reduced to 50% of the source. From the Help Center:

(Veeam Backup & Replication assumes that the following amount of space is required for backup files:

  • The size of the first full backup file is equal to 50% of source VM data)
  2. For better performance, every backup must be written to a different repository depending on VM size, using the Local target or LAN target option if the repository type is SAN/DAS or NAS, respectively. 

Correct? 

I’m studying for the VMCA exam and I don't want to get confused! :P 

Thanks 

Userlevel 7
Badge +20

Yes, inline deduplication is a great feature.

Please be aware that deduplicating storage is not recommended for primary repositories. Your restores will be very slow with this configuration.

Better use a “normal” block storage (or object storage with V12) and Veeam's inline deduplication. And use a deduplicating storage for backup copy repositories only.

Totally agree with this for sure.

Great post Marco.

Userlevel 7
Badge +13

Better use a “normal” block storage (or object storage with V12) and Veeam's inline deduplication. And use a deduplicating storage for backup copy repositories only.

Totally agree with you!

Userlevel 7
Badge +13
  • The size of the first full backup file is equal to 50% of source VM data)
  2. For better performance, every backup must be written to a different repository depending on VM size, using the Local target or LAN target option if the repository type is SAN/DAS or NAS, respectively. 

Correct? 

 

As the Help Center says: 

  • The size of the first full backup file is equal to 50% of source VM data.
  • The size of further full backup files is equal to 100% of the previous full backup file size.

So yes to the first question.

For the second one, is this what you mean? Because there’s a great post about it: 
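Using the two Help Center assumptions just quoted, a rough sizing estimate for full backup files looks like this. The 10 TB source and the three retained fulls are hypothetical inputs, incremental files are left out because the quoted text gives no figure for them, and the result is a planning assumption rather than a measurement.

```python
def estimate_full_backup_sizes(source_tb: float, fulls: int) -> list[float]:
    """Estimated size of each full backup file from the two Help Center assumptions quoted above."""
    sizes: list[float] = []
    for n in range(fulls):
        if n == 0:
            sizes.append(source_tb * 0.5)  # first full: 50% of source VM data
        else:
            sizes.append(sizes[-1] * 1.0)  # further fulls: 100% of the previous full
    return sizes


if __name__ == "__main__":
    # Hypothetical: 10 TB of source VM data and 3 full backups kept on the repository.
    sizes = estimate_full_backup_sizes(10, 3)
    print(sizes, "total:", sum(sizes), "TB")
    # prints: [5.0, 5.0, 5.0] total: 15.0 TB
```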

https://community.veeam.com/blogs-and-podcasts-57/object-storage-impact-of-job-storage-optimization-settings-in-v11-2601


Userlevel 7
Badge +10
  • The size of the first full backup file is equal to 50% of source VM data)
  2. For better performance, every backup must be written to a different repository depending on VM size, using the Local target or LAN target option if the repository type is SAN/DAS or NAS, respectively. 

Correct? 

 

As the Help Center says: 

  • The size of the first full backup file is equal to 50% of source VM data.
  • The size of further full backup files is equal to 100% of the previous full backup file size.

So yes to the first question.

OK Marco, this means your 10 TB of VM data becomes 5 TB of backup. 

For the second one, is this what you mean? Because there’s a great post about it: 

https://community.veeam.com/blogs-and-podcasts-57/object-storage-impact-of-job-storage-optimization-settings-in-v11-2601


Correct. For the second item, this storage optimization setting lets us save a little space on the repository (and also get better throughput and processing time) thanks to block size optimization. 

In addition, inline deduplication, which your post focuses on, lets us save even more space on the repository.
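One way to see why block size affects throughput and processing time is simply to count how many blocks a job has to process and track for the same amount of data. The 10 TiB source is hypothetical, and the block sizes are example values only, not an authoritative list of what each Veeam storage optimization option uses (check the Help Center for that).

```python
def block_count(data_bytes: int, block_size: int) -> int:
    """Blocks needed to cover data_bytes; the last block may be partial (ceiling division)."""
    return -(-data_bytes // block_size)


if __name__ == "__main__":
    KIB = 1024
    TIB = 1024 ** 4
    source = 10 * TIB  # hypothetical 10 TiB of source data
    # Example block sizes only, not an authoritative list of Veeam storage optimization values.
    for size_kib in (256, 512, 1024, 4096):
        count = block_count(source, size_kib * KIB)
        print(f"{size_kib:>4} KiB blocks: {count:,} blocks to process and track")
```

Fewer, larger blocks mean less per-block overhead, while smaller blocks give deduplication more chances to find matches, so the setting is a trade-off rather than a free win.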

Userlevel 7
Badge +6

I was looking for the deduplicating storage device caveat. I would like to note that if Veeam’s deduplication is anything like most deduplication (and it probably is, but I haven’t tested it), deduplication and compression don’t work well for things like encrypted data, video and images. But again, if the same files exist multiple times on a server, it can make a big difference.
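A quick generic demonstration of that caveat, using Python's zlib for compression and SHA-256 block hashing for deduplication (neither is Veeam's engine): random bytes from os.urandom stand in for encrypted or already-compressed content such as video and images, and a repeated log line stands in for highly redundant data. The 4 KiB block size is an assumption.

```python
import hashlib
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)


def duplicate_blocks(data: bytes, block_size: int = 4096) -> int:
    """How many fixed-size blocks repeat an earlier block."""
    seen: set[bytes] = set()
    dupes = 0
    for offset in range(0, len(data), block_size):
        digest = hashlib.sha256(data[offset:offset + block_size]).digest()
        if digest in seen:
            dupes += 1
        seen.add(digest)
    return dupes


if __name__ == "__main__":
    size = 1024 * 1024
    line = b"same log line, over and over\n"
    repetitive = line * (size // len(line))
    random_like = os.urandom(size)  # stand-in for encrypted or already-compressed content
    for name, data in (("repetitive", repetitive), ("random-like", random_like)):
        print(f"{name:>11}: compression ratio {compression_ratio(data):.3f}, "
              f"duplicate 4 KiB blocks: {duplicate_blocks(data)}")
```

On random-like data the compression ratio stays at about 1.0 (zlib can even add a few bytes) and no blocks repeat, which is exactly why encrypted data gains little from either feature.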

Userlevel 7
Badge +6

Hey @marcofabbri, great article. I would add that inline deduplication behaves differently depending on whether you use per-VM backup files or per-job backup files: per-job obviously gives better results, but at lower speed!

 

It should (probably) be noted that per-VM is the default in V12, unlike previous versions of VBR. In all honesty, while per-job backups are more space-efficient, they’re not as flexible: cleaning up old VM data after a decommission is one example. I like to use per-VM when I can just because of the flexibility. That said, I’m not sure I’ll go forward with using VeeaMover to convert my per-job restore points to per-VM, and I’m sure I have some clients without enough space to perform that conversion anyway.
