Veeam Backup and Backup Copy Jobs - Compression and Deduplication


Userlevel 5
Badge

We’ve decided to repurpose one of our large Synology boxes (RS3617xs+) and put some big drives in it to get adequate storage space. On the bottom layer, the Synology file system will be Btrfs. We will create an iSCSI LUN on that FS, XYZ TB in size, mounted to a Linux VM and formatted XFS for both immutability and fast block cloning for synthetic fulls.

My question here is: how do I make sure I’m getting THE BEST compression and deduplication ratios out there from the Veeam side? I assume “per-vm” repos fit in here as well? If not doing “per-vm” repos, more blocks will be deduped in this scenario than in a “per-vm” repo. This is so we get the best “usage” from our storage and it lasts us for some time.

Then there’s this article: Data Compression and Deduplication - User Guide for VMware vSphere (veeam.com)

Another question, does data compression happen after deduplication?

Data Compression

For backup jobs - I assume the default “Optimal” would fit most scenarios. The more compression, the more resources are needed for the job and the longer restores would take.

For backup copy - The article recommends “Auto” which again I assume would fit most scenarios.

Deduplication

The article explains that there are Veeam Data Movers on both the source and target sides. My assumption is that the data movers are the Veeam proxies? We only have two proxies in our local datacenter and two more, one in each of our remote datacenters. Will this setup suffice? For one of the remote sites, we’re transferring data over wireless links, which is essentially like moving it over a WAN.

Since the device we’re storing data to is a NAS (not a SAN, DAS, or local storage), even though we’re presenting the iSCSI LUN as a disk to the Linux VM, we would choose the 512KB block size over 1MB (the default?), therefore giving us smaller output files.

Let me know what you think about my statements above. Also, if there’s anything you would do differently or recommend in addition to the above.

Thanks as always.


4 comments

Userlevel 7
Badge +20

If you are going to be using iSCSI with XFS and immutability, then leaving the default options for block size, etc., is best. Having a deduplication appliance gives you the best results, but you work with what you have. You can always test each setting to see if you notice any difference as well.

Userlevel 7
Badge +17

Hi @jaceg23 -

First, for running XFS on Linux, make sure to format the filesystem properly (4K block size), in addition to setting the other filesystem options (reflink, crc, etc.). Reference the Veeam BP Guide and User Guide for XFS configuration info: https://bp.veeam.com/vbr/3_Build_structures/B_Veeam_Components/B_backup_repositories/block.html#xfs-considerations

https://helpcenter.veeam.com/docs/backup/vsphere/backup_repository_block_cloning.html?zoom_highlight=xfs&ver=120#fast-clone-for-linux-repositories
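
If it helps, here’s a minimal sketch of that formatting step as a Python wrapper around mkfs.xfs, assuming your iSCSI LUN shows up inside the VM as /dev/sdb (a placeholder; substitute your actual device path). Running it requires root and will destroy anything on the device, so double-check the path first:

```python
import subprocess

def format_xfs_repo(device: str) -> None:
    """Format `device` with the XFS settings from the guides above:
    4K block size plus reflink and CRC metadata (reflink requires crc=1,
    and is what enables Fast Clone / block cloning for synthetic fulls)."""
    subprocess.run(
        ["mkfs.xfs",
         "-b", "size=4096",        # 4K filesystem block size
         "-m", "reflink=1,crc=1",  # enable reflink (needs crc) for Fast Clone
         device],
        check=True,
    )

format_xfs_repo("/dev/sdb")  # placeholder device -- this wipes the LUN!
```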

Before discussing your Compression/Dedup question, yes...you should indeed use Per-VM on your Repos for several reasons, including the one you mentioned.

Now, for Compression and Dedup, again...yes, the “Optimal” defaults are designed by Veeam to provide a good balance between backup size, backup and restore performance (a lot of folks miss the ‘restore’ part), and resource usage during the backup process. About the only time you wouldn’t use Dedup is if your Repo was a dedup appliance. In that case, you would also disable Veeam compression at the repository level by using the ‘Decompress backup data blocks before storing’ advanced option in the repository configuration...again, only if you were using a dedup appliance. You’re not, so you’re good to go there. The Veeam BP Guide discusses when you would consider changing the Compression/Dedup defaults:
https://bp.veeam.com/vbr/4_Operations/O_Veeam_Jobs/O_backup_jobs/backup_job_storage.html#compression

https://bp.veeam.com/vbr/3_Build_structures/B_Veeam_Components/B_backup_repositories/deduplication.html#job-configuration

With the release of Veeam v12, Veeam has further optimized Compression. How much? Well, @MicoolPaul did a very fine post a few months back where he tested out Veeam’s different Compression levels. You can check it out below:

As far as which comes first...Compression or Dedup, the answer is “yes” 😂 More specifically though, it’s hard to say exactly. Dedup happens on both the source Data Mover (VM disk level) and the target (backup file level), as stated in the link you provided. Compression also happens on the source Proxy (Data Mover), since it occurs before data traverses the network. Whether Dedup happens before Compression on the source side I wasn’t able to find, but my guess is that it does. You could always ping Veeam Support, or ask the PMs on the Forums if you want to know for sure.

And yes, Data Movers are generally Veeam Backup Proxies, but Data Movers can be hosted on other Components, like a Gateway server, or Repository. It just depends on the environment. For you though..yes, your Proxies will host and be the Data Movers. Also, yes...since you are using a NAS device, Veeam’s Storage Optimization recommendation is to use 512K Block Size as also shown in the link you provided.

So overall, I think you’re spot on. Just be aware of some of the configs you need to set for XFS, etc., as I shared. As far as having “the best” Compression/Dedup, the only way to be ultimately sure of that is to do a test run of a group of your VMs with each setting. It would of course be a bit time-consuming, but it’s ultimately the best way to determine what works best for your environment.

Hope this helps.

Userlevel 7
Badge +20

Hi @jaceg23 


Just some comments to add around this:


We will create an iSCSI LUN on said FS, XYZ TB in size, mounted to a Linux VM formatted XFS for both immutability and fast block cloning for synthetic fulls.

Are you utilising an independent hypervisor to run this Linux VM? I’d be concerned that in the event of a cyber incident, your VM could be backdoored via the console and/or deleted. It’s trivial to reboot the VM into single-user mode and delete the backups. I only mention this because I’d hate for you to have a false sense of security. This is why a lot of people choose to make the server attached to immutable storage a physical device.


 If not doing “per-vm” repos, more blocks will be deduped in this scenario than in a “per-vm” repo. This is so we get the best “usage” from our storage and lasts us for some time.

Correct, not using “per-vm” backup chains will increase the deduplication of data; have a read of this link. You’ll see that Veeam only deduplicates within a single file, so naturally more servers within a single backup file means better deduplication (see the toy example below). There are important trade-offs, however: backup performance for starters, and backup job scale. You’ll be working with a single metadata file, and you’ll be performing data transformations across the entire file. This is also per backup job, so if you’ve got many backup jobs with a few VMs per job, you won’t notice much difference between “per-VM” enabled or disabled. If you really want the best deduplication, you’d want to use a deduplication appliance so that background deduplication tasks can be performed and deduplication can occur between backup files. How many virtual machines are you going to be backing up per backup job?
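
To make that “dedup scope” point concrete, here’s a toy sketch (not Veeam’s actual engine, just the within-file hashing idea) of two VMs built from the same template. Packed into one backup file, the shared OS blocks are stored once; as per-VM files, each file keeps its own copy:

```python
import hashlib

def stored_blocks(backup_file: list[bytes]) -> int:
    """Unique blocks kept after deduplicating *within* one backup file."""
    return len({hashlib.sha256(block).digest() for block in backup_file})

# Two VMs built from the same template share most of their OS blocks.
os_blocks = [bytes([i]) * 4096 for i in range(8)]
vm_a = os_blocks + [b"app data unique to A" * 200]
vm_b = os_blocks + [b"app data unique to B" * 200]

# Per-VM chains: each file is deduplicated in isolation -> 9 + 9 = 18 blocks.
print(stored_blocks(vm_a) + stored_blocks(vm_b))
# One shared backup file: the OS blocks are stored once -> 10 blocks.
print(stored_blocks(vm_a + vm_b))
```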


A further point on this is that your main space savings from ‘deduplication’ actually aren’t deduplication, but data efficiency via block reuse. Using the XFS file system with synthetic fulls will allow unchanged blocks to be reused between backups, effectively deduplicating data between files, albeit only the full backup files.
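
For the curious, that block reuse is what a reflink clone does at the filesystem level. Here’s a rough illustration in Python using the Linux FICLONE ioctl (the same mechanism as cp --reflink=always); the file names are purely illustrative, and Veeam’s Fast Clone actually works on block ranges within backup files rather than cloning whole files:

```python
import fcntl

FICLONE = 0x40049409  # Linux ioctl behind `cp --reflink=always`

def reflink_clone(src_path: str, dst_path: str) -> None:
    """Create dst_path as a clone of src_path that shares its extents.
    Works only on a reflink-capable filesystem such as XFS (reflink=1)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())

# Illustrative names only -- the clone consumes almost no extra space
# until either file's blocks diverge (copy-on-write).
reflink_clone("previous_full.vbk", "synthetic_full.vbk")
```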


For backup jobs - I assume the default “Optimal” would fit most scenarios. The more compression, the more resources are needed for the job and the longer restores would take.

This is true, but Veeam has done a huge amount of work on this in v12, as covered in my blog post on the subject that Shane shared.

Another question, does data compression happen after deduplication?

No. The compression is performed by the proxy, whilst the deduplication is handled by the repository. The block size you specify determines the chunks that the VM is split into and read, with the default being 1MB/Local. Your block is compressed (let’s assume 50% data reduction, making a 512KB block) and is then sent to the backup repository, which looks at whether the block can be deduplicated. Why compress first? Because at that point we don’t yet know whether the block is a duplicate, and by compressing the data we can shift more of it over the network, which is a common bottleneck.
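
Here’s a conceptual model of that ordering in Python: zlib stands in for Veeam’s compressor and SHA-256 hashing stands in for the repository-side duplicate check, so treat it as an illustration of the flow rather than Veeam’s actual format:

```python
import hashlib, os, zlib

BLOCK = 1024 * 1024  # 1MB "Local target" storage optimization

def proxy_read_and_compress(disk: bytes):
    """Proxy side: split the disk into fixed blocks and compress each one."""
    for off in range(0, len(disk), BLOCK):
        yield zlib.compress(disk[off:off + BLOCK])

repository: dict[bytes, bytes] = {}  # repo side: digest -> compressed block
# A fake disk: three identical 1MB blocks plus one zeroed block.
disk = os.urandom(BLOCK) * 3 + bytes(BLOCK)

for blob in proxy_read_and_compress(disk):
    digest = hashlib.sha256(blob).digest()
    repository.setdefault(digest, blob)  # duplicates are stored only once

print(f"{len(repository)} unique blocks stored out of 4 read")
```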


That’s not to say no data-efficiency tasks occur before compression: utilising CBT to fetch only changed blocks, and skipping swap files, deleted-file whitespace, etc., is ‘kind of’ deduplication, in that deleted files should be whitespace that we just zero out. But I find it better to call this ‘data efficiency’.
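
A tiny sketch of that filtering idea, with hypothetical inputs (a CBT-style changed-block bitmap plus a set of block indices to skip):

```python
def blocks_to_transfer(changed: list[bool], skip: set[int]) -> list[int]:
    """Only blocks flagged dirty by CBT, minus anything we skip outright."""
    return [i for i, dirty in enumerate(changed) if dirty and i not in skip]

changed_bitmap = [False, True, True, False, True]  # reported by the hypervisor
swap_blocks = {2}                                  # e.g. pagefile extents
print(blocks_to_transfer(changed_bitmap, swap_blocks))  # -> [1, 4]
```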

Since the device we’re storing data to is a NAS (not a SAN or DAS or local storage) even though we’re presenting the iSCSI LUN as a disk to the Linux VM, we would choose 512KB block over 1MB (default?) therefore giving us smaller output files.

If all you really care about is the smallest possible backup file, go one step further and use 256KB/WAN blocks, but performance will suffer the smaller you make the blocks, given how many more blocks you’ve got to process versus the resources available to your Veeam backup repository. Veeam architect guidance is also to avoid synthetic fulls when leveraging 512KB/256KB block sizes due to the extra pressure you’re placing on the system, so you’d potentially be losing XFS’s Fast Clone space savings, which would dramatically outweigh the efficiency gains of a reduced block size.


If you wanted to copy any of this to object storage, then, especially if you’re on metered API calls, you’ll be generating an absolute ton of noise by uploading so many small blocks. I would really suggest sticking to 1MB block sizes on this one. If you want to push lower, test heavily that it is suitable for your platform. I know you mentioned this because you’re planning on using a NAS, but you’re not accessing it via SMB/NFS protocols; you’re using iSCSI instead, so I would treat this like a SAN, hence 1MB making sense.
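
Some back-of-envelope numbers on why block size matters here, assuming a hypothetical 10TB full backup (each block roughly corresponds to one object/API PUT if offloaded to object storage):

```python
TB = 1024 ** 4
source = 10 * TB  # hypothetical 10TB of source data in one full backup

for label, size in [("4MB", 4 << 20), ("1MB/Local", 1 << 20),
                    ("512KB/LAN", 512 << 10), ("256KB/WAN", 256 << 10)]:
    print(f"{label:>10}: {source // size:>12,} blocks per full")
```

At 1MB that’s roughly 10.5 million blocks per full; at 256KB it quadruples to about 42 million, which is where metered API calls and repository metadata start to hurt.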



Hopefully the above helps. If anything has generated further questions, please don’t hesitate to ask :)

Also if there’s anything you would do differently or recommend in addition to above.

I’m gonna talk freely here about what I would do in your position. I’d look at some physical storage + compute if using block storage, or a physical object storage device, to ensure immutability. There are plenty of object storage vendors, such as Object First’s OOTBI, Scality Artesca, etc. If using block storage, I would go for a lower-end server with some direct-attached storage, or even an external SAS card connected to additional storage if I had such high storage density requirements. For block storage, I’d 100% be looking at Linux, as XFS is typically less hungry for system resources based on my experience, plus I could use immutability.

For primary backups, I’d be looking at Optimal backup compression so I didn’t negatively impact my restores too much; for backup copies, if necessary, I’d look to compress further. I would keep my backups as per-VM chains because I believe it’s only something like a 10% difference. If I needed maximum density of data and the data set was large enough to justify it, I’d backup copy either to cheaper object storage via an OpEx cost model, or, via the CapEx model, I’d probably look at a deduplication appliance.

For block size, I would stick with 1MB unless the data being protected was large enough that 4MB was justified.

I have no idea on budget, etc., so I appreciate some of this might not align with yours.

Userlevel 5
Badge

All good stuff folks, thank you SO MUCH! I will be going through all the comments with a fine-tooth comb and jotting down all the awesomeness shared! I will also reply to the comments you guys left. I’m getting more and more confident that our Veeam setup will be at its peak when we implement the new improvements, not only speeding up backup times but recovery times as well. And with immutability and Veeam 12.1 in the picture now, our security posture will improve immensely. Thanks again to all who’ve commented; you’re appreciated more than I can express here. Have a great one!
