
Hi guys,

Which one is better for ensuring the health of our backup files:

  • Validating backup files by running Backup Validator as a post-job script after the backup job is done.
  • Scheduling Storage-Level Corruption Guard

 

BR

Davoud

I would prefer the validator tool from my backup software vendor. It should be able to run more specific tests...

And in the case of Veeam, consider SureBackup. With this you can be sure that the backups are restorable.


I think that, from a technical point of view, both approaches and their results don’t differ much. The difference is that Storage-Level Corruption Guard only verifies the last restore point and not the complete chain. On the other hand, it can, in theory, repair a damaged backup file, unlike Backup Validator.

I personally would go with Storage-Level Corruption Guard, as it’s integrated into the backup job and you don’t need to take care of a custom script, reporting, etc. In addition, just like @JMeixner said, I would regularly run SureBackup jobs, because with those you can verify that your VMs and applications are operational.


😎 Got distracted while writing…

Yes, @regnor is right. Both tools have their strengths. Storage-Level Corruption Guard is great for running automatically with the backup jobs, and the validator tool is great for checking your complete backup chains. So, combine these two with SureBackup. Backups tested this way should be restorable in most cases...


Thank you @JMeixner and @regnor 

Actually, running Backup Validator takes a long time to complete, and as you mentioned, Storage-Level Corruption Guard has some limitations.

What's your recommendation for backup jobs with three restore points and the incremental method?

More info:

  • Half of the jobs are scheduled on even days of the week and the rest on odd days.
  • Each job has a post-job script that runs the Backup Validator tool for that job (see the sketch after this list).
  • Maintenance is scheduled on Friday for all jobs, with the delete-old-objects option enabled.
  • Two datastore objects have been added to each backup job.
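
For reference, the post-job script is roughly like the PowerShell sketch below. The install path, job name and report path are simplified placeholders, and the /backup, /report and /format switches should be checked against the documented Veeam.Backup.Validator.exe usage for your version:

  # Minimal sketch of a post-job validation script (PowerShell).
  # Install path, job name and report path are hypothetical.
  $validator = 'C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe'
  $jobName   = 'Backup Job 01'
  $report    = 'C:\Reports\BackupJob01_validation.html'

  # Validate the backup produced by this job and write an HTML report.
  & $validator /backup:"$jobName" /report:"$report" /format:html

  # A non-zero exit code indicates the validation found a problem.
  if ($LASTEXITCODE -ne 0) {
      Write-Warning "Validator reported issues for '$jobName' (exit code $LASTEXITCODE)."
  }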

The question is, what do you need to achieve or what are your requirements?

And perhaps some more details would be useful: are the jobs forever incremental, or do they run scheduled active/synthetic fulls? Which filesystem does your repository use? Also, why did you schedule and separate the jobs like this?


All,

A few notes on Validator versus Health Check:

  1. Functionally, both work about the same in terms of what they do
  2. Health Check is a lot less manual
  3. As of v11a, Health Check has a substantial speed boost over Validator: it received an async engine (5 streams instead of 1), so it should almost always be the more performant option. (I’ve seen cases where the new Health Check was so fast it brought pretty beastly storage systems to their knees while it ran.)

In general, the main difference is what you’re actually trying to prove. Running both “once” is not enough, because the window of time in which you’re checking the backup’s stability is quite short, and the only conclusion you can safely derive from it is:

“At the time I ran the validation, it did not detect corruption”

Validator has the flexibility of checking specific points at will, while Health Check has the automation and speed advantages. I would put Validator into the category of on-demand checks, while Health Check is good for ensuring you can restore reliably from your most recent points, and that the data blocks in the backup chain are safe/valid for recovering from a given restore point.
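
As an example of such an on-demand check, you can point Validator at a single VM in one specific restore point. A sketch only: the job name, VM name and date below are made up, the /vmname and /date switches come from the documented Validator usage, and the exact date format is locale-dependent, so check the tool’s help output:

  # On-demand sketch: validate one VM from one specific restore point.
  # Job name, VM name and date are hypothetical.
  & 'C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe' `
      /backup:"Backup Job 01" /vmname:"SQL01" /date:15.10.2022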

Both are meant to catch bitrot/storage-level corruption on your disk-based storage, but long-term you need to move your data to a better long-term storage solution like Object Storage or Tape. Tape now has the advantage of Tape Verification, so you can periodically check your archival backups, while Object Storage (from an appropriate provider) comes with a guarantee of storage stability. Be very careful with the smaller S3 systems bundled with NAS devices; you do not get the same level of resilience on them, because ensuring data integrity via multiple copies costs too much space.

 

Also consider XFS/ReFS and their scrubbing/journaling mechanisms for catching storage corruption; with mirror/parity Storage Spaces, ReFS can even potentially “heal” such corruption, and XFS offers a similar chance almost out of the box with xfs_repair.
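
As an illustration on the Windows side, the built-in Storage module exposes ReFS integrity streams (per-file checksums). Whether you want them enabled on a Veeam repository is a trade-off (they add verification overhead), so treat this purely as a sketch; the repository path is hypothetical:

  # Sketch: inspect and enable ReFS integrity streams on a repository
  # folder. Path is hypothetical; requires the built-in Storage module.
  $repo = 'R:\VeeamRepo'

  # Show per-file integrity (checksum) status.
  Get-ChildItem -Path $repo -File | ForEach-Object { Get-FileIntegrity -FileName $_.FullName }

  # Enable integrity streams on the folder; new files inherit the setting.
  Set-FileIntegrity -FileName $repo -Enable $true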

 

There isn’t a good catch-all here; it’s extremely dependent on what you specifically want to accomplish and report on. The most important thing is to understand that these validations are only valid for that particular point in time; it’s entirely plausible (and happens in the real world) that you end up with dead sectors or bitflips/bitrot as soon as the validation finishes.

 

The below is not official Veeam advice, just my take: put your short-term backups on a proper filesystem like XFS (or ReFS if you want to go the Windows route) and rely on their integrity checks for catching bitrot and for repairs. Enable Health Check only for your most critical machines, to avoid overloading these storages. Tape-out or offload your backups to long-term storage as soon as you feasibly can; with tape, verify the tapes periodically as they cycle in/out, and include a retrieval from your vaults to validate tapes and ship them back out again. For S3, don’t rely on the S3 applications bundled with NAS devices for longevity.


@ddomask Perfect post, thank you for sharing! It’s really interesting.

