Solved

Can Veeam catch memory errors during backup?


Userlevel 5
Badge +1

Hello, and thanks in advance,

My question is about whether Veeam can catch memory errors that occur during the backup itself.
Sure, ECC memory can sometimes catch them.

For example, let’s say Veeam:

  1. Reads a sector from disk into memory.
  2. Whilst that sector is in memory, the memory is corrupted.
  3. Veeam writes that corrupted sector to the backup file.

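A health check would not catch that, because the backup file would be internally consistent: the data was already corrupted before it was written.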
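To make the failure mode concrete, here is a minimal Python sketch. It assumes, hypothetically, that a checksum is computed per block only once the block is already in memory; the function names and the SHA-256 choice are illustrative and are not Veeam’s actual implementation. It shows that a later verification passes for data corrupted before checksumming, but fails for data corrupted after writing.

```python
import hashlib


def checksum(block: bytes) -> str:
    # Hypothetical per-block checksum; SHA-256 is used only for illustration.
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit error at the given bit position.
    data = bytearray(block)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


def backup_block(source_block: bytes, corrupt_in_memory: bool) -> tuple[bytes, str]:
    # Step 1: the block is read from disk into a memory buffer.
    buffer = source_block
    # Step 2: optionally corrupt the buffer while it sits in RAM.
    if corrupt_in_memory:
        buffer = flip_bit(buffer, 3)
    # Step 3: the (possibly corrupted) buffer is checksummed and written.
    return buffer, checksum(buffer)


def health_check(stored_block: bytes, stored_checksum: str) -> bool:
    # A later check can only compare the stored data with the stored checksum.
    return checksum(stored_block) == stored_checksum


source = b"sector data " * 40
stored, digest = backup_block(source, corrupt_in_memory=True)
print(health_check(stored, digest))  # True: the check passes
print(stored == source)              # False: yet the backup differs from the source

# Corruption that happens after writing (e.g. on the backup disk) is caught.
print(health_check(flip_bit(stored, 10), digest))  # False
```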

Does Veeam have a way to catch that, by re-reading the source sector again or some other mechanism?

Thanks much,
David


 

 

 

Best answer by MicoolPaul 8 November 2023, 17:33

11 comments

Userlevel 7
Badge +20

From everything I know there is no facility in Veeam to account for memory errors.

Userlevel 4
Badge +2

This might be of use….

 

https://forums.veeam.com/vmware-vsphere-f24/checking-backup-for-corruptness-t34961.html

 

“The most reliable way to avoid silent data corruption is to run SureBackup jobs with the entire virtual disk content check enabled.

This way not only will you make sure that the backup file hasn't been changed anyhow since the time it was created, but also you'll confirm that VMs themselves are bootable. “

Userlevel 7
Badge +20


SureBackup would certainly work as a backup check after the job is done. If you want the check inline while the job is running, I don’t think that is possible.

Userlevel 7
Badge +20

Hi,

 

That would be a question best asked to the R&D team over on forums.veeam.com. I’m not aware of anything, and it’s a question of ‘how far do you go’. Do you read every block twice and compare the values? That doubles the IO required for a backup. You could also go down the rabbit hole of how many times you should validate to reach a given number of 9’s of resiliency against data corruption. The best trade-off is utilising ECC and validating backup recoverability IMO.
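A minimal Python sketch of the “read every block twice and compare” idea, assuming direct file access to a source image. The function names, the 1 MiB block size and the retry count are hypothetical, and this is not something Veeam is known to do.

```python
import os
import tempfile

BLOCK_SIZE = 1024 * 1024  # Hypothetical 1 MiB block size, for illustration only.


def read_block(path: str, index: int, block_size: int = BLOCK_SIZE) -> bytes:
    # One separate read of the block at the given index.
    with open(path, "rb") as f:
        f.seek(index * block_size)
        return f.read(block_size)


def read_block_verified(path: str, index: int, block_size: int = BLOCK_SIZE,
                        max_attempts: int = 3) -> bytes:
    # Read the block twice and compare the two copies; this doubles the read IO.
    # A mismatch means one copy changed between or during the reads, so retry.
    for _ in range(max_attempts):
        first = read_block(path, index, block_size)
        second = read_block(path, index, block_size)
        if first == second:
            return first
    raise OSError(f"block {index} did not read consistently after {max_attempts} attempts")


# Usage: a throwaway 3-block file standing in for a source disk.
data = os.urandom(3 * 4096)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    block = read_block_verified(path, 1, block_size=4096)
    print(block == data[4096:8192])  # True
finally:
    os.remove(path)
```

Two caveats: on a real system the second read would likely be served from the OS page cache (i.e. from RAM) unless unbuffered or direct IO is used, and the returned buffer can still be corrupted after the comparison. So this only narrows the window rather than closing it.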

Userlevel 5
Badge +1

@MicoolPaul  - “Do you read every block twice and compare the values? That doubles the IO required for a backup”
IMHO, with backups, that’s not too much to ask for. Thanks for the answer.

@ratkinsonuk - Perhaps the name should be "SureBackup’ish" ;)

 

 

Userlevel 5
Badge +1

Sorry, I just realized that I mis-posted this topic in the ‘Script Library’.

Please feel free to move it.

Userlevel 4
Badge +2

 

@ratkinsonuk - Perhaps the name should be "SureBackup’ish" ;)

 

I thought it’s worth mentioning, in case you’re not fully aware of SureBackup, that it’s a component of Veeam B&R separate from the actual backup functionality. Its most powerful feature is the ability to create logically isolated test environments from backup data with the click of a button.

I’ll take the £20 now if anyone from Veeam Sales is listening :)

Rob.

Userlevel 5
Badge +1

@ratkinsonuk - "I thought it’s worth mentioning"
Yes, it was worth mentioning, thanks.

 

Userlevel 7
Badge +8

I have not seen this, but running SureBackup is a great way to test your backups. If you are having ECC memory errors, your server should be reporting them, and turning on alerting at the server level is recommended.

My server has reported an ECC memory error during a backup before, but it recovered the block and the backups were fine. 

 

Userlevel 5
Badge +1

@Scott - "My server has given me ECC memory error during a backup before but recovered the block and the backups were fine."

That is very interesting. Is that Windows, Linux, or something else?
How do you know that the server had a memory issue, that the corresponding backup was affected, and that it was recovered?

 


My current home server runs the awesome free Hyper-V Server 2019 edition, but on an older desktop computer. Though I just upgraded it to the evaluation edition of Windows Server 2022.

I have been looking to build a cheap’ish motherboard + CPU + ECC system but get a bit overwhelmed.
I am often confused about the various types of ECC and about CPU and motherboard support.
The more I read about it, the more confused I get…

 

 


 

Userlevel 7
Badge +8

It’s a physical Cisco UCS server. The server itself has a management GUI and can send email alerts over SMTP. It sent an email about the ECC error on that memory bank. 

Windows and Veeam just kept on working as expected.
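As for how you would know a memory error happened on a box without a management controller like UCS: on Windows, corrected hardware errors are logged through WHEA (source Microsoft-Windows-WHEA-Logger in the System event log), and on Linux the kernel’s EDAC subsystem exposes per-memory-controller counters of corrected (`ce_count`) and uncorrected (`ue_count`) errors under `/sys/devices/system/edac/mc/`. A minimal Python sketch for the Linux case, assuming an EDAC driver for your platform is loaded; the helper names are hypothetical.

```python
import glob
import os

# Standard EDAC sysfs location on Linux; assumes an EDAC driver is loaded.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_counter(path: str) -> int:
    # Each counter file holds a single decimal integer.
    with open(path) as f:
        return int(f.read().strip())


def edac_error_counts(root: str = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} per memory controller.

    Returns an empty dict when no EDAC controllers are present
    (no driver loaded, not Linux, or a non-ECC system).
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc*"))):
        ce_path = os.path.join(mc_dir, "ce_count")
        ue_path = os.path.join(mc_dir, "ue_count")
        if os.path.isfile(ce_path) and os.path.isfile(ue_path):
            counts[os.path.basename(mc_dir)] = (read_counter(ce_path), read_counter(ue_path))
    return counts


# Snapshot the counters before and after a backup window and report any increase.
before = edac_error_counts()
# ... run the backup job here ...
after = edac_error_counts()
for mc, (ce, ue) in after.items():
    ce_before, ue_before = before.get(mc, (0, 0))
    if ce > ce_before or ue > ue_before:
        print(f"{mc}: {ce - ce_before} corrected, {ue - ue_before} uncorrected memory errors")
```

An increase in the corrected count only tells you ECC fixed something in that window; it does not tell you which data was affected.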
 

 


 

Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory.
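To illustrate why ECC can correct some errors but only detect others, here is a toy Python sketch of an extended Hamming(8,4) SECDED code (single-error correction, double-error detection). Real ECC DIMMs typically protect 64 data bits with 8 check bits, and some server platforms use stronger schemes such as Chipkill, so treat this as an illustration of the principle, not of any particular memory controller.

```python
from typing import Optional


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (extended Hamming(8,4))."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    # Data bits go in positions 3, 5, 6, 7; positions 1, 2, 4 hold parity.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 is an overall parity bit that makes the whole word even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); data is None when the error is uncorrectable."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits (0 for a valid word).
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped; the syndrome names it (0 = parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a non-zero syndrome: two bits flipped.
        return None, "uncorrectable"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


word = encode(0b1011)
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word))       # (11, 'ok')
print(decode(one_flip))   # (11, 'corrected')
print(decode(two_flips))  # (None, 'uncorrectable')
```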

 

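It’s not always 100% effective; it depends on how bad the corruption is and what specifically was corrupted (typical SECDED ECC corrects a single flipped bit per word but can only detect, not correct, two). Still, I have had a few over the years with no issues. If you DO notice them occurring, replace the memory stick and life continues. 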
