Question

CRC error during backup copy job


Userlevel 5
Badge

Hi,

We have 3 backup copy jobs for one backup job. All backup copy jobs for a specific VM fail with the following error:

 

9-8-2023 22:31:45 Error: Failed to decompress LZ4 block: Bad crc (original crc: 5a73d881, current crc: 3fa0f4bc). Failed to upload disk '65505BFB' Agent failed to process method {DataTransfer.SyncDisk}. Exception from server: Failed to decompress LZ4 block: Bad crc (original crc: 5a73d881, current crc: 3fa0f4bc). Unable to retrieve next block transmission command. Number of already processed blocks: [132]. Failed to download disk '65505BFB'.
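For anyone collecting these messages from job logs, here is a minimal sketch of pulling the two checksums and the processed-block count out of an error line like the one above. The function name `parse_crc_error` and the regular expressions are illustrative assumptions based only on this one message; other versions or components may word the message differently.

```python
import re

# Patterns derived from the single error message quoted in this thread (assumption:
# the wording is stable enough to match on).
CRC_RE = re.compile(
    r"original crc: ([0-9a-f]{8}), current crc: ([0-9a-f]{8})", re.IGNORECASE
)
BLOCKS_RE = re.compile(r"Number of already processed blocks: \[(\d+)\]")


def parse_crc_error(message):
    """Return (original_crc, current_crc, processed_blocks), or None if no CRC pair is found.

    processed_blocks is None when the message does not include a block count.
    """
    crc = CRC_RE.search(message)
    if crc is None:
        return None
    blocks = BLOCKS_RE.search(message)
    return (
        int(crc.group(1), 16),
        int(crc.group(2), 16),
        int(blocks.group(1)) if blocks else None,
    )


sample = (
    "Failed to decompress LZ4 block: Bad crc "
    "(original crc: 5a73d881, current crc: 3fa0f4bc). "
    "Number of already processed blocks: [132]."
)
original, current, processed = parse_crc_error(sample)
print(f"stored={original:08x} computed={current:08x} blocks_ok={processed}")
```

The two values differ, which only says that the block read back does not match the checksum recorded for it. It does not say at which point the data changed.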

 

The only way to solve this was to run an active full on both the backup job and the copy jobs. I'm puzzled why this CRC error occurs. There is corruption in the source backup files (Backup Validator also reports this). The backup files are located on a ReFS RAID 6 volume. Integrity streams are enabled, so if there were corruption, shouldn't ReFS detect it as well? I can copy the file in Windows Explorer without problems. If ReFS detects an integrity stream violation, it aborts the file copy. So the CRC error apparently occurred during the creation of the backup by Veeam.

I opened a case with support for this, but they state it’s a filesystem issue, which I highly doubt.

What are your thoughts?

 


12 comments

Userlevel 7
Badge +22

Without seeing all the information shared with support, I’d like to think they’ve gathered more information to support that claim.

There are a few things missing here, such as what the underlying storage is that provides this RAID 6 and which protocol it’s accessed over (SAS, iSCSI, etc.).

I’d be inclined to start there.

Userlevel 7
Badge +14

I would love to have a screenshot of the window displaying this error for education purposes, would it be possible to send it to Rasmus -at- Veeam.com, please? I’ll make sure to anonymize any identifying factors.

 

 

Userlevel 5
Badge

It’s a local repository on the Veeam server itself: a physical DL380 Gen10 Plus server with 12x 8 TB SAS drives.

Userlevel 7
Badge +14

I forgot to ask: have you already checked the system log for any notifications about ReFS corruption?

Userlevel 7
Badge +14

Maybe also use PowerShell to confirm that integrity streams really are enabled for that particular backup file? There is an example here: https://learn.microsoft.com/en-us/windows-server/storage/refs/integrity-streams

Just search for Get-FileIntegrity.

Userlevel 5
Badge

I forgot to ask: have you already checked the system log for any notifications about ReFS corruption?

Yes, no corruption reported.

Userlevel 5
Badge

Maybe also use PowerShell to confirm that integrity streams really are enabled for that particular backup file? There is an example here: https://learn.microsoft.com/en-us/windows-server/storage/refs/integrity-streams

Just search for Get-FileIntegrity.

Did that, integrity streams are enabled and enforced.

Userlevel 5
Badge

I would love to have a screenshot of the window displaying this error for education purposes, would it be possible to send it to Rasmus -at- Veeam.com, please? I’ll make sure to anonymize any identifying factors.

 

 

On its way.

Userlevel 5
Badge

Unfortunately the support case was closed without finding a solution. Support insisted that it was file system corruption, but I’m convinced that can’t be the case: integrity streams are enabled on the backup files and no corruption is reported in the event logs. I can copy the backup files without problems, even the one with the CRC error. If there were an integrity stream violation, Windows would throw an error during this copy operation. So to me it looks like the CRC error occurred during the creation of the backup file, but I wasn’t able to convince support of that. Hence, I decided to close the case since it was leading nowhere.

Userlevel 7
Badge +14

Have you tried to escalate your support case? If you disagree with the solution, support should continue analyzing the issue. Although I wouldn't know what issue from Veeam's side would cause a single corrupted backup file.

I've seen cases where a bad SFP module (FC) caused corrupted backup files during their creation. So maybe you should check the whole environment.

And maybe you could create a separate job for this VM with compression/deduplication disabled to rule out any processor or memory issues?

Userlevel 5
Badge

No, I didn’t. Three engineers picked up the case one after another and they all went down the same road, so I was quite done with it. We have 23 backup jobs and 45 copy jobs running, and this single job had one CRC error. The server (ProLiant DL380 Gen10 Plus) has ECC memory, so a RAM issue can probably be ruled out as well. I really doubt a hardware issue is the problem, or else we would have seen it on many more jobs. Health checks on all jobs also pass without issues. After creating a new active full I haven’t had the issue again, so I’m treating it as a one-time occurrence for now. I’m glad Veeam verifies checksums again when reading the files, so that’s a positive outcome. ;-)

 

My point is merely that support pointed straight to file system corruption because that’s the most probable cause of this error. However, they refused to look at other potential causes even though I provided evidence that rules out file system corruption.

Userlevel 7
Badge +14

Ok, so it happened only once. At first I thought you were seeing corruption regularly with that VM. Analyzing it probably isn't easy if the issue itself isn't reproducible.

ReFS would only alert you to corruption that occurs after a backup file has been written. If the data is already corrupted while the file is being created, ReFS checksums the corrupted data as it arrives and wouldn't notice anything.

But you're right, good to see that Veeam will discover such corruptions and not silently copy them over.
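The reasoning above (a storage-level checksum cannot catch damage that happened before the data reached it, while an application-level checksum can) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `zlib.crc32` stands in for both checksums, and the names `flip_bit` and `simulate` are hypothetical. It does not reproduce the actual ReFS integrity stream algorithm or the Veeam block format.

```python
import zlib


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


def simulate(corrupt_before_write: bool):
    """Return (storage_check_ok, application_check_ok) for one block."""
    block = b"backup block payload " * 64

    # Application layer: checksum computed on the good data, stored with the block.
    app_crc = zlib.crc32(block)

    # Damage on the way to storage (for example in memory or on the data path).
    in_flight = flip_bit(block, 10) if corrupt_before_write else block

    # Storage layer: checksums whatever it receives, good or bad.
    stored = in_flight
    storage_crc = zlib.crc32(stored)

    # Damage at rest, after the storage layer has recorded its checksum.
    if not corrupt_before_write:
        stored = flip_bit(stored, 10)

    # Later read: each layer compares the data against its own checksum.
    storage_ok = zlib.crc32(stored) == storage_crc
    app_ok = zlib.crc32(stored) == app_crc
    return storage_ok, app_ok


print("corrupted before write:", simulate(True))   # storage check passes, application check fails
print("corrupted at rest:     ", simulate(False))  # both checks fail
```

In the first case the storage layer sees nothing wrong, the file copies cleanly, and only the application-level check reports a bad CRC, which matches the symptoms described in this thread. That is consistent with corruption during creation, though it does not prove where on the path the damage occurred.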
