Solved

A data integrity checksum error occurred. Data in the file stream is corrupt. - Synology NAS, REFS, RDM

  • 14 October 2022
  • 6 comments
  • 2744 views

Userlevel 7
Badge +6

A client contacted me this morning: they're getting corruption errors on the backup copy jobs to the performance tier of their SOBR, but the data appears to be making it out to their capacity tier, Wasabi, as far as they know (I didn't check the available restore points on that tier, but the data is definitely not sitting on the performance tier when I look at that filesystem).

Background configuration: in both their primary and secondary datacenters they have a Synology NAS presented via iSCSI to their ESXi hosts and attached to the Windows repo server as an RDM disk; the volume is formatted ReFS with 64K blocks, the usual. I realize a NAS is less than desirable as a backing array, and my standard procedure now is to avoid ReFS when the repository sits on a NAS, but this is where we're at. I will point out that another copy job, going to the same NAS but not to a SOBR, is copying data successfully. I can't say for sure off the top of my head whether it's the same volume or a separate one, as they have two volumes presented from this array to the repo server.
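For reference, a quick way to double-check what the repo volume actually is (that it's ReFS and that the cluster size really is 64K) is to run fsutil on the repo server. This is just a rough sketch; the drive letter is made up and the exact field names in the output vary by Windows build:

```python
# Quick sketch: confirm the repository volume is ReFS and check its cluster size
# from the Windows repo server. Run elevated; the drive letter below is made up.
import subprocess

REPO_DRIVE = "E:"  # assumption: drive letter of the ReFS repository volume

# 'fsutil fsinfo refsinfo' errors out if the volume isn't ReFS, which is itself a useful sanity check.
result = subprocess.run(
    ["fsutil", "fsinfo", "refsinfo", REPO_DRIVE],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)

# Field names vary a bit between Windows builds; the cluster-size line should read 65536 for a 64K format.
for line in result.stdout.splitlines():
    if "Cluster" in line:
        print("->", line.strip())
```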

I suggested we open a case with Veeam support, and the client is also going to contact Synology support, since they upgraded the firmware on their NASes last weekend, which is about when the issues started. But I wanted to ask whether anyone else has seen similar issues, and whether anyone has good ways to check for data corruption on the filesystem. The usual tools don't seem to work on ReFS volumes because, more or less, ReFS isn't supposed to get corrupted… lol.
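For what it's worth, the first place I was planning to dig is the event logs on the repo server rather than the filesystem itself. Rough sketch below; the ReFS provider name and the Data Integrity Scan channel name are my best guess, so verify them (e.g. with wevtutil el) on the box:

```python
# Rough sketch (provider/channel names assumed, not verified): pull recent ReFS-related
# events from the repo server's event logs with wevtutil, since chkdsk-style tools don't apply to ReFS.
import subprocess

def dump_events(channel, xpath=None, count=50):
    """Print the newest `count` events from an event log channel, optionally filtered by XPath."""
    cmd = ["wevtutil", "qe", channel, f"/c:{count}", "/rd:true", "/f:text"]
    if xpath:
        cmd.append(f"/q:{xpath}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"=== {channel} ===")
    print(result.stdout or result.stderr)

if __name__ == "__main__":
    # ReFS checksum/metadata errors should land in the System log; the provider name here is my guess.
    dump_events("System", "*[System[Provider[@Name='Microsoft-Windows-ReFS']]]")
    # The ReFS scrubber ("Data Integrity Scan" task) logs to its own channel on Server editions; name assumed.
    dump_events("Microsoft-Windows-DataIntegrityScan/Admin")
```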


Best answer by regnor 14 October 2022, 21:07


6 comments

Userlevel 7
Badge +20

I have seen Synology do some weird things with iSCSI connections; the first upgrade to DSM 7.0 even killed my LUNs because of changes they made to the iSCSI engine. So contacting support is the right way to go at this point.

Be sure to set up snapshots for iSCSI LUNs on Synology as they can save your bacon.  😁

Userlevel 7
Badge +6

Pretty sure we don’t have snapshots set up. They wanted all the space… but yeah, that’s something I do on primary storage at least.

Userlevel 7
Badge +17

Did they do a firmware update on this NAS before the problems started? I have heard several times that this can cause these checksum errors afterwards… Most of the time the clients have thrown out the NAS after this happened…

Userlevel 7
Badge +6

Did they do a firmware update on this NAS before the problems started? I have heard several times that this can cause these checksum errors afterwards… Most of the time the clients have thrown out the NAS after this happened…

Yes, I did note a firmware update on both of their NASes, and one gave them some trouble getting the update applied... apparently it was complaining about drive space. I haven’t heard of this causing corruption before, though. I checked whether my own units had firmware updates available and read the release notes, but that wasn’t terribly helpful. If you’ve heard of firmware updates causing similar issues, that’s a striking similarity for sure. I also had them confirm that they had disabled the copy jobs before applying the updates, but that seemed a less likely culprit.

Userlevel 7
Badge +14

Contacting Veeam support seems like the right way to go; perhaps they can repair something. Do you also see ReFS errors in the event log?

I've been fortunate enough to never see such corruption happen. ReFS does need reliable write operations, and most NAS devices only use software RAID without a proper battery-backed cache, so a power fault, a software error, or perhaps a firmware update could cause some write IOs to be lost.
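If you want to narrow down which restore points on the performance tier are actually damaged while the case is open, Veeam ships a standalone validator that CRC-checks backup files. Rough sketch below; the install path and parameter names are from memory, so double-check them against the Help Center, and the backup name is just a placeholder:

```python
# Hedged sketch: run Veeam's standalone backup validator against the copy job's files
# on the performance tier to see which restore points fail their CRC check.
# Install path and parameter names are from memory; verify against the Veeam Help Center.
import subprocess
from pathlib import Path

VALIDATOR = Path(r"C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe")
BACKUP_NAME = "Backup Copy Job 1"  # placeholder: the backup name as shown in the console

# /backup: checks every file of the named backup; /file: can target a single .vbk/.vib instead.
proc = subprocess.run(
    [str(VALIDATOR), f"/backup:{BACKUP_NAME}"],
    capture_output=True, text=True,
)
print(proc.stdout or proc.stderr)
print("validator exit code:", proc.returncode)
```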

Userlevel 7
Badge +6

LMAO… the Best Answer button needs to not be where the like button normally is. But close enough, lol. Agreed though… the NAS is not the best solution, and ReFS is hit and miss, I think. I’ll wait on the support ticket to see if the cause can be isolated. My best guess is that it’s firmware related as well, but I’m not sure why the data is just plain missing when I look at the filesystem. I’ll report back.

 
