ReFS issues with latest Windows Server Updates (KB5009624, KB5009557, KB5009555)


Userlevel 7
Badge +14

Just a quick post: the latest Windows Server updates for 2012 R2, 2019 and 2022 (I haven’t seen reports for 2016) can cause ReFS issues. After installation, ReFS volumes show up as RAW and are no longer accessible, so if you’re using ReFS for your repositories, be aware of this before installing the update. Besides that, these updates can also cause boot loops on domain controllers and break Hyper-V.

Depending on your Windows Server version, one of the following updates could have caused the issue: KB5009624, KB5009557, KB5009555

If you’re affected, removing the update mentioned above should solve the issue. Don’t (!) try to repair the ReFS volume, because this could cause data loss.
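A minimal Python sketch of the removal step, assuming you have already collected the installed KB list (for example the HotFixID column of PowerShell's `Get-HotFix`) and that `wusa.exe /uninstall /kb:<number>` is used to remove an update. The helper names are hypothetical, and the command should be checked against Microsoft's wusa documentation before running it on a server.

```python
from typing import Iterable, List

# KB numbers named in this post; which one applies depends on the Windows Server version.
SUSPECT_KBS = ["KB5009624", "KB5009557", "KB5009555"]


def wusa_uninstall_command(kb: str) -> List[str]:
    """Build a wusa.exe command line that uninstalls one update (hypothetical helper)."""
    kb = kb.strip().upper()
    number = kb[2:] if kb.startswith("KB") else kb
    return ["wusa.exe", "/uninstall", f"/kb:{number}"]


def uninstall_plan(installed_kbs: Iterable[str]) -> List[List[str]]:
    """Return one uninstall command per suspect KB that is actually installed."""
    installed = {kb.strip().upper() for kb in installed_kbs}
    return [wusa_uninstall_command(kb) for kb in SUSPECT_KBS if kb in installed]


if __name__ == "__main__":
    # Made-up example input; KB0000000 is a placeholder, not a real update.
    for cmd in uninstall_plan(["KB5009557", "KB0000000"]):
        print(" ".join(cmd))  # run on the affected server, then reboot
```

The sketch only prints the commands, so the actual removal and reboot stay a deliberate manual step.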

This should remind everyone why the 3-2-1 rule for backups is so important. :wink:

Further information:

https://www.bleepingcomputer.com/news/microsoft/new-windows-server-updates-cause-dc-boot-loops-break-hyper-v/

https://forums.veeam.com/veeam-backup-replication-f2/beware-possible-raw-refs-volumes-after-installing-january-updates-t78634.html

Update #1:

Microsoft has pulled the updates, thanks @Mildur 

https://www.bleepingcomputer.com/news/microsoft/microsoft-pulls-new-windows-server-updates-due-to-critical-bugs/amp/

Update #2:

Microsoft has released the updates again, and it looks like they didn't fix them. (Thanks @MicoolPaul)

https://www.bleepingcomputer.com/news/microsoft/microsoft-resumes-rollout-of-january-windows-server-updates/

Update #3:

Microsoft has released an out-of-band update, which may resolve the issues:

https://www.bleepingcomputer.com/news/microsoft/microsoft-releases-emergency-fixes-for-windows-server-vpn-bugs/

 


106 comments

Userlevel 7
Badge +8

You guys scared me bringing back this post. I just saw the title haha

Userlevel 7
Badge +6

 

@dloseke Are those virtual disks or physical volumes?

 

Excellent question!  In both cases, these are RDM disks and the underlying storage is an iSCSI volume presented to the ESXi hosts from a Synology NAS.

Always try to connect iSCSI volumes inside the Windows VM rather than presenting them to the ESXi host and using RDM disks. In my experience it works better and is less dependent on things like VMware.

 

I suppose that’s true. I’ve been using RDMs because I get better multipathing capabilities, but I suppose that could be worked out with multiple NICs on the VM tied to port groups dedicated to specific physical NICs, maybe? I guess I haven’t given it much consideration. I will say that I hate the iSCSI initiator in Windows and the MPIO driver, but that’s just personal preference... the VMware iSCSI is easier to set up IMO. But that doesn’t make it better either…

Userlevel 7
Badge +11

From Windows Server 2016 onward, ReFS is recommended. Normally there are no issues with it; before Windows Server 2016, yes indeed. The advantages of ReFS over NTFS are big: synthetic full backups use pointers to identical blocks that are already on the storage instead of copying the data again, so you can keep many more restore points on the same amount of storage, and creating them is much faster. I would never go back to NTFS except when using rotated USB disks; in that case I recommend NTFS over ReFS, because GFS is not possible with the rotated drives option.
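To put the "more restore points on the same storage" point into numbers, here is a rough Python model. It assumes weekly synthetic fulls with six daily incrementals, a fixed daily change rate, no compression or deduplication, no retention deletions, and ignores metadata; the function name and the figures are illustrative only, not Veeam sizing guidance.

```python
def estimate_repo_gb(full_gb: float, daily_change: float, weeks: int, block_clone: bool) -> float:
    """Very rough repository size for `weeks` weekly chains (1 full + 6 incrementals each).

    Simplified model: no compression/dedup, no retention deletions, metadata ignored.
    """
    incremental_gb = full_gb * daily_change
    incremental_total = weeks * 6 * incremental_gb
    if block_clone:
        # ReFS: synthetic fulls point at blocks already on disk, so only the
        # first full and the changed blocks consume new space.
        fulls_total = full_gb
    else:
        # NTFS: every weekly synthetic full is written out as a complete copy.
        fulls_total = weeks * full_gb
    return fulls_total + incremental_total


print(estimate_repo_gb(1000, 0.05, 4, block_clone=False))  # 5200.0
print(estimate_repo_gb(1000, 0.05, 4, block_clone=True))   # 2200.0
```

Under these assumptions, four weeks of restore points for a 1 TB job need roughly 5.2 TB on NTFS versus 2.2 TB on ReFS; real numbers depend on compression, retention and the actual change rate.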

Userlevel 7
Badge +14

@TKA Just like @JMeixner, I've only had good experiences with ReFS. The only time it failed was because of controller/SFP problems. Perhaps in your case the volume was OK and only turned RAW because of the updates mentioned in this topic.

@dloseke Are those virtual disks or physical volumes?

Userlevel 7
Badge +6

What OS?
If it's 2012 R2 on a system like VMware, where the disks have the characteristics of removable drives, you will always have issues.
With the patches removed and the volume visible, can you check the ReFS version number?
fsutil fsinfo refsinfo x:
If it's v1.2, that is the old version, which will never work on ‘removable’ drives now.
v3.x should be OK with the latest updates (since February).

If this is a VMware system with ReFS v1.2 disks, then the only option is a VMware configuration change to disable hotplug:

Disabling the HotAdd/HotPlug capability in virtual machines (vmware.com)


If none of the above applies, please check whether the ReFS event log (under Applications and Services Logs) says anything, e.g. about the version number.
For example, if the v3.x ReFS disks have somehow been attached to a later OS and upgraded to a later v3.x version, they will no longer be readable when you put them back on the earlier OS.
 

 

Funny that this popped up again. A couple of days ago I ran across another 2012 R2 machine that was showing its ReFS volumes as RAW, and disabling hotplug worked perfectly without having to uninstall the offending KB that had been applied to the server. In the past, this happened to a client of mine a couple of times, and I did both at the time, so I wasn't sure which one fixed it. Can confirm disabling hotplug worked perfectly!
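The version check discussed above can be scripted. This is a small Python sketch that assumes the `fsutil fsinfo refsinfo x:` output contains a line of the form `REFS Version : 1.2` (verify against your own output). The function names and advice strings are hypothetical and only restate the rules of thumb from this thread, not official Microsoft or VMware guidance.

```python
import re
from typing import Optional, Tuple

# Assumed format: fsutil prints a line such as "REFS Version   :   3.4".
_VERSION_RE = re.compile(r"REFS Version\s*:\s*(\d+)\.(\d+)", re.IGNORECASE)


def refs_version(fsutil_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `fsutil fsinfo refsinfo x:` output, or None."""
    match = _VERSION_RE.search(fsutil_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def advice(version: Optional[Tuple[int, int]], on_vmware: bool) -> str:
    """Map the version to the rules of thumb from this thread (not official guidance)."""
    if version is None:
        return "version not found: check the ReFS event log"
    if version[0] == 1:
        # v1.2 volumes on disks that look 'removable' (VMware hotplug) stay RAW.
        if on_vmware:
            return "disable VMware hotplug"
        return "v1.x: check whether the disk is presented as removable"
    return "v3.x: should be OK with updates since February; check the ReFS event log"


if __name__ == "__main__":
    # Hypothetical sample output, not captured from a real server.
    sample = "REFS Volume Serial Number :   0x0\nREFS Version   :   1.2\n"
    print(advice(refs_version(sample), on_vmware=True))
```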

Userlevel 7
Badge +17

So far I have had no bad experiences with ReFS on Windows Server 2016 and 2019. I am using ReFS with local disks and iSCSI targets.

Userlevel 4

@TKA The storage or RAID controller needs to be certified by Microsoft or listed in the Windows Server Catalog. In general you should at least have a battery-backed RAID controller in place; without it, a power loss could corrupt your ReFS volume. I'm not sure about iSCSI, but I would say that if the storage is decent it shouldn't be an issue.

For your case; what storage do you have? Is it a NAS?

Now the customer has a “new backup server” with WS2019 on a Dell R720 with 10 GbE NICs. The backup target is an Infortrend DS1024 storage array (dual controller, 4 GB RAM per controller with BBU). But after my bad experience with ReFS, losing more than 60 TB of backups, we are still on NTFS :) Now we have ordered a new array with 180 TB of capacity, so maybe it's time to switch to ReFS for a 60 TB “performance tier” and use 120 TB as an object storage volume for “archive”.

Userlevel 7
Badge +3

@regnor thank you for the heads up on this!!!

Userlevel 4

Hi guys, I have had a very bad experience with ReFS on WS2016: one of my customers had around 60 TB+ of backups on a ReFS volume, and after a Windows update and reboot, the volume was “RAW”. Since then I haven't used ReFS on 2012 R2, 2016 or 2019… I couldn't believe it. Maybe the support is better on WS2022. In the past I wrote some posts saying that ReFS is not supported on storage arrays connected via iSCSI. Is that still true with 2022?

tom.

Userlevel 7
Badge +14

I haven't heard any bad news in a while, so it will be interesting to see which update is causing your problem.

Badge

Apparently, this is still an issue. I had 6 updates automatically applied (KB5011564, KB5011560, KB5010462, KB5010419, KB5010395, KB5009595). I thought I had figured out which one was the culprit, but I was mistaken and did not have time to narrow it down, so I removed all 6, rebooted, and the ReFS drive no longer showed as RAW. None of the KBs I listed above match the ones mentioned elsewhere in this thread. Has anyone else experienced this and found out which one it was?

We have not applied any of these updates to our ReFS servers because of this problem. I am surprised MS has not addressed it yet; we cannot have our Veeam services down.

We have been employing the same approach as you and these shouldn’t have been applied but someone else had a different idea. I will need to isolate which update caused it, thanks for the response.

No problem - we use a patch management system so we don’t approve those updates for the specific servers so they never get installed.

We do too, but another engineer decided to release them. It is what it is and won’t happen again if I have anything to say about it!
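Isolating the culprit among six updates does not have to take six remove/reboot cycles. Below is a Python sketch of a binary search (bisection) over the update list. `volume_ok_with` is a hypothetical stand-in for one manual test cycle, and the sketch assumes exactly one update is responsible and that the updates can be installed or removed independently, which cumulative updates may not allow.

```python
from typing import Callable, Sequence, Set


def find_culprit(updates: Sequence[str], volume_ok_with: Callable[[Set[str]], bool]) -> str:
    """Binary-search a non-empty list of updates for the one that makes ReFS show as RAW.

    volume_ok_with(installed) stands for one test cycle: leave only `installed`
    on the server, reboot, and report whether the ReFS volume is readable.
    Assumes exactly one culprit and that updates can be removed independently.
    """
    candidates = list(updates)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if volume_ok_with(set(half)):
            candidates = candidates[len(half):]  # volume readable: culprit is in the other half
        else:
            candidates = half  # volume turned RAW: culprit is in this half
    return candidates[0]


if __name__ == "__main__":
    updates = ["KB5011564", "KB5011560", "KB5010462", "KB5010419", "KB5010395", "KB5009595"]
    # Simulation only: pretend KB5010419 is the culprit.
    print(find_culprit(updates, lambda installed: "KB5010419" not in installed))  # KB5010419
```

For six updates this needs at most three test cycles instead of six.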

Userlevel 2

If it's still RAW after removing the January patch (patches?), that suggests there is some real ReFS corruption.
If on VMware, try the disable-hotplug method as a double check.
And/or try attaching the disk to an unpatched system (any OS version) as another check.

If it is real corruption, it is *probably* a coincidence that it happened at the time of patching.
There are some third-party tools that can scan disks/volumes and that understand ReFS,
e.g.

https://www.r-studio.com/

https://www.isobuster.com/

https://www.diskgenius.com/

For WS2019 there is a built-in refsutil tool (which can also be copied to and run on WS2016, but I doubt it will work on WS2012 R2).

Note: only attempt recovery as a last resort, after confirming you are not hitting the known issues that make a volume appear RAW even though there is no corruption.
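The order of checks in the post above can be written down as a tiny Python helper, so nobody jumps to recovery tools too early. This is a sketch only: the check names paraphrase the post, and `next_step` is a hypothetical helper, not part of any tool.

```python
from typing import Set

# Checks from the post above, in order; recovery tools only once these are exhausted.
CHECKS = [
    "remove the January update(s)",
    "disable hotplug on the VMware VM",
    "attach the disk to an unpatched system",
]


def next_step(done: Set[str], on_vmware: bool) -> str:
    """Return the next check to try before attempting ReFS recovery (hypothetical helper)."""
    for check in CHECKS:
        if "VMware" in check and not on_vmware:
            continue  # the hotplug workaround only applies to VMware VMs
        if check not in done:
            return check
    return "last resort: scan with a recovery tool (refsutil or a third-party tool)"
```

For example, `next_step({"remove the January update(s)"}, on_vmware=False)` skips the VMware-only check and returns "attach the disk to an unpatched system".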

 

Has anyone found a solution for this on Windows Server 2012 R2, with the ReFS volume showing as RAW? I tried removing the bad patch and even installed the out-of-band patch, still no luck.

Userlevel 7
Badge +14

The first problem is that we are all probably at a very high knowledge level, perhaps just a bit below or the same as Premier Support. So we have already tried everything and hope that they'll be able to work wonders. Second, regarding bugs, they won't have any better insights or connections. So you'll need to spend a lot of time and have luck to really get to the point where someone can help you. And by that point the problem has either magically disappeared (looking at you, Windows Update) or you've set up a fresh system because you couldn't wait. Don't get me wrong, the Premier Support people are great and very knowledgeable, but like I said, they're not magicians.

 

Userlevel 7
Badge +20

Welcome to the Veeam Community @ianv. The community is open to everyone, even if you're not a Veeam user :wink:

It’s hard to say what happened in your case and whether you experienced something different. I don't think the BSODs were caused by this update; “only” the ReFS volumes became inaccessible (RAW).

‘strongly suggest’ I contact Microsoft Premier Support Services.

I really love their support for such suggestions. They introduce an issue but won’t help you in any way, or admit that it could be a bug, without going through their Premier Support, which, by the way, is not easily accessible at all.

I‘ve actually never had a successful resolution via Microsoft Premier Support, everything has always been out of scope or no root cause found, and I’ve always been given my credits back! 😆 I must need to get Microsoft Super Premier Justice League Support for that…

Userlevel 1

I read that ReFS volumes can fall over once they get fairly full. Is that true, or is it still true?

Never heard of this; let’s see what the community has to say about it.

Yeah I just re-found the article. ServerFault. May 2019. “When ReFS volumes get filled up to 60-80% they tend to lock up. If you have ReFS user data hashing enabled situation turns even worse.”

So I guess people have gone over that with no issues … ?
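If you want to keep an eye on this, a repository's fill level is easy to watch. This is a minimal Python sketch using the standard library's `shutil.disk_usage`. The 60% default is just the low end of the 60-80% figure quoted from ServerFault above, which is an unverified claim rather than a documented ReFS limit, and the function names are hypothetical.

```python
import shutil


def used_fraction(path: str) -> float:
    """Fraction of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def above_threshold(fraction: float, threshold: float = 0.6) -> bool:
    # 0.6 is the low end of the 60-80% figure quoted above (unverified claim).
    return fraction >= threshold


if __name__ == "__main__":
    frac = used_fraction(".")  # e.g. "R:\\" for a repository volume (hypothetical drive letter)
    print(f"{frac:.0%} used, above threshold: {above_threshold(frac)}")
```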
