ReFS issues with latest Windows Server Updates (KB5009624, KB5009557, KB5009555)




106 comments

Userlevel 1

I read that ReFS volumes can fall over when they get high in space used. Is that true, or still true?

Never heard of this, let’s see what the community has to say about it.

Yeah I just re-found the article. ServerFault. May 2019. “When ReFS volumes get filled up to 60-80% they tend to lock up. If you have ReFS user data hashing enabled situation turns even worse.”

So I guess people have gone over that with no issues … ?
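
As a rough way to see where a volume sits relative to the 60-80% range quoted from that ServerFault post, here is a minimal Python sketch. The thresholds come from the quote and are an anecdote, not a documented ReFS limit; the function names `usage_band` and `check_volume` are made up for illustration.

```python
import shutil

# Thresholds taken from the quoted ServerFault post (an anecdote, not a documented limit).
WARN_FRACTION = 0.60
CRITICAL_FRACTION = 0.80


def usage_band(used_bytes: int, total_bytes: int) -> str:
    """Classify a volume's fill level against the quoted 60-80% range."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= CRITICAL_FRACTION:
        return "above-80%"
    if fraction >= WARN_FRACTION:
        return "60-80%"
    return "below-60%"


def check_volume(path: str) -> str:
    """Read real usage for a mounted path (for example 'F:\\') and classify it."""
    usage = shutil.disk_usage(path)
    return usage_band(usage.used, usage.total)
```

Calling `check_volume("F:\\")` on the server reports which band the ReFS volume is in. Whether that band actually matters for stability is exactly the open question here.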

Userlevel 7
Badge +20

KB5009555 for Windows 2022 doesn’t seem to have been pulled. My test server still downloads this update from Windows Update. However, this is the faulty update which causes the raw ReFS issue.

That’s because Microsoft released them AGAIN on a FRIDAY 🤯

 

https://www.bleepingcomputer.com/news/microsoft/microsoft-resumes-rollout-of-january-windows-server-updates/

Userlevel 7
Badge +20

All they’ve done is add the problems as known issues… oh Microsoft…

What else should they do...fix the update?🤣

That’s crazy talk! What next, QA testing?

Userlevel 7
Badge +12


You said it. We get the updates for free, so the least we can do is give something back to Microsoft: fixing bugs, QA, …

 

Userlevel 7
Badge +12

The first problem is that we are all probably at a very high knowledge level, perhaps just a bit below or on par with Premier Support. So we have already tried everything and hope that they'll be able to work wonders. Second, when it comes to bugs, they don't have any better insights or connections. So you'll need to spend a lot of time, and be lucky, to get to the point where someone can really help you. And by then the problem has either magically disappeared (I'm looking at you, Windows Update) or you've set up a fresh system because you couldn't wait. Don't get me wrong, the Premier Support people are great and very knowledgeable, but like I said, they're not magicians.

 

Userlevel 7
Badge +12

Microsoft has released an out-of-band update which should resolve all the problems: https://support.microsoft.com/en-au/topic/january-17-2022-non-security-update-kb5010796-out-of-band-e79a633f-e876-4268-a21e-de6a9ca52da7

According to @Franc it didn't solve his ReFS problem, so please still be careful.

Userlevel 5
Badge

As @regnor stated, this update doesn’t solve the RAW issue on fixed ReFS drives for us. I’ll open a support case with Microsoft for it. Although I don’t get my hopes up too high, since my experience with MS Support lately is not that great to say the least.

Userlevel 7
Badge +12

FYI the out-of-band update, KB5010794, is still breaking ReFS for 2012 R2.

Probably you really need to install both updates, one after the other. Although it’s also strange that the out of band update also introduces the issue…

By the way, just for myself, why did you decide to go with ReFS on 2012R2?

Userlevel 7
Badge +20


I’m not the person you asked the question to but I thought I’d jump in with my experience. The reason I jumped on the bandwagon early was because Microsoft literally named it “Resilient File System”, they touted automatic detection and repair of corruption as a major reason to go with it, plus the scaling side of ReFS for maximum volume and file size limits seemed better aligned to the constant marketing of data growth explosions.

 

Anything that keeps my backups healthy sounds good to me!

Userlevel 5
Badge

@regnor, well that was quick with MS support. They confirmed it's also an issue with fixed drives, but the patch from yesterday was only for external drives. They are still working on a fix for fixed drives, and the engineer will inform me once the patch is available. He couldn't explain, though, why Microsoft doesn't mention the issue for fixed drives anywhere, only for external drives. He confirmed he already had multiple cases where other customers also experienced the issue with fixed drives.

Userlevel 5
Badge

Latest update from MS Support: case is being archived with no resolution at the moment. Advice currently is, if the issue occurs after installing the cumulative patch, you have two options:

- uninstall the patch and copy the data to an NTFS volume and re-install the patch

- refrain from installing the patch entirely

The MS Support engineer even made a personal note:

In any case if you use NTFS I personally recommend using NTFS instead of ReFS, as ReFS is still immature as a file system and has several bugs.


I did let them know my frustration about this issue, since they broke it and still don't have a solution or even a cause after several weeks.
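
For anyone trying to work out whether a host falls under that advice, here is a minimal Python sketch that sorts a list of installed update IDs into the cumulative updates and the out-of-band updates named in this thread. The KB-to-OS pairings are the ones reported here, and the function names are made up for illustration. The input list is assumed to come from wherever you inventory updates, for example the `HotFixID` column of `Get-HotFix`.

```python
# KB numbers and OS pairings as reported in this thread (January 2022).
PROBLEM_UPDATES = {
    "KB5009624": "Windows Server 2012 R2",
    "KB5009557": "Windows Server 2019",
    "KB5009555": "Windows Server 2022",
}
OOB_UPDATES = {
    "KB5010794": "Windows Server 2012 R2",
    "KB5010791": "Windows Server 2019",
    "KB5010796": "Windows Server 2022",
}


def normalize_kb(raw: str) -> str:
    """Return an ID like 'KB5009557' from input such as ' kb5009557 ' or '5009557'."""
    text = raw.strip().upper()
    return text if text.startswith("KB") else "KB" + text


def classify_installed(installed_ids):
    """Split installed update IDs into the thread's problem and out-of-band lists."""
    ids = {normalize_kb(item) for item in installed_ids}
    return {
        "problem": sorted(ids.intersection(PROBLEM_UPDATES)),
        "out_of_band": sorted(ids.intersection(OOB_UPDATES)),
    }
```

Note that, going by the reports in this thread, having the out-of-band update installed does not guarantee that fixed ReFS drives are safe, so treat the result as a starting point rather than a verdict.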

Userlevel 5
Badge

We are running vSphere 7 and have the issue on two VMs. The drive is shown as fixed. The VMs run Windows Server 2022.

Userlevel 2

And you're still having the issue after applying the OOB update as well?

Userlevel 5
Badge

Yep

Userlevel 2

Our 2012r2 machine I’ve been referencing is on VMware. Host is running 6.7.

Userlevel 7
Badge +17

Up to now I had no bad experience with ReFS and Windows Server 2016 and 2019. I am using ReFS with local disks and iSCSI targets….

Userlevel 7
Badge +12

@Mildur But so far we haven’t experienced any good things from them since 12/31/2021, unless you count everything that isn’t broken or doesn’t have issues as a positive :wink:

Microsoft must change devs; they've been doing everything wrong lately.

The problem isn’t the devs but rather the changed or missing quality control/assurance. This is an organizational problem, which the devs probably can’t do much about. :no_mouth:

Userlevel 7
Badge +9


 

True. I was speaking more generally, not only about 2022 and not only about Microsoft.

Bad things are more discussed than good things :)

 

But too many bad things this month, such as

  • Windows KB5009543, KB5009566 updates break L2TP VPN
  • New critical Windows HTTP vulnerability is wormable etc…

Therefore, I agree with @regnor's assertion on this issue! But like I said before, these issues are still not recognised by Microsoft at the time of writing this comment: https://msrc.microsoft.com/update-guide/

Userlevel 7
Badge +12

@TKA Just like @JMeixner I've only had good experiences with ReFS. The only time it failed was because of controller/SFP problems. Perhaps in your case the volume was OK and only turned RAW because of the updates mentioned in this topic.

@dloseke Are those virtual disks or physical volumes?

Userlevel 5
Badge

Yep, but the question is why isn't it upgraded to the latest version automatically? I didn't format it with the ReFSv1 parameter as specified in the GitHub article. Is there a way to manually upgrade it to the latest version?

Userlevel 7
Badge +20

Thanks for sharing. Passed this along to our ops team as today is patching day. Make sure these are not applied to our ReFS servers that are repos. Good old MS.

Userlevel 2

My understanding is that only older v3.x versions update to 3.y, e.g. 3.1 to 3.4, or even to 3.7 now.
New volumes get the relevant 3.y version.
Older disks that may have been set up when originally attached to WS2012R2 will remain v1.2 and can still be used in the later OS (the OS has a separate refsv1 driver to handle them).
But some customers say they are not using old/existing disks and yet they are v1.2 somehow, and I'm not sure how that can be.
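
The rule described above can be written down as a small Python sketch. It only encodes this poster's understanding (3.x volumes move up to the OS's 3.y, 1.2 volumes stay at 1.2) and is not taken from Microsoft documentation; `upgrades_in_place` is a made-up name.

```python
def parse_version(text: str) -> tuple:
    """Turn a version string such as '3.4' into a comparable tuple (3, 4)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def upgrades_in_place(volume_version: str, os_refs_version: str) -> bool:
    """Return True if, per the rule above, the volume would be upgraded on mount.

    Assumption from the thread: only 3.x volumes are upgraded, and only to a
    newer 3.y version; 1.2 volumes are left alone and served by the v1 driver.
    """
    volume = parse_version(volume_version)
    target = parse_version(os_refs_version)
    return volume[0] == 3 and target[0] == 3 and volume < target
```

For example, `upgrades_in_place("3.1", "3.4")` is true, while `upgrades_in_place("1.2", "3.7")` is false, which matches the v1.2 volumes people are seeing stay at v1.2.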


 

Userlevel 7
Badge +17

Okay, so the only way to get to the new ReFS version is to create a new disk and copy the data from the 1.2 to the 3.x volume? There’s no way to manually upgrade to the new ReFS version?

When you copy the data from volume to volume you will lose all block cloning savings…

Userlevel 5
Badge

Yep, but this specific volume fortunately doesn’t contain any Veeam backup data.

Userlevel 1

Hello all, just wanted to throw my 2 cents in. Please delete this if it's not OK.

This is the thread I’ve seen with an active conversation about these issues.

I don’t use Veeam, but MABS v3.

 

My server is a Dell R720 with one RAID array, containing 3 volumes; OS (NTFS), Recovery (NTFS), Data (ReFS). Windows Server 2019 Standard.

 

Up until the 20th Jan this server was operating fine (at least as far as I could tell, I was using it).

It installed KB5009557 automatically on the 21st at 4am, and since then it has BSODed during boot with:

SYSTEM THREAD EXCEPTION NOT HANDLED

ReFS.SYS

Checking the MEMORY.DMP file shows that this error has code 0x0000007e.

 

What was stranger was that I couldn’t boot from a Server 2019 ISO, as this also BSODed.

I used a Server 2016 ISO to boot and renamed refs.sys to refs.sys.bak, which allowed me to boot Windows without ReFS support, so the volume showed as RAW.

 

Created a support case with Microsoft, more on that later.

 

Found this article, and discovered the OOB update KB5010791.

It was tricky to install, as I needed refs.sys in place for the install to succeed, but had to rename it again to boot Windows. It did update the file though, to 10.0.17763.2458.

But still BSOD.

 

Used refsutil.exe to assess the volume. Looked all fine:

PS C:\windows\system32> refsutil salvage -QS F: C:\refsutil
Microsoft ReFS Salvage [Version 10.0.11070]
Copyright (c) 2015 Microsoft Corp.

Local time: 1/24/2022 9:53:14

ReFS version: 3.4
Boot sector checked.
Cluster Size: 4096 (0x1000).
Cluster Count: 3265642496 (0xc2a5c000).
Superblocks checked.
Checkpoints checked.
4558 container table entry pages processed (0 invalid page(s))
1 container index table entry pages processed (0 invalid page(s)).
Container Table checked.

Processing 1403 of 1404 object table pages (99%)...

Object Table checked.

Examining identified metadata disk data for versioning and consistency.
134777 disk clusters analyzed (100%)...

Examining volume with signature 45246377 for salvageable files.
4558 container table entry pages processed (0 invalid page(s)).
1 container index table entry pages processed (0 invalid page(s)).
Validating discovered table roots on volume with signature 45246377.

36926 table roots validated (100%).
Enumerating files from discovered tables on volume with signature 45246377.

36926 tables enumerated (100%).
Command Complete.

Run time = 611 seconds.
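
To pull the key facts out of output like the above without reading it line by line, here is a minimal Python sketch. It assumes the output keeps the exact wording shown in this post ("ReFS version:", "Cluster Size:", "Cluster Count:", "(N invalid page(s))"); other refsutil builds may word things differently, and `summarize_salvage_output` is a made-up name.

```python
import re


def summarize_salvage_output(text: str) -> dict:
    """Extract version, volume size, and invalid-page total from refsutil salvage output."""
    version = re.search(r"ReFS version:\s*(\d+\.\d+)", text)
    cluster_size = re.search(r"Cluster Size:\s*(\d+)", text)
    cluster_count = re.search(r"Cluster Count:\s*(\d+)", text)
    # Each "processed" line reports its own count of invalid pages; add them up.
    invalid_counts = [int(n) for n in re.findall(r"\((\d+) invalid page\(s\)\)", text)]

    volume_bytes = None
    if cluster_size and cluster_count:
        volume_bytes = int(cluster_size.group(1)) * int(cluster_count.group(1))

    return {
        "refs_version": version.group(1) if version else None,
        "volume_bytes": volume_bytes,
        "invalid_pages": sum(invalid_counts),
    }
```

Fed the output above, this reports ReFS version 3.4, zero invalid pages, and a volume of 4096 × 3265642496 = 13,376,071,663,616 bytes (roughly 12.2 TiB).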

Tried uninstalling both KBs.

I also tried manually copying a previous version of refs.sys. 10.0.17763.2452, 10.0.17763.2330. Still BSOD.
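
When juggling several copies of refs.sys like this, the build numbers need to be compared numerically rather than as text. Here is a minimal Python sketch using the three versions mentioned in this post; the helper names are made up for illustration.

```python
def version_key(version: str) -> tuple:
    """Convert '10.0.17763.2458' into (10, 0, 17763, 2458) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def newest(versions):
    """Return the highest file version from an iterable of dotted version strings."""
    return max(versions, key=version_key)


# The refs.sys builds mentioned in this post.
tried = ["10.0.17763.2452", "10.0.17763.2458", "10.0.17763.2330"]
latest = newest(tried)
```

Here `latest` is `10.0.17763.2458`, the build delivered by the OOB update.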

 

After waiting a week since creating the case for Microsoft support to give me something useful, I finally took things more into my own hands.

Used refsutil to copy anything that might be useful to keep, deleted the ReFS volume, reinstated refs.sys, rebooted, and the BSOD went away.

Updated fully with both KB5009557 + KB5010791. Still no BSOD.

Recreated the ReFS volume, configured MABS to use it again. Still no BSOD.

 

Microsoft support has been about as helpful as a wet towel, giving me basic instructions that I’ve already tried myself (sfc, dism, uninstalling the KB). They did get my MEMORY.DMP and claim to have been analysing it since, but so far have only given me the stop code 0x7e, which I already had myself. They are slow to respond to emails, and even ‘strongly suggest’ I contact Microsoft Premier Support Services. No, I'm not spending more money on MS. I tried to get them to confirm whether ReFS on fixed disks was a known issue they were working on; no word on that.

 

So this leaves me wondering.

Did something happen to my volume, a corruption maybe, that then broke refs.sys?

Or did their update break something in my volume, which then broke refs.sys?

I was using this volume for other things (a Windows share), so I’ve now split those workloads out to separate NTFS volumes, but surely that can’t be it.

I read that ReFS volumes can fall over when they get high in space used. Is that true, or still true?
