ReFS issues with latest Windows Server Updates (KB5009624, KB5009557, KB5009555)




106 comments

Userlevel 7
Badge +13

I read that ReFS volumes can fall over once their space usage gets high. Is that true, or still true?

Never heard of this; let’s see what the community has to say about it.

Userlevel 7
Badge +13

Okay, so the only way to get to the new ReFS version is to create a new disk and copy the data from the 1.2 to the 3.x volume? There’s no way to manually upgrade to the new ReFS version?

When you copy the data from volume to volume you will lose all block cloning savings…

True, but I can’t see any other solution...

Userlevel 1

Hello all, just wanted to throw my 2 cents in. Please delete this if it’s not OK here.

This is the thread I’ve seen with an active conversation about these issues.

I don’t use Veeam, but MABS v3.

 

My server is a Dell R720 with one RAID array containing three volumes: OS (NTFS), Recovery (NTFS), Data (ReFS). Windows Server 2019 Standard.

 

Up until 20th Jan this server was operating fine (at least as far as I could tell; I was using it).

It installed KB5009557 automatically on the 21st @ 4am, and since then it has BSODed during boot with:

SYSTEM THREAD EXCEPTION NOT HANDLED

ReFS.SYS

Checking the MEMORY.DMP file shows this error has code 0x0000007E.

 

What was stranger was that I couldn’t boot from a Server 2019 ISO either, as this also BSODed.

I used a Server 2016 ISO to boot and renamed refs.sys to refs.sys.bak, which allowed me to boot Windows without ReFS support, so the volume showed as RAW.

 

Created a support case with Microsoft, more on that later.

 

Found this article, and discovered the OOB update KB5010791.

It was tricky to install, as refs.sys needed to be in place for the update to succeed, but I had to rename it back again to boot Windows. It did update the file though: 10.0.17763.2458.

But still BSOD.

 

Used refsutil.exe to assess the volume. It all looked fine:

PS C:\windows\system32> refsutil salvage -QS F: C:\refsutil
Microsoft ReFS Salvage [Version 10.0.11070]
Copyright (c) 2015 Microsoft Corp.

Local time: 1/24/2022 9:53:14

ReFS version: 3.4
Boot sector checked.
Cluster Size: 4096 (0x1000).
Cluster Count: 3265642496 (0xc2a5c000).
Superblocks checked.
Checkpoints checked.
4558 container table entry pages processed (0 invalid page(s))
1 container index table entry pages processed (0 invalid page(s)).
Container Table checked.

Processing 1403 of 1404 object table pages (99%)...

Object Table checked.

Examining identified metadata disk data for versioning and consistency.
134777 disk clusters analyzed (100%)...

Examining volume with signature 45246377 for salvageable files.
4558 container table entry pages processed (0 invalid page(s)).
1 container index table entry pages processed (0 invalid page(s)).
Validating discovered table roots on volume with signature 45246377.

36926 table roots validated (100%).
Enumerating files from discovered tables on volume with signature 45246377.

36926 tables enumerated (100%).
Command Complete.

Run time = 611 seconds.
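As a quick sanity check, a salvage log like the one above can be scanned programmatically for non-zero invalid-page counts. This is a hypothetical Python helper written purely for illustration; the line format is taken from the refsutil output shown above:

```python
import re

def invalid_page_counts(salvage_log: str) -> list[int]:
    """Extract every 'N invalid page(s)' count from refsutil salvage output."""
    return [int(n) for n in re.findall(r"\((\d+) invalid page\(s\)\)", salvage_log)]

# Two lines copied from the log above
sample = """\
4558 container table entry pages processed (0 invalid page(s))
1 container index table entry pages processed (0 invalid page(s)).
"""

counts = invalid_page_counts(sample)
print(counts)                        # [0, 0]
print(all(n == 0 for n in counts))   # True: no invalid pages reported
```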

Tried uninstalling both KBs.

I also tried manually copying in previous versions of refs.sys (10.0.17763.2452, 10.0.17763.2330). Still BSOD.

 

After waiting a week since creating the case for Microsoft support to give me something useful, I finally took things into my own hands.

Used refsutil to copy off anything that might be worth keeping, deleted the ReFS volume, reinstated refs.sys, and rebooted; the BSOD went away.

Updated fully with both KB5009557 + KB5010791. Still no BSOD.

Recreated the ReFS volume, configured MABS to use it again. Still no BSOD.

 

Microsoft support have been about as helpful as a wet towel, giving me basic instructions that I’d already tried myself (sfc, dism, uninstalling the KBs). They did get my MEMORY.DMP and claim to have been analysing it since, but so far have only given me the stop code 0x7E, which I’d already found myself. They’re slow to respond to emails, and even ‘strongly suggest’ I contact Microsoft Premier Support Services. No, I’m not spending more money on MS. I tried to get them to confirm whether ReFS and fixed disks were a known issue they were working on; no word on that.

 

So this leaves me wondering.

Did something happen to my volume, a corruption maybe, that then broke refs.sys?

Or did their update break something in my volume, which then broke refs.sys?

I was using this volume for other things (a Windows share), so I’ve now split those workloads out to separate NTFS volumes, but surely that can’t be it.

I read that ReFS volumes can fall over once their space usage gets high. Is that true, or still true?

Userlevel 5
Badge

Yep, but this specific volume fortunately doesn’t contain any Veeam backup data.

Userlevel 7
Badge +17

Okay, so the only way to get to the new ReFS version is to create a new disk and copy the data from the 1.2 to the 3.x volume? There’s no way to manually upgrade to the new ReFS version?

When you copy the data from volume to volume you will lose all block cloning savings…

Userlevel 5
Badge

Okay, so the only way to get to the new ReFS version is to create a new disk and copy the data from the 1.2 to the 3.x volume? There’s no way to manually upgrade to the new ReFS version?

Userlevel 2

My understanding is that only older v3.x versions update to 3.y (e.g. 3.1 to 3.4, or even to 3.7 now).
New volumes get the relevant 3.y version.
Older disks that may have been set up while originally attached to WS2012R2 will remain v1.2 and can still be used on a later OS (the OS has a separate refsv1 driver to handle them).
But some customers say they are not using old/existing disks and yet they are somehow v1.2, and I’m not sure how that can be.
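A minimal sketch of that behaviour in Python, purely as a model of what this thread describes; the “highest supported version” default is an assumption (on Server 2019 the thread reports 3.4), not an official compatibility matrix:

```python
def mounted_version(on_disk_version: str, os_max_version: str = "3.4") -> str:
    """Model of the upgrade rule described above: v1.2 volumes are never
    auto-upgraded (a separate refsv1 driver handles them), while v3.x
    volumes are upgraded in place to the highest version the OS supports."""
    if on_disk_version.startswith("1."):
        return on_disk_version      # ReFSv1 stays as-is
    return os_max_version           # e.g. 3.1 -> 3.4 on Server 2019

print(mounted_version("1.2"))   # 1.2
print(mounted_version("3.1"))   # 3.4
```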


 

Userlevel 5
Badge

Yeah, but we definitely didn’t format it with ReFSv1. How can I convert it to a higher version so that we no longer have the issue after installing the patch?

 

I'm going to detach the disk from this specific Win 2019 server, attach it to another Win 2019 server, and see if it gets upgraded.

 

EDIT: sigh, attaching the disk to another Windows 2019 VM doesn't work either. The volume isn't automatically upgraded to a ReFS version higher than 1.2. Adding the devices.hotplug parameter is the only solution for now.

Userlevel 7
Badge +13

Yep, but the question is why isn't it upgraded to the latest version automatically? I didn't format it with the ReFSv1 parameter as specified in the GitHub article. Is there a way to manually upgrade it to the latest version?

Version 1.2 is the default if the volume was formatted by Windows 8.1, Windows 10 up to v1607, or Windows Server 2012 R2, and on Windows Server 2016 only if ReFSv1 was specified. I think that was the case.

In theory, once a ReFS volume is mounted on a device that supports a newer ReFS version, it is automatically upgraded to the latest possible version.

WARNING: after the upgrade, you can’t go back.

Userlevel 5
Badge

Yep, but the question is why isn't it upgraded to the latest version automatically? I didn't format it with the ReFSv1 parameter as specified in the GitHub article. Is there a way to manually upgrade it to the latest version?

Userlevel 7
Badge +13

Can people here please confirm whether they are using ReFS on VMware VMs (where the hotplug feature makes the drives appear as removable)?
The VMware fix to overcome that is https://kb.vmware.com/s/article/1012225?lang=en_us

Or are some of your systems really using removable drives?

For WS2012R2 using ReFSv1, the OOB fix will not (and never will) overcome the issue on removable drives.
The Microsoft OOB fix is only for ReFS 3.x on removable drives in WS2016 and later.
Note: some systems running later OSes may be using old disks formatted with ReFS v1, and those ReFSv1 disks will be affected if they are considered removable.

 

@stephc_msft  When running the fsutil command on our affected server it returns:

 

REFS Volume Serial Number :       0xb660b96e60b935c9
REFS Version   :                  1.2
Number Sectors :                  0x0000000004fe0000
Total Clusters :                  0x000000000009fc00
Free Clusters  :                  0x0000000000049906
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               65536
Checksum Type:                    CHECKSUM_TYPE_NONE

So it looks like we have the old ReFS version, although the system is running Windows Server 2019. Why is this volume still on the old version; shouldn't it be upgraded automatically? Other Windows 2019 machines that don't have the issue have ReFS version 3.4.

It should be at least 3.4 according to

https://gist.github.com/0xbadfca11/da0598e47dd643d933dc
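With several servers to audit, output like the above is easy to check programmatically. A hypothetical Python helper, with the field name taken from the `fsutil fsinfo refsinfo X:` output shown in this thread:

```python
import re

def refs_version(fsutil_output: str) -> str:
    """Pull the 'REFS Version' field out of `fsutil fsinfo refsinfo X:` output."""
    match = re.search(r"REFS Version\s*:\s*([\d.]+)", fsutil_output)
    if match is None:
        raise ValueError("no 'REFS Version' field found")
    return match.group(1)

# Lines copied from the output above
sample = """\
REFS Volume Serial Number :       0xb660b96e60b935c9
REFS Version   :                  1.2
Bytes Per Cluster :               65536
"""

version = refs_version(sample)
print(version)                   # 1.2
print(version.startswith("1."))  # True: legacy ReFSv1 volume
```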

Userlevel 5
Badge

Can people here please confirm whether they are using ReFS on VMware VMs (where the hotplug feature makes the drives appear as removable)?
The VMware fix to overcome that is https://kb.vmware.com/s/article/1012225?lang=en_us

Or are some of your systems really using removable drives?

For WS2012R2 using ReFSv1, the OOB fix will not (and never will) overcome the issue on removable drives.
The Microsoft OOB fix is only for ReFS 3.x on removable drives in WS2016 and later.
Note: some systems running later OSes may be using old disks formatted with ReFS v1, and those ReFSv1 disks will be affected if they are considered removable.

 

@stephc_msft  When running the fsutil command on our affected server it returns:

 

REFS Volume Serial Number :       0xb660b96e60b935c9
REFS Version   :                  1.2
Number Sectors :                  0x0000000004fe0000
Total Clusters :                  0x000000000009fc00
Free Clusters  :                  0x0000000000049906
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               65536
Checksum Type:                    CHECKSUM_TYPE_NONE

So it looks like we have the old ReFS version, although the system is running Windows Server 2019. Why is this volume still on the old version; shouldn't it be upgraded automatically? Other Windows 2019 machines that don't have the issue have ReFS version 3.4.

Userlevel 5
Badge

The only solution for 2012R2 (running on VMware) is the
devices.hotplug setting with a value of false,

mentioned in Disabling the HotAdd/HotPlug capability in virtual machines (1012225) (vmware.com).

 

This setting indeed solves the issue for us on a Windows Server 2019 machine; the ReFS volume is accessible again after installing the updates. However, other VMs that still have accessible ReFS volumes after installing the patches don't have this configuration setting, and they also show the eject option in the taskbar for the virtual disk. So, although the devices.hotplug setting fixes the issue, it's not the whole story.
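For reference, the VMware KB mentioned above applies this as a single line in the VM's .vmx configuration (edited while the VM is powered off); the setting name comes straight from this thread:

```
devices.hotplug = "false"
```

After powering the VM back on, the virtual disk should no longer appear as removable (no eject icon in the taskbar).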

 

Thanks @stephc_msft! It's questionable, though, that MS Support isn't aware of this setting.

Userlevel 7
Badge +14

Thanks @stephc_msft for jumping into the discussion. I’m wondering why disabling hotadd fixes the issue. Does this mean that, with hotadd enabled, the disk looks external to a virtualized Windows?

@Everyone: if you set this configuration on a virtual backup server, hotadd transport mode will probably no longer work?

Userlevel 7
Badge +13

Tonight Microsoft started releasing the “fix for everything”:


https://www.catalog.update.microsoft.com/Search.aspx?q=%092022-01%20Cumulative%20Update%20Preview

Userlevel 2

The only solution for 2012R2 (running on VMware) is the
devices.hotplug setting with a value of false,

mentioned in Disabling the HotAdd/HotPlug capability in virtual machines (1012225) (vmware.com).

 

Userlevel 7
Badge +13

Nice sharing of info, guys; this is the power of this community.

Userlevel 2

Our 2012r2 machine I’ve been referencing is on VMware. Host is running 6.7.

Userlevel 5
Badge

Yep

Userlevel 2

And are you still having the issue after applying the OOB update as well?

Userlevel 5
Badge

We are running vSphere 7 and have the issue on two VMs. The drive is shown as fixed. The VMs run Windows Server 2022.

Userlevel 2

Can people here please confirm whether they are using ReFS on VMware VMs (where the hotplug feature makes the drives appear as removable)?
The VMware fix to overcome that is https://kb.vmware.com/s/article/1012225?lang=en_us

Or are some of your systems really using removable drives?

For WS2012R2 using ReFSv1, the OOB fix will not (and never will) overcome the issue on removable drives.
The Microsoft OOB fix is only for ReFS 3.x on removable drives in WS2016 and later.
Note: some systems running later OSes may be using old disks formatted with ReFS v1, and those ReFSv1 disks will be affected if they are considered removable.

fsutil fsinfo refsinfo x: will show the ReFS version,
although of course you can't run that if the volume is already showing as RAW!

KB5010691: ReFS-formatted removable media may fail to mount or mounts as RAW after installing the January 11, 2022 Windows updates (microsoft.com)
mentions the OOB fixes but unfortunately doesn't link to them.
It mentions being applicable to ReFS v2 (aka 3.x),
but unfortunately doesn't mention the VMware aspect or the 2012R2/ReFSv1 aspect.

Userlevel 7
Badge +20

Latest update from MS Support: the case is being archived with no resolution at the moment. The current advice, if the issue occurs after installing the cumulative patch, is that you have two options:

- uninstall the patch, copy the data to an NTFS volume, and re-install the patch

- refrain from installing the patch entirely

The MS Support engineer even added a personal note:

In any case if you use NTFS I personally recommend using NTFS instead of ReFS, as ReFS is still immature as a file system as has several bugs.


I did let them know my frustration about this issue, since they broke it and still don't have a solution, or even a cause, after several weeks.

This response is… wow :scream: . It leaves a very uncomfortable feeling at the moment.

In this case they should drop ReFS completely.

Agreed, this engineer is definitely overstepping by presenting their personal views in such a way. Likely out of frustration, as Microsoft will have been the ones stating to just uninstall the patch.

 

When Microsoft promised us singular “cumulative” updates to improve their QA, this was the trade-off: you now get all or nothing with their patches. Production-breaking updates versus exposure to zero-day exploits is now the choice to make.

 

The ReFS comment makes me feel uneasy too, but this is why we need to plan for file system issues in our 3-2-1-1-0 strategy, as this patch, whilst bad, can at least be reversed. What happens next time, when a patch corrupts the partitions beyond repair?

 

We need to treat this as an important lesson, and be glad that Veeam has been helping us break free of Microsoft’s chains with the use of Linux operating systems.

 

If I’m allowed to dream, I’d love to see XFS on What nodes, giving Microsoft some actual competition.

Userlevel 5
Badge

Yep, uninstalling the patch restores access. That specific machine is running Windows Server 2022.

Userlevel 7
Badge +13

“In any case if you use NTFS I personally recommend using NTFS instead of ReFS, as ReFS is still immature as a file system as has several bugs.”

Wow.

So @Franc, if you uninstall that update, can you access the data?
