ReFS vs NTFS (or other)


Userlevel 7
Badge +17

Hey all. I haven’t yet pulled the trigger on deploying ReFS on any of my repos. The main reason initially was the lack of stability I saw quite often in its early days. The other reason is I didn’t want to go through the hassle of creating a new repo, then spending days migrating data here/there/everywhere. My NTFS has worked and done pretty well. The only Fast Clone storage I have implemented currently is with my Hardened Repo. I’ve had no issues there, but then again..XFS is a pretty mature FS 😊 .

Anyway, I just want to ping the Community yet again to hear of any ‘gotchas’ out there which might lead me to decide *against* implementing ReFS in my Veeam environment. I’m on the verge of getting some shiny new Nimble arrays in for my backup storage, so when I get them connected it would be a prime opportunity for me to implement ReFS if it is indeed now stable. I think my only concern may be with doing restores and rehydration of data. Do restores take a bit longer? Any info you all can provide would be greatly appreciated.

As always..thanks! 😊


29 comments

Userlevel 7
Badge +8


I’m sitting on over a PB of ReFS and have no issues; the benefits outweigh everything when it comes to Veeam. You just can’t slap ReFS everywhere in the environment and hope it will be a benefit, though.

 

I may pull the trigger on XFS and hardened repos next upgrade, or stick to my Windows infrastructure. I need time to test XFS before I go that route, though.

Userlevel 7
Badge +17

Yeah..I got mine initially installed, but I just need to add it to a VBR server to do the testing part.

Userlevel 7
Badge +20


We are definitely looking at XFS more now, especially with the ISO for building the hardened repos. That reminds me: time to start testing and documenting things with this, as we want to use it for our appliances, etc. 😎

Userlevel 7
Badge +17

Yeah..it appears (fingers crossed) ReFS is pretty stable now. Though when I begin getting my new arrays installed & switching things over, I think I’ll just be looking at going with XFS. I might still go with W2022 & ReFS..but I’m mostly leaning towards XFS.

@Scott - agree. It would still be interesting to hear any ReFS horror...or actually non-horror (good) stories folks have experienced with ReFS.

Userlevel 7
Badge +6

I think I’ve been lucky, as I’ve been using ReFS for the last few years without any major dramas, although we only have approx. 500TB stored across our repos.

Userlevel 7
Badge +8

I really think anyone who has had ReFS issues should post when they happened. ReFS is leaps ahead of where it was a few years ago. The benefits for Veeam have been worth it to me.

Userlevel 7
Badge +20


Just for the space savings and the time saved on your synthetic fulls, it would be better to switch. Win2022 ReFS has been great for me, with none of the issues from previous releases.

Userlevel 4

Hi guys, I have a very bad experience with ReFS on WS2016. One of my customers has around 60TB+ of backups on a ReFS volume, and after a Windows update and reboot the volume was “RAW”. The repository was on SAN iSCSI over 10 GbE. Since that time I have not used ReFS on 2012R2, 2016, or 2019; maybe on WS2022 the support is better. Now I have to decide with the same customer whether to switch to ReFS or stay on NTFS. A synthetic full backup takes 17 hours, and the daily backup is 1 TB. The backup server is a physical R720 with 10 GbE iSCSI to the storage array, running Windows Server 2019. I’m still a little bit scared :) hehe :)

 

tom.

Userlevel 7
Badge +20


The best way to do it is a new Win2022 box with ReFS and migrate to it.  😋🤣

Userlevel 7
Badge +6


For reference, here is the VMware article regarding NICs and SCSI controllers showing up as removable hardware. In those instances where I had ReFS, disabling HotAdd fixed the issue straight away regardless of patching. Removing the patches was more of a workaround, and the KB below was...well...still a workaround, but a more stable way to do it.

https://kb.vmware.com/s/article/1012225

Userlevel 7
Badge +20


For us it was always a Windows Update that broke it. We eventually excluded the servers with ReFS repos from updates until we could migrate them to new ones.

Userlevel 7
Badge +6


Did you find the source of the volume becoming RAW? In every case I’ve seen, with one exception, it was due to an update that Microsoft released, often combined with VMware marking the disk as removable as I noted. Beyond that, the other instance was the one where I purged the data out of the RAID controller cache, causing corruption. But I very reliably had issues with volumes changing to RAW after Windows Updates.

Userlevel 2

We had disasters with ReFS where the entire volume became RAW. It was on Windows Server 2016 and 2019; however, that ReFS-based storage wasn’t used as a Veeam repo. We haven’t faced such issues with Veeam repos, but we migrated to NTFS after those disasters. I’ve heard that ReFS is more stable on 2022.

We now have XFS on our hardened repos and it works great.

 

Userlevel 7
Badge +8

I have about 1PB of ReFS volumes now and am happy. The first few years were not great, but we no longer have to modify the registry and cross our fingers that the data is going to be safe.

 

There are reasons to choose NTFS, and others for ReFS. For Veeam, ReFS is a no-brainer. For my Windows file servers, NTFS is still the way to go.

 

Once I replace these repos, Linux hardened is another option I may choose, or a mix.

Userlevel 7
Badge +6


You raise a good point. The issue is that it’s a rabbit hole for me to go down...I end up killing a day or two when I have other things that are a higher priority. ADHD hyperfocus plays a pretty pivotal role here. But! There is a point where taking time out of the day to learn or script or whatever becomes a time-saver in the long run, so finding where that point exists is important.

It’s like scripting and automating - sure, it may take 4 hours for me to script something, and if that something takes me an hour or two each month, then it’s probably worth it in time saved, even before counting less visible things like the time it takes to circle back around and get refocused (my ADHD plays a role here too, but this is true for everyone). But if it saves me 5 minutes once a month, is it worth it? For repetitive tasks, absolutely, but for the occasional items? Maybe not.

So is going down the rabbit hole of Linux worth it? Sure is. The same can be said about the Azure training and certifications that are on my list of things to do as well! Oh how I wish I had a time machine, or a cloning machine where I could create two or three of myself and then reconverge our gained knowledge back into the main person. Wait….did I just refer to myself in both the first and third person at the same time?

Userlevel 7
Badge +17

I know time can be a prime resource, but I highly recommend making the time to tap into those courses, at least a little bit. With the Linux experience you have thus far, some of it you can breeze through. Although, if you’re like me, regardless of the exposure you’ve had to this point, you’ll be taking notes as you go, so it could take longer than ‘normal’. It was so eye-opening to me, I couldn’t stop going (there were 7 courses total by Andrew). It took me awhile, but I’m glad I did them; I even did more that I found. :)

Userlevel 7
Badge +6

Thanks @vmJoe @coolsport00. I took some basic Linux courses about 20 years ago in college and have been using it on and off since then. It was something I wanted to dig into for a long time, but as a Windows administrator I never had much opportunity. It continues to become more and more prevalent, so someday I’ll have to deep dive. Until then, things like vCenter, ESXi, and the LHRs will be the places I continue to get the most exposure. I’ll look into those courses as well - I don’t think I have a Pluralsight membership anymore, but I’ll have to check into it for sure.

Userlevel 7
Badge +17

@dloseke  What Joe said. Andrew Mallett or Nigel Poulton courses on Pluralsight FTW! 🙌🏼

Userlevel 7
Badge +7


@dloseke - All great points! I think you’ll enjoy Linux after you use it!  Also, there are plenty of Linux learning resources available to help you gain some familiarity with the OS!

Userlevel 7
Badge +6


Exactly this….I think ReFS tends to be for those most comfortable with Windows and not comfortable with Linux. However, the Linux/XFS side of things doesn’t appear to be that hard, and I have a new server that I’m going to play with before it goes into production as a Linux repo. Plus, I’m going to be trying out the new LHR ISO (if I can ever find the time) on an older server, so one way or another I’m going to become more comfortable with it. But for me, it’s been ReFS just because of unfamiliarity. And of course, moving to Linux rids me of those pesky Windows licenses that I always seem to run short on in smaller, non-Datacenter-licensed environments.

Userlevel 7
Badge +17

Hey @vmJoe - honestly, it was the first thing that came to mind. And then the previous issues crossed my mind as well. I do have a hardened repo already. Obviously I’m most comfortable with Windows, but I’ve become more comfortable with Linux over the past few months, so I’ll probably indeed just go with Linux XFS. Thanks bud.

Userlevel 7
Badge +7

@coolsport00 interesting! I was just discussing this the other day. I have not heard of anyone complaining about ReFS issues in about a year. It seems all the recent OS releases and patching MSFT has done have made ReFS more stable. If you do decide to use it, make sure you remember to use the 4K block size when you format the ReFS volume.
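(For anyone following along, the format step is a one-liner from PowerShell. This is just a sketch: the drive letter and label are placeholders, and note that Veeam’s best-practice guides have often recommended a 64K cluster size for large ReFS repo volumes rather than 4K, so double-check current guidance before committing.)

```powershell
# Sketch only: format a new repo volume as ReFS.
# Drive letter R: and the label are placeholders for your environment.
# Pick the allocation unit size per current Veeam guidance (4096 or 65536).
Format-Volume -DriveLetter R -FileSystem ReFS -AllocationUnitSize 65536 -NewFileSystemLabel "VeeamRepo"
```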

But I gotta ask - why ReFS? Most of the talk track nowadays is around immutable backups, so for Veeam that means hardened Linux repos with XFS for Fast Clone/block clone (like ReFS). There have been very few issues with XFS, and it seems to be faster.

Userlevel 7
Badge +6

@MicoolPaul hit the nail on the head here, but I’ll give my feelings as well. 

ReFS and I have a love-hate relationship. I really liked it, and then a peer hit an issue in which over 100TB of data was lost, which forced me to think critically about when and how I was using ReFS. Gostev’s talks about ReFS and storage that utilizes software RAID (such as Synology and QNAP NAS units) made me think more critically as well. My general rule is that ReFS is okay on “normal”-sized volumes as long as you have hardware RAID with a battery-backed cache. I used to run, and still have out in the wild, ReFS on iSCSI RDMs backed by Synology or QNAP NAS units, but I would not recommend doing this going forward and actively avoid that solution now - that is relegated to NTFS when I have to use that hardware. My preference is a server with local storage, or even DAS/SAN when economically possible. However, there are a lot of object-storage appliances on the market that are making me re-evaluate yet again. If I have to use a NAS, it’s going to be relegated to NFS/SMB traffic, which isn’t great but is going to be more reliable. I liked the idea of ReFS on an RDM because synthetic operations run better there than with NTFS, but I just don’t trust it very much used in this fashion.

Now, I’ll tell you that ReFS on a physical server with local storage is not foolproof. I have a client who was having issues with the PERC on a server that had a 70-ish TB RAID 10 volume. In order to boot the machine at one point, I had to purge the data from the controller cache, and that corrupted the ReFS volume. So I now have first-hand experience with the sort of issue my peer had as well. In this case it was camera footage, as this server was running a VMS system for several cameras around a small town. It wasn’t the end of the world to lose that data, but it would have been nice not to. When I formatted the volume to start over, I used NTFS, because it’s rock solid, we weren’t really taking advantage of ReFS, and in my case ReFS didn’t prove to be very resilient.

As Michael noted, it’s as good as MS will let it be. Older versions of ReFS were not as good as current versions by a long shot. It should also be noted that if you have a ReFS volume on Server 2016 (and older versions, I believe) and you migrate that volume to a 2019 or 2022 server, there’s an upgrade that happens on mount - and depending on the backing storage and the size of the volume, that upgrade can take a long time and doesn’t really give you a status as to what it’s doing. And as I understand it, it is not backward compatible, so you can’t move back to the old server if things don’t work out.

And as Michael noted, I had several issues with Windows updates causing ReFS volumes to show up as RAW - some of that was resolved by adding a VMware advanced setting to the VM because, in my case, Windows was seeing the RDM disk hosting the ReFS volume as removable storage, so an option had to be added to stop it from showing as removable. But there were several updates leading up to that conclusion that could be uninstalled, and the volume would show normally again.
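(For reference, the advanced setting described above is, per VMware KB 1012225, a single line added to the VM’s configuration with the VM powered off; a sketch:)

```
devices.hotplug = "false"
```

This stops the VM’s virtual devices, including the SCSI controller backing the RDM, from being presented to the guest as removable hardware.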

In the end, I’ll still use ReFS, but again, only on physical RAID controllers with a battery-backed cache. But my choice pick - and I say this as a pretty heavy Windows user who is fairly light in the Linux realm - will be XFS, as that has proven time and time again to be solid in the industry. Both have their place, but if you go into it knowing what the caveats are with ReFS, it can certainly be a useful tool given some of the advantages it has. If it had native immutability, it would be even more competition for XFS, but for now XFS wins out when applicable to the solution needed.
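(If anyone wants to try the XFS route described here, keep in mind Fast Clone needs reflink enabled when the volume is formatted. A minimal sketch, where the device name and mount point are placeholders for your own host:)

```shell
# Format a dedicated disk for an XFS Veeam repo with reflink (Fast Clone).
# /dev/sdb1 and /mnt/veeam-repo are placeholders; adjust for your host.
mkfs.xfs -b size=4096 -m reflink=1,crc=1 /dev/sdb1
mkdir -p /mnt/veeam-repo
mount /dev/sdb1 /mnt/veeam-repo
# Confirm reflink made it onto the mounted filesystem:
xfs_info /mnt/veeam-repo | grep reflink
```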

Userlevel 7
Badge +17

@MicoolPaul - “it’s as good as Microsoft lets it be” 😂 Man!...ain’t that the truth! Well said! hahaha Yeah, I actually just may go ahead and do the Linux XFS route. To heck with it. And honestly, I thought there probably wouldn’t be a *noticeable* difference in restore times vs NTFS. Slower, maybe, but probably nothing too dramatic. Even if it is slower, it’s honestly a bit slow now already, so a tad slower isn’t too big a deal, as long as it’s not hours to restore a ‘normal’ VM or a few. It’s rare that I do full VM restores anyway..mostly files here & there. Regardless, I may just go the XFS route...

Again, thanks for the added info!

Userlevel 7
Badge +17

@Chris.Childerhose - I have phys boxes running 2019 currently. Interestingly, I have to replace those boxes too because they’re old!..like, 8 yrs or so old (ProLiant DL380p 😳 😂 ). Thankfully though, they’ve been running stable since I implemented them 4-5 yrs ago. So I’ll probably look to get my server replacements (refurb’d) on 2022 & use ReFS.

@Geoff Burke - yeah, I hadn’t really seen any issues recently either. Gostev used to share issues experienced with ReFS often in his weekly emails. Issue reporting died down even before he took a break from writing those, which is encouraging. I actually may just go ahead and bite a real big bullet, quit using Windows for my repos, and instead use Linux and XFS/immutability. Now that I’m more comfortable with Linux, and have a Hardened Repo already set up, I mean, why not? The only additional config I’ll need to work through is getting iSCSI set up on them. YOLO though, right?? 😂
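(Re the iSCSI piece: on a typical Linux repo host it’s a short open-iscsi exercise. A sketch, where the portal address is a placeholder:)

```shell
# Discover and log in to an iSCSI target with open-iscsi (iscsiadm).
# 192.0.2.10 is a placeholder portal address; substitute the array's IP.
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -p 192.0.2.10 --login
# Reconnect the session automatically at boot:
iscsiadm -m node -p 192.0.2.10 --op update -n node.startup -v automatic
```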

Thanks for the input gents!
