ReFS vs NTFS (or other)


Userlevel 7
Badge +19

Hey all. I haven’t yet taken the plunge on deploying ReFS on any of my Repos. The main reason initially was the lack of stability I saw quite often in its early days. The other reason is I didn’t want to go through the hassle of creating a new Repo, then spending days migrating data here/there/everywhere. NTFS has worked and done pretty well for me. The only fast clone storage I have implemented currently is my Hardened Repo. I’ve had no issues there, but then again..XFS is a pretty mature FS 😊 .

Anyway, I just want to ping the Community yet again to hear of any ‘gotchas’ out there which may lead me to being *against* implementing ReFS in my Veeam environment. I’m on the verge of getting some shiny new Nimble arrays in for my backup storage, so when I get them connected, it would be a prime opportunity for me to implement ReFS if it is indeed now stable. I think my only concern may be restores and the rehydration of data. Do restores take a bit longer to do? Any info you all can provide would be greatly appreciated.

As always..thanks! 😊


Userlevel 7
Badge +21

I have implemented ReFS on Win2022 as it now seems to be in a stable state compared to previous versions, although there have been updates to Win2019 as well to address the issues it had.  I have not seen restores take any longer from ReFS, but that all depends on other factors and the size of the data you are restoring.

Nice to see new Nimbles as I would love to play in that realm again as they were amazing.  Best of luck but if these are new and you are worried just deploy Win2022 with an ReFS drive and run it through its paces before moving to PROD.  QA testing. 😎

Userlevel 7
Badge +22

After the initial disasters (especially for those that formatted with the default 4K) I don’t think that I have heard of many issues. Back in my old stomping grounds I replaced NTFS pretty soon since there were operational pressures to get the fast clone and space savings asap. However, as soon as the XFS repos appeared I switched to only using them, be it immutable or just normal xfs. 

One of the huge issues that pretty much killed NTFS for us as service providers was the long synthetic operations. Fast clone was great for that.

As for moving, I think you would need either an active full or synthetic full to take place on the data in order to take advantage of the space savings going forward.

Userlevel 7
Badge +21

Hi @coolsport00,

 

I’ve ridden the ReFS wave since the beginning with Veeam and my views are, it’s as good as Microsoft lets it be 🤷‍♂️

If you do ReFS, then Windows Server 2019 needs to be your minimum as 2016 was a mess. ReFS versions are tied to the OS and not backwards compatible once mounted Read & Write. Microsoft touted some operations were up to 50x faster on Server 2022 vs 2019, but as you’re only leveraging ReFS features at specific points (fast clone and reading from referenced blocks on restores for example) you won’t be seeing a dramatic difference. There’s arguments of a few % performance or efficiency here and there between ReFS and XFS, but they’re both pretty damn good at their jobs.

 

I say it’s as good as Microsoft let it be because one Windows update made all ReFS volumes appear as RAW, uninstalling the update made this read normally again, but I’m sure at least one person would’ve had a panic attack or worse seeing that!

 

With regards to performance, I’ll be honest, I haven’t benchmarked restores, but they’ve felt just as fast to me. It does change the IO pattern, though, as IO becomes more “random” during a restore, so whether your Nimble is all-flash or hybrid will determine whether that hurts your recovery times.

 

A big consideration though is that ReFS is RAM hungry. IIRC the guidance is 0.5GB of RAM per 1TB of ReFS file system. This is better in 2019/2022 vs 2016 and the BP guidance does state no need to go above 256GB RAM but you’ll certainly need some RAM to feed these things.
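
As a rough illustration of that rule of thumb, here is a minimal sizing sketch. The 0.5 GB per TB ratio and the 256 GB ceiling are just the figures quoted above from memory, so check them against the current best-practice guidance; the function name and the `base_gb` allowance for the OS and repository tasks are hypothetical.

```python
def estimate_refs_ram_gb(volume_tb: float, base_gb: float = 0.0,
                         gb_per_tb: float = 0.5, cap_gb: float = 256.0) -> float:
    """Estimate RAM (in GB) for a server hosting a ReFS repository.

    gb_per_tb and cap_gb mirror the rule of thumb quoted in this post;
    base_gb is a hypothetical allowance for the OS and repository tasks.
    """
    if volume_tb < 0:
        raise ValueError("volume_tb must be non-negative")
    # Scale with volume size, but never go above the stated ceiling.
    return min(base_gb + volume_tb * gb_per_tb, cap_gb)


if __name__ == "__main__":
    for tb in (60, 200, 1000):
        print(f"{tb} TB of ReFS -> roughly {estimate_refs_ram_gb(tb):.0f} GB RAM")
```

Treat the result as a starting point for a conversation about hardware, not as a hard requirement.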

 

If any of these things put you off, there’s no reason why you couldn’t just have a non-immutable XFS repository 🙂 if you feel the architecture won’t support immutability in the best possible way.

 

Hope that helped!

Userlevel 7
Badge +21

Yep, Geoff is right that you’ll need a full to leverage space savings. Thankfully, VeeaMover dehydrates retrospectively if you move existing backup chains to a new repository.

Userlevel 7
Badge +19

@Chris.Childerhose - I have phys boxes running 2019 currently. Interestingly, I have to replace those boxes too because they’re old!..like, 8yrs or so old (Proliant DL380p 😳 😂 ). Thankfully though, they’ve been running stable since I implemented them 4-5yrs ago or so. So, I’ll probably for sure look to get my server replacements (refurb’d) on 2022 & use ReFS

@Geoff Burke - yeah, I hadn’t really seen any issues recently either. Gostev used to share often in his weekly emails issues experienced with ReFS. Issue reporting died down even before he took a break from writing those, which is encouraging. I actually may just go ahead and bite a real big bullet and just quit using Windows for my repos and instead use Linux and XFS/immutable. Now that I’m more comfortable with Linux, and have a Hardened Repo already set up, I mean, why not? Only additional config I’ll need to work through is getting iSCSI setup on them. YOLO though, right?? 😂

Thanks for the input gents!

Userlevel 7
Badge +19

@MicoolPaul - “it’s as good as Microsoft lets it be” 😂 Man!...ain’t that the truth! Well said! hahaha Yeah, I actually just may go ahead and do the Linux XFS route. To heck with it. And, I honestly thought there probably wouldn’t be a *noticeable* difference in restore times vs NTFS. Slower, maybe, but probably nothing too dramatic. Even if it is slower, honestly it’s a bit slow now already. So a tad slower isn’t too big a deal, so long as it’s not hours to restore a ‘normal’ VM or few. It’s rare that I do full VM restores anyway..mostly files here & there. Regardless, I may just go the XFS route...

Again, thanks for the added info!

Userlevel 7
Badge +6

@MicoolPaul hit the nail on the head here, but I’ll give my feelings as well. 

ReFS and I have a love-hate relationship.  I really liked it, and then a peer noted that he had an issue in which we lost over 100TB of data, which forced me to really think critically about when and how I was using ReFS.  Gostev’s talks about ReFS and using storage that utilizes software RAID (such as Synology and QNAP NASes) made me think more critically as well.  My general rule is that ReFS is okay on “normal” sized volumes as long as you have hardware RAID with a battery-backed cache.  I used to run ReFS on iSCSI RDMs backed by Synology or QNAP NASes, and still have some out in the wild, but I would not recommend doing this going forward and actively avoid that solution now - that hardware is now relegated to NTFS when I have to use it.  My preference is a server with local storage, or even DAS/SAN when economically possible.  However, there are a lot of object-storage appliances on the market that are making me re-evaluate yet again.  If I have to use a NAS, it’s going to be relegated to NFS/SMB traffic, which isn’t great, but is going to be more reliable.  I liked the idea of ReFS on an RDM because synthetic operations will run better there than with NTFS, but I just don’t trust it very much used in this fashion.

Now, I’ll tell you that ReFS on a physical server with local storage is not foolproof.  I have a client who was having issues with his PERC on a server that had a 70ish TB RAID 10 volume.  In order to boot the machine at one point, I had to purge the data from the controller cache, and that corrupted the ReFS volume.  So I now have first-hand experience with the sort of issue that my peer had as well.  In this case, it was camera footage, as this server was running a VMS system for several cameras around a small town.  It wasn’t the end of the world to lose that data, but it would have been nice not to have lost it.  When I formatted the volume to start over, I did use NTFS because it’s rock solid and we weren’t really taking advantage of ReFS, and in my case, it didn’t prove to be very resilient. 

As Michael noted, it’s as good as MS will let it be.  Older versions of ReFS were not as good as current versions by a long shot.  It should also be noted that if you have a ReFS volume on Server 2016 (and older versions I believe), and you migrate that volume to 2019 or 2022 server, there’s an upgrade that happens on mount - and depending on the backing storage and the size of that volume, that upgrade can take a long time, and doesn’t really give you a status as to what it’s doing.  And as I understand it, it is not backward compatible so you can’t move back to the old server if things don’t work out. 

And as Michael noted, I had several issues with Windows updates causing ReFS volumes to show up as RAW - some of that was resolved by adding a VMware advanced setting to the VM because in my case, Windows was seeing the RDM disk hosting the ReFS volume as removable storage, so the option had to be added to disable that from showing as removable.  But there were several updates that led up to that conclusion that could be uninstalled and the volume would show normally.

In the end, I’ll still use ReFS, but again, only on physical RAID controllers with a battery-backed cache.  But my top pick, and I say this as a pretty heavy Windows user who is fairly light in the Linux realm, will be XFS, as that has proven time and time again to be solid in the industry.  Both have their place, and if you go into it knowing what the caveats are with ReFS, it can certainly be a useful tool given some of the advantages it has.  If it had native immutability, it would be even more competition to XFS, but for now, XFS wins out when applicable to the solution needed.

Userlevel 7
Badge +7

@coolsport00 interesting! I was just discussing this the other day. I have not heard of anyone complaining about REFS issues in about a year.  It seems all the recent OS releases and patching MSFT has done have made REFS more stable.  If you do decide to use it, make sure you remember to use the 64K cluster size (not the 4K default) when you format the REFS volume.

But I gotta ask - Why REFS? Most of the talk track nowadays is around immutable backups - So for Veeam that means Hardened Linux Repos with XFS for the Fast Clone/Block clone (like REFS).  There have been very few issues with XFS and it seems to be faster.

Userlevel 7
Badge +19

Hey @vmJoe - honestly, it was the first thing that came to mind tbh. And the previous issues then crossed my mind as well. I do have a hardened repo already. Obviously I'm most comfortable with Windows, but more comfortable with Linux the past few months, so I'll probably indeed just go with Linux XFS. Thanks bud. 

Userlevel 7
Badge +6

Exactly what @vmJoe said about “Why REFS?”….I think REFS tends to be for those most comfortable with Windows and not comfortable with Linux.  However, the Linux/XFS side of things doesn’t appear to be that hard, and I have a new server that I’m going to play with before it goes into production as a Linux repo, plus I’m going to be trying out the new LHR ISO (if I can ever find the time) on an older server, so one way or another I’m going to become more comfortable with it.  But for me, it’s been REFS just because of unfamiliarity.  And of course, moving to Linux does rid me of having to have those pesky Windows licenses that I’m always seeming to run short on in the smaller, non-Datacenter licensed environment.

Userlevel 7
Badge +7

@dloseke - All great points! I think you’ll enjoy Linux after you use it!  Also, there are plenty of Linux learning resources available to help you gain some familiarity with the OS!

Userlevel 7
Badge +19

@dloseke  What Joe said. Andrew Mallett or Nigel Poulton courses on Pluralsight FTW! 🙌🏼

Userlevel 7
Badge +6

Thanks @vmJoe @coolsport00.  I took some basic Linux courses about 20 years ago in college and have been using it on and off since then.  It was something I wanted to dig into for a long time, but as a Windows administrator, I never had much opportunity. It continues to become more and more prevalent so someday I’ll have to deep dive.  Until then, things like vCenter, ESXi and the LHRs will be the places I continue to get the most exposure.  I’ll look into those courses as well - I don’t think I have a Pluralsight membership anymore, but I’ll have to check into it for sure.

Userlevel 7
Badge +19

I know time can be a prime resource, but I highly recommend making the time to tap into those courses, at least a little bit. With the Linux experience you have thus far, some of it you can breeze through. Although, if you’re like me, regardless of the exposure you’ve had to this point, you’ll be taking notes as you go, so it could take longer than ‘normal’. It was so eye-opening to me, I couldn’t stop going (there were 7 courses total by Andrew). It took me a while, but I’m glad I did them; I even did more that I found. :)

Userlevel 7
Badge +6

You raise a good point.  The issue is that it’s a rabbit hole for me to go down...I end up killing a day or two when I have other things that are a higher priority.  ADHD hyperfocus here plays a pretty pivotal role.  But!  There is a point where taking the time out of the day to learn or script or whatever becomes a time-saver in the long run, so finding where that exists is important. 

It’s like scripting and automating - sure, it may take 4 hours for me to script something, and if that something takes me an hour or two each month, then it’s probably worth it in time saved, and also for less visible things like the time it takes to circle back around and get refocused (my ADHD plays a role here too, but this is true for everyone).  But if it saves me 5 minutes once a month, is it worth it?  For repetitive tasks, absolutely, but for the occasional items?  Maybe not.

So is going down the rabbit hole of Linux worth it?  Sure is.  The same can be said about the Azure training and certifications that are on my list of things to do as well!  Oh how I wish I had a time machine, or a cloning machine where I could create two or three of myself and then we can reconverge our gained knowledge back into the main person.  Wait….did I just refer to myself in both the first and third person at the same time?
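
The trade-off above is simple arithmetic, and a quick sketch makes the gap obvious. The function name is hypothetical, and the figures are only the ones mentioned in this post (4 hours to script, an hour or two saved per month versus 5 minutes per month); it ignores the less visible refocusing cost.

```python
def months_to_break_even(build_hours: float, saved_hours_per_month: float) -> float:
    """Months until the time spent scripting is repaid by the time saved."""
    if saved_hours_per_month <= 0:
        raise ValueError("saved_hours_per_month must be positive")
    return build_hours / saved_hours_per_month


if __name__ == "__main__":
    # 4 hours to script a task that eats 1-2 hours a month (1.5 used as a midpoint).
    print(f"{months_to_break_even(4, 1.5):.1f} months")
    # 4 hours to script a task that eats 5 minutes a month.
    print(f"{months_to_break_even(4, 5 / 60):.1f} months")
```

The first case pays for itself in under three months, while the second takes about four years.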

Userlevel 7
Badge +8

I have about 1PB of REFS volumes now and am happy. The first few years were not great, but no longer do we require modifying the registry and crossing our fingers that data is going to be safe.

 

There are reasons to choose NTFS, and others for REFS. For Veeam, REFS is a no-brainer. For my Windows File Servers, NTFS is still the way to go.

 

Once I replace these Repos, Linux hardened is another option I may choose, or a mix. 

Userlevel 2

We had disasters with ReFS, where an entire volume became RAW. It was on Windows Server 2016 and 2019. However, that ReFS-based storage wasn’t used as a Veeam repo. We haven’t faced such issues with Veeam repos, but we migrated to NTFS after the disasters. I’ve heard that ReFS is more stable on 2022.

We now have XFS on our hardened repos and it works great.

 

Userlevel 7
Badge +6

Did you find the source of the volume becoming RAW?  In every case that I’ve seen, with one exception, it was due to an update that Microsoft released, often combined with VMware marking the disk as removable as I noted.  The one exception was the instance where I purged the data out of the RAID controller cache, causing corruption.  But I very reliably had issues with volumes changing to RAW with Windows Updates.

Userlevel 7
Badge +21

For us it was always a Windows Update that broke it.  We eventually excluded the servers with repos that were ReFS until we could migrate them to new ones.

Userlevel 7
Badge +6

For reference, here is the VMware article regarding NICs and SCSI controllers showing up as removable hardware.  On those instances where I had REFS, disabling HotAdd fixed the issue straight away regardless of patching.  Removing the patches was more of a workaround, and the KB below was….well...still a workaround, but a more stable way to do it.

https://kb.vmware.com/s/article/1012225
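
If you want to check whether a VM already has that setting, a small sketch like the one below can scan the text of a .vmx file. It assumes the advanced setting in question is `devices.hotplug = "false"`, which is my understanding of what the KB describes, so verify it against the article; the function name is hypothetical.

```python
def hotplug_disabled(vmx_text: str) -> bool:
    """Return True if the VMX text sets devices.hotplug to "false".

    Assumes the usual VMX layout of one `key = "value"` pair per line.
    """
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "devices.hotplug":
            # Strip whitespace and surrounding quotes before comparing.
            return value.strip().strip('"').lower() == "false"
    # Setting not present, so the default (hot-plug enabled) applies.
    return False


if __name__ == "__main__":
    sample = 'scsi0.present = "TRUE"\ndevices.hotplug = "false"\n'
    print(hotplug_disabled(sample))
```

This only reads the configuration text; changing the setting itself is still done through the VM's advanced configuration parameters as the KB describes.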

Userlevel 7
Badge +21

The best way to do it is a new Win2022 box with ReFS and migrate to it.  😋🤣

Userlevel 4

Hi guys, I had a very bad experience with ReFS on WS2016. One of my customers had around 60TB+ of backups on a ReFS volume, and after a Windows update and reboot the volume showed up as “RAW”. The repository was on a SAN over 10 GbE iSCSI. Since then I have not used ReFS on 2012R2, 2016 or 2019. Maybe on WS2022 the support is better. Now I have to decide with the same customer whether to switch to ReFS or stay on NTFS. The synthetic full backup takes 17 hours and the daily backup is 1 TB. The backup server is a physical R720 with 10 GbE iSCSI to the storage array, running Windows Server 2019. I’m still a little bit scared :) hehe :)

 

tom.

Userlevel 7
Badge +21

Just for the space savings and the time for your synthetic it would be better to switch. Win2022 ReFS has been great for me and no further issues like previous releases.

Userlevel 7
Badge +8

I really think anyone who has had ReFS issues should post when. ReFS is leaps ahead of where it was a few years ago. The  benefits for Veeam have been worth it to me. 

Userlevel 7
Badge +6

I think I've been lucky, as I've been using ReFS for the last few years without any major dramas, although we only have approx 500TB stored across our repos.
