Solved

Veeam Backup for Microsoft 365 proxy disconnects and proxy Windows service dies on every backup. Anyone else seen this?


Userlevel 3

 

I had M365 backups working flawlessly and then needed to switch repositories. Somewhere in there, something broke. So I ended up removing the repo, removing Veeam M365 backup, cleaning up all app residue and re-installing from scratch, since this was a new, small instance and I just wanted it to be clean.

Now, every time I try to run a backup against the same recreated Org, the job connects to the org, starts resolving the objects, and finds the objects to back up, but never actually receives any data. Shortly thereafter (30 seconds or so), the proxy service crashes and the portal shows status “disconnected” and Details “Job was interrupted”.

I have even tried putting it on a different VM and the same behavior persists. The only element that I haven’t recreated is the O365 implementation. I have also recreated the Azure apps/certs each time as well.

The logs don’t seem to show much. Below is the end of the latest log as well as the console view and some event logs. I am kind of at a loss here as to where the problem lies...

 

Best answer by Mildur 29 June 2023, 01:52

29 comments

Userlevel 7
Badge +20

Unfortunately, despite that wealth of information you’ve placed into your post, it’s all generic.

 

Here’s what I can determine:

As the logs have nothing, this is an unhandled exception/error. The possibilities that come to mind are:

  • An application dependency hasn’t properly reinstalled
  • Anti-virus is killing the application
  • Some level of file system corruption.

I’d double check your AV exclusions are in place, scan your disk for errors, then check the VB365 dependencies and reinstall them. Failing this, and if the logs don’t reveal anything else, tracing tools or a support case with Veeam might be necessary.

Userlevel 7
Badge +20

Btw as you used different VMs, were these from the same template or did you do a fresh manual build of the OS each time?

Userlevel 3

Fresh build. Actually the first machine is hardware: a ProLiant DL360 with all the CPU and RAM.

Second attempt was a fresh VM that meets sys requirements.

Userlevel 3

The only AV is Windows Defender and no exclusions were necessary the first time around. But it wouldn’t hurt to try, I suppose.

I am using community and don’t have support. So, if I can’t get it working, I will just have to not use it I guess. Based on Googling around, I am the only person who has ever had this problem, or one close to it. LOL

Userlevel 3

It’s odd… I tried replacing the service account default of Local System with a local admin level domain account, and the job will run and not kill the proxy. But it connects to the org, finds nothing to back up, and then completes with that warning.

Tried rebuilding the org with the service set to the domain account and same outcome.

I could see something going wrong on the original system but what is super strange is that the problem persisted on a completely different system.

This made me wonder if there’s not something on the Azure side contributing but, outside of the Azure app, which was newly created each time and the old one removed, I can’t see any indication of a problem… hmmm

Userlevel 7
Badge +20

Another question - are you using v6 or v7, and with the latest patches? Just wondering if this is on v6 and, if so, whether the same thing happens after updating to v7. If you’re on v7 already then ignore this suggestion; as noted by Michael, further testing or a support case might be needed.

You could also try the forums, but you typically need a support case - http://forums.veeam.com

Best of luck.

Userlevel 3

Yah don’t have a support contract so no case, I am afraid. Otherwise that would have been my first stop.

 

Userlevel 7
Badge +20

That is very bizarre, what OS and version are you using? I’d be interested to try a different install media, potentially of another version, 2019 or 2022 instead of the other to see what happens.

Userlevel 7
Badge +6

AV was my first thought as well, or some sort of EDR, etc.  Is this machine domain joined?  Any sort of AD security that could be enforced on it?

Userlevel 3

Could def try different media. Version is 7.0.0.3604 and running on fully patched Server 2022.

Userlevel 3

Only Windows Defender running. It is domain joined, always has been, not a DC, and no fancy AD security. Basically common defaults.

Userlevel 7
Badge +6

Is the server in the same OU (and so getting the same GPOs) as the old server?

Userlevel 3

Correct.

Userlevel 7
Badge +20

What about the Windows Firewall: is that on or off? If on, are the required ports allowed for Veeam O365? Just trying to think of anything else to check other than going to support at this point. 😂

Userlevel 3

Checked firewall at one point and confirmed ports open.

Just now tried disabling all profiles and running it. No change.

Wondering at this point if there’s not some .NET corruption or something. Working that angle now...

Userlevel 7
Badge +20

That’s possible, and worth a shot at this point.

Userlevel 7
Badge +6

Is the server using the same IP address as the old server? Any firewall changes? IPS/IDS? SSL traffic monitoring, where a cert could be changing between M365 and the firewall, and between the firewall and the server? It kind of has the feel of the firewall interrupting data flow and VB365 not handling that disruption of the traffic, or something like that. Not sure why it would crash, but wondering if this could be a network issue instead.

Userlevel 7
Badge +20

Those were my thoughts re: prerequisites, in case something wasn’t included in your media. If you can try a 2019 ISO and see if that works, that’d be a good benchmark.

Userlevel 3

No .NET corruption found. Installed .NET 7 anyway (6 was in place), no change.

Ran sfc /scannow, no corruption there.

Next up is another Veeam M365 install from a fresh ISO.

Userlevel 3

Same server, same IP, no firewall changes. No IPS/IDS in play. Nothing fancy on the network and no SPI or anything like that.

Tried with the firewall completely off and no change.

Checked my Pi-hole for any DNS blocking that might have come about since the initial working install, nothing blocking...

The server it’s on isn’t even old or a kludge. It’s a very well managed, fresh Server 2022 instance with nothing fancy other than Hyper-V running. It’s more than big enough: dual Xeon Silver with 128GB RAM. The repo is 5TB with about 50% free, and it’s only expected to back up two M365 users with sub-100GB OneDrives.

It’s a bona fide mystery so far. I have never had an application this simple have this much trouble. And Veeam is usually bulletproof for me.

All I did to break it in the first place is try to relocate the repo files. Killed the repo in Veeam, relocated the files to a new folder on the same NAS, built a new repo on the new share. Didn’t like that… Killed the repo again, wiped out the files, rebuilt the repo on that same share, and here we are.

Userlevel 3

Even a reboot didn’t fix it!

 

LOL, just kidding. Only posting this in case someone reads through all of these and wonders “did he even reboot the damn thing?”

 

Yes, first rule of Windows Club. Reboot.

Userlevel 7
Badge +12

“Killed the repo again, wiped out the files, rebuilt the repo on that same share, and here we are.”

Running VB365 repositories from an SMB share is not recommended. We only provide experimental support and it comes with some limitations.

Please use iSCSI instead of an SMB share. I assume your proxy service is failing because of instability between your Windows server and the database files on the SMB share. A Jet-based database (the VB365 repo files) should never be run from a network share.

https://www.veeam.com/kb2971

 

best,

Fabian

Userlevel 3

Interesting. It’s working splendidly for normal Veeam Backup from the same system.

For the record, connection to the SMB share is over an end to end 10Gbps SFP link.

Let me try the iSCSI route and see what happens. I assumed since there were no warnings and it worked that it was supported. Bad assumption, it seems.

Userlevel 7
Badge +12

Normal Veeam Backup does not use databases to store the backup data. But even with Veeam Backup & Replication I wouldn’t use an SMB share as a backup target :)

I prefer Object Storage or a hardened repository (Linux with XFS filesystem).

 

Try to put the entire folder structure on an iSCSI volume and see if it works better.

 

Best,

Fabian

Userlevel 3

You, sir, have touched on a possible solution. Or at least a clearer indication of where the problem lies.

I moved the job to a local repo as a test, and it’s now running successfully.

Thank you for pointing this out. In all my digging, I never saw mention of SMB shares being unsupported or experimental.

Now I need to re-factor my whole setup for iSCSI.

May others find this and be saved from the churn...
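
For anyone who lands here later: a quick sanity check along these lines might have saved me some time. This is only a rough sketch in Python, not anything Veeam ships. The function names and the example paths are made up for illustration, and it only catches paths written in UNC form (\\server\share\...). A mapped drive letter that points at an SMB share will look local to it, and an iSCSI volume shows up as a normal local disk, which is the point.

```python
from pathlib import PureWindowsPath
from typing import Optional


def is_unc_path(path: str) -> bool:
    # A UNC path parses with a "drive" of the form \\server\share,
    # while a local path parses with a drive letter such as D:
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


def repo_path_warning(path: str) -> Optional[str]:
    # Hypothetical helper: return a warning string if the repository
    # path is written as a network share, otherwise None.
    if is_unc_path(path):
        return (
            f"{path} looks like an SMB/UNC share; "
            "consider a local disk or an iSCSI volume for the repository"
        )
    return None


if __name__ == "__main__":
    # Example paths are assumptions, not taken from my actual setup.
    for candidate in (r"\\nas\veeam\repo", r"D:\VB365\repo"):
        print(candidate, "->", repo_path_warning(candidate) or "ok")
```

It uses PureWindowsPath so it can be run from any OS, since it only parses the string and never touches the disk or the network.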
