Solved

Backup failed on AlwaysOn server


Userlevel 1

We are getting the error message below when trying to run a manual backup via the Veeam Plugin.

[ERROR]  Failed to backup database ABC. Error: Stopping job session

 

Does anyone have any idea about the cause and the fix?

Best answer by Chris.Childerhose 21 February 2024, 15:29

10 comments

Userlevel 7
Badge +20

What version of Veeam are you using?  What version of SQL?

Check the logs here for more details, as posting just the main error does not help: C:\ProgramData\Veeam\Backup\Jobname

Userlevel 7
Badge +17

Hi @Sukuna09 -

As Chris shares, we need a bit more info to be able to try and help. I believe the log location on the computer you’re using the Plugin on is:
%PROGRAMDATA%\Veeam\Backup\MSSQLPluginLogs folder

And, if you’re running the Plugin from the VBR server, look in:
%PROGRAMDATA%\Veeam\Backup\Plugin folder

Hopefully there’ll also be a job name folder or job name log file with info on the job you attempted to run.

Another place to look is Windows Event logs on the computer you’re running the Plugin on. Maybe there’s info there too. Let us know.
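
If it helps to narrow things down before posting, here is a minimal Python sketch that scans a log folder for lines mentioning an error. It assumes the logs are plain text readable as UTF-8 and that the plugin logs sit under %PROGRAMDATA%\Veeam\Backup\MSSQLPluginLogs as mentioned above; the function name `find_matching_lines` and the search terms are made up for illustration, not anything Veeam ships.

```python
import os
from pathlib import Path


def find_matching_lines(root, needles=("error", "10060")):
    """Return (file, line number, text) for every line containing a needle.

    The match is case-insensitive. Files are read as UTF-8 with undecodable
    bytes replaced, which is an assumption about the log encoding.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for number, text in enumerate(handle, start=1):
                lowered = text.lower()
                if any(needle.lower() in lowered for needle in needles):
                    hits.append((path, number, text.rstrip()))
    return hits


if __name__ == "__main__":
    # Assumed location, taken from the path mentioned in this thread.
    program_data = os.environ.get("PROGRAMDATA", r"C:\ProgramData")
    log_root = Path(program_data) / "Veeam" / "Backup" / "MSSQLPluginLogs"
    for path, number, text in find_matching_lines(log_root):
        print(f"{path}:{number}: {text}")
```

Run it on the machine where the Plugin is installed and post the matching lines together with a few lines of context around them.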

Userlevel 1

Hello @Chris.Childerhose @coolsport00

Thanks for looking into it.

I have multiple files in the C:\ProgramData\Veeam\Backup\MSSQLPluginLogs\SYSAgentSRV folder, such as:

Agentsource, agentsource1, agentsource2, MSSQLVBRCommunicationManager and MSSQLRecoveryManager

Below is the error I found in the MSSQLVBRCommunicationManager file. Will this help?

 

Connection status: system:10060 ( A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ).
[20.02.2024 20:00:37.156] < 10620>          |           Trying to connect to the endpoint [10.180.207.21:2518]
[20.02.2024 20:00:58.177] < 10620>          |   Connection status: system:10060 ( A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ).
[20.02.2024 20:00:58.177] < 10620>          |           Trying to connect to the endpoint [10.183.19.235:2518]
[20.02.2024 20:01:19.198] < 10620>          |   Connection status: system:10060 ( A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ).
[20.02.2024 20:01:19.198] < 10620>          |           Trying to connect to the endpoint [10.183.17.25:2518]
[20.02.2024 20:01:19.198] < 10620> net      |           Sending reconnect options: [enabled: timeout 300000, interval 15000].
[20.02.2024 20:01:19.198] < 15500>          | Thread started.  Role: '(async) reconn. receiver {125c1c2d-232a-45e7-86b5-392843558104}', thread id: 15500, parent id: 10620.
[20.02.2024 20:01:19.198] < 14036>          | Thread started.  Role: '(async) reconn. sender {125c1c2d-232a-45e7-86b5-392843558104}', thread id: 14036, parent id: 10620.
[20.02.2024 20:01:19.198] < 10620> plugin_co|           Connection will be encrypted with TLS. Certificate ID: B9633C024EFBD8D7B8364D8ABA83B03B4D2A3904
[20.02.2024 ] < 15500> alg      | Receiver stage: running recv cycle.
[20.02.2024 20:01:198] < 14036> alg      | Sender stage: running send cycle. Resend list: [0].
[20.02.2024 20:01:198] < 10620> cli      |           Doing Client TLS handshake
[20.02.2024 ] < 19832>          | Thread started.  Role: 'Client receiver', thread id: 19832, parent id: 10620.
[20.02.2024 20] < 10620> plugin_co|     [f29d] >> : Invoke: Handshake
[20.02.2024 20:0] < 10620> plugin_co|     [f29d] >> : Invoke: Generic.SetNetworkReconnectOptions\n{   (EUInt64) AttemptIntervalMs = 15000\n  (EBoolean) Enabled = true\n  (EUInt64) OverallTimeoutMs = 300000\n }
[20.02.2024 20] < 12972> plugin_co|         Starting new client for agent [{17aba84c-ec73-444b-882b-529b534be14e}]
[20.02.2024 20:] < 12972> plugin_co|         Starting backup client for agent with UID [{17aba84c-ec73-444b-882b-529b534be14e}]. Client id: [16a0]. Reconnect options: [enabled: timeout 300000, interval 15000]
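
To make an excerpt like this easier to read, the following Python sketch pairs each "Trying to connect to the endpoint" line with the "Connection status" line that follows it. The regular expressions are based only on the wording visible in the excerpt above, and the function name `pair_endpoints_with_status` is hypothetical. Windows socket error 10060 is WSAETIMEDOUT, a connection timeout.

```python
import re

# Patterns derived from the log wording shown in this thread (an assumption
# about the general log format, not a documented Veeam specification).
ENDPOINT_RE = re.compile(r"Trying to connect to the endpoint \[([^\]]+)\]")
STATUS_RE = re.compile(r"Connection status: system:(\d+)")


def pair_endpoints_with_status(lines):
    """Return a list of (endpoint, error_code) tuples.

    error_code is None when no "Connection status" line follows the attempt
    before the next attempt or the end of the input.
    """
    results = []
    pending = None
    for line in lines:
        match = ENDPOINT_RE.search(line)
        if match:
            if pending is not None:
                results.append((pending, None))
            pending = match.group(1)
            continue
        match = STATUS_RE.search(line)
        if match and pending is not None:
            results.append((pending, int(match.group(1))))
            pending = None
    if pending is not None:
        results.append((pending, None))
    return results
```

Applied to the excerpt above, this reports code 10060 for 10.180.207.21:2518 and 10.183.19.235:2518, and no status line for 10.183.17.25:2518. A status line that appears before any attempt, like the first line of the excerpt, is ignored because there is nothing to pair it with.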
 

Userlevel 7
Badge +20

There is definitely a connection issue to the AlwaysOn server. I'm not sure of the exact cause, but the best suggestion at this point is to open a support ticket with Veeam for a deep dive into the logs.
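
While waiting on support, a quick reachability test from the SQL node can show whether a plain TCP connection to the endpoints in the log opens at all. This is only a sketch: `can_connect` is a made-up helper, port 2518 is simply the port shown in the log above, and a successful connect does not prove the plugin itself will work, since a firewall or security product could still interfere later in the session.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For example, `can_connect("10.180.207.21", 2518)` tests the first endpoint from the log. Running it for each endpoint from every affected node shows which paths time out.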

Userlevel 1

@Chris.Childerhose Thank you Chris, I have already logged a ticket with VEEAM Support team for root cause analysis. Hope to get an answer from them.

Userlevel 7
Badge +20

Not a problem at all.  I like looking at logs but many times it is not obvious so support is the right track.

Let us know what they say and the resolution.

Userlevel 7
Badge +17

@Sukuna09 - agree with Chris here...you’re just not making a connection from the get-go.

Keep us posted on what Support finds out.

Userlevel 1

Got the reply from the Veeam support team. According to them:

It looks like the AV software is interfering with the Veeam plugin backup job.

Action plan.

Please whitelist the Veeam MS SQL plugin on the AV on all the affected nodes and afterwards let me know how the backup job goes.

 

Will update you on this once tested.

Userlevel 7
Badge +17

Ah, ok. The old A/V exception list deal. Glad they seem to have found the issue. Keep us posted on how it all goes after you've made the needed changes.

Userlevel 7
Badge +20

Sounds good. Glad support was able to point you in the right direction.  Keep us posted.
