Solved

Error: Unstable connection: unable to transmit data.


Userlevel 2
Badge

Hello there,

I have a backup job that had been running without problems, but over the last few days it has been failing after reaching 80% with the error message below. I tried running the job with the firewall rules disabled, but nothing changed. All backup infrastructure components are working fine for other VMs. I have also checked the firewall logs, and nothing related to this backup is being blocked.
Error message:

Error: Unstable connection: unable to transmit data. Failed to upload disk. Skipped arguments: [vddkConnSpec>]; Agent failed to process method {DataTransfer.SyncDisk}. Exception from server: An existing connection was forcibly closed by the remote host Unable to retrieve next block transmission command. Number of already processed blocks: [11956]. Failed to download disk 'VM-Name.vmdk'.

 

I need to figure out how to resolve this issue immediately.

Best answer by MicoolPaul 27 July 2021, 16:10

14 comments

Userlevel 7
Badge +20

Hi,

 

Regarding your “I need to figure out how to resolve this issue immediately”: this is a community forum, so for an urgent issue you’re best off engaging with Veeam Support directly.

Ask yourself the following questions:

  1. Have you changed anything in your network topology?
  2. What’s the uptime on your servers? Has anything been patched recently?
  3. Depending on the answer to question 2: I’ve seen issues whereby servers have come back online but haven’t recognised that they’re on a domain network, so their Windows Firewall profile changes. (You haven’t given much information here, so I’m assuming Windows proxy hosts, domain joined, with Windows Firewall enabled. More information here would be greatly appreciated.)
  4. Based on your mention of firewalls, is this a site-to-site backup, a backup to cloud, or a backup to local storage? In short, is there an MPLS/WAN link involved?
  5. Can you provide information on where your proxy and repository roles are, and whether the backup job has specific proxies assigned or is just set to automatic selection? (My thought process here is that you could have two sites with a proxy at each and the job set to select proxies automatically, in which case you could be trying to reach your source environment across a site-to-site VPN or MPLS.)

 

Let us know please.

Userlevel 7
Badge +7

@Abela can you post the error log, please?

Is the backed-up VM running Windows?

Can you check C:\Windows\TEMP\vmware-SYSTEM? It must be empty.

Here is a discussion of the same problem:

Error: Unstable connection: unable to transmit data. Failed to upload disk (veeam.com)

Userlevel 2
Badge

@MicoolPaul Our firewalls are local (LAN) firewalls. All backup infrastructure components are on site, and each is installed on a separate Windows VM. The job is set to choose the proxy automatically. While diagnosing the problem I tried selecting a specific proxy server close to the datastore, and I also tried changing the repository to another machine on a different VLAN, but the result was the same: the job still fails when it reaches around 80%. All the VMs run Windows.

Userlevel 7
Badge +20

Does the job run for a specific duration before failing? It could be a firewall closing the session. The key advice from the link that @Link State shared is to use Wireshark to capture what’s happening.

If your firewall supports it, you could disable stateful inspection between the two endpoints (so the traffic is still routed via the firewall but not inspected) and test that way. It would help rule out any firewall issues.

 

 

Userlevel 7
Badge +20

Another place to check, @Abela, is the log directory for your Veeam jobs. The logs are located in C:\ProgramData\Veeam\Backup, and there is a folder for each job that may help you troubleshoot this issue.

Also, I am attaching a log-diving presentation from a few years back (2017) that may help once you get into the log directory above. It is a really great resource and I have used it a lot.
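
For anyone digging through those per-job folders, here is a minimal Python sketch that pulls only the Warning and Error lines out of the logs. It assumes the line shape seen in the excerpt posted later in this thread (`[dd.MM.yyyy HH:mm:ss] <thread> Level message`) and that the logs are `*.log` files; the function names are made up for illustration and this is not a Veeam tool.

```python
import re
from pathlib import Path

# Assumed line shape, based on the excerpt in this thread, e.g.
# [27.07.2021 15:17:14] <01> Error Could not resolve IP ...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]\s+"
    r"<(?P<thread>\d+)>\s+(?P<level>Info|Warning|Error)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with ts, thread, level and msg, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def problems(lines, levels=("Warning", "Error")):
    """Keep only the parsed records whose level is in `levels`."""
    found = []
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in levels:
            found.append(record)
    return found


def scan_job_folder(folder):
    """Yield (file name, record) for each Warning/Error line in the folder's *.log files."""
    for path in sorted(Path(folder).glob("*.log")):
        # errors="replace" so an unexpected encoding does not abort the scan
        with path.open(encoding="utf-8", errors="replace") as handle:
            for record in problems(handle):
                yield path.name, record
```

Pointing `scan_job_folder` at C:\ProgramData\Veeam\Backup\YOUR-JOB-NAME and printing the results gives a short list to paste into a forum post or a support case. Stack-trace lines that do not start with a timestamp are skipped by this sketch.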

Userlevel 7
Badge +7

@Abela can you post part of the job log from C:\ProgramData\Veeam\Backup\YOUR-JOB-NAME?

 

Is the backup that fails incremental or full?

Have you tried creating a new job with the same VM that fails?

Do the proxy and repository VMs use the E1000 VMware network adapter type? If so, change it.

 

It has happened to me that the traffic passed through a physical firewall, the data flow was massive, and the firewall dropped the connection.

 

 

Userlevel 2
Badge

@MicoolPaul yes, after reaching 80% it freezes for some time and then fails. I will see what I can do about your suggestion to disable inspection on the firewall. In the meantime I have been capturing network packets with Wireshark, and I noticed a TCP keep-alive failure. Does that imply something?

Userlevel 7
Badge +20

Hi @Abela,

 

It does indeed. TCP is a stateful connection between two endpoints. If a connection has failed or otherwise not been terminated properly, the endpoints need a way to know when to consider it dropped. The TCP keepalive probe is one way of handling this: if the proxy is busy gathering data and hasn’t sent a packet to the repository’s data mover for a while, it can send a keepalive (this is nothing magic, just part of the TCP specification) to keep the connection alive.


When you have any device performing NAT or a firewall in the middle, this gets more interesting. These gateway devices have a finite number of network ports available but can carry huge amounts of traffic, so it is more important for them to close stale connections as soon as possible; if they don’t, they may not have capacity for new traffic. This can cause issues because you might have a TCP keep-alive timer of 300 seconds within your application, for example, while the firewall’s idle timer is only 30 seconds. If the firewall doesn’t see any traffic for 30 seconds, it will consider the connection dead and close it.

Realistically, most firewalls won’t do this for TCP traffic as it is stateful, whereas for UDP it’s common to see a 30-300 second timer because there is no connection state to track. The firewalls I tend to work with have keep-alives of 3-8 hours. As this issue has only just started, it does make sense to look at any potential changes.
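
To make the timer mismatch concrete, here is a small Python sketch. It is purely illustrative: it does not show how Veeam's data movers configure their sockets, and the function names and default values are assumptions chosen for the example. The first function compares an application keepalive timer with a firewall idle timer (the 300 s versus 30 s case above); the second shows how a program can ask the OS to send keepalives on a socket, using only the options the platform exposes.

```python
import socket


def outlives_firewall(keepalive_idle_s, firewall_idle_timeout_s):
    """True if the first keepalive probe goes out before the firewall expires the idle flow."""
    return keepalive_idle_s < firewall_idle_timeout_s


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and, where the platform allows it, tune the timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific, so set only those
    # that this Python build exposes and ignore any the OS rejects.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass


if __name__ == "__main__":
    # The example from the post: 300 s application timer, 30 s firewall timer.
    print(outlives_firewall(300, 30))
```

With a 300-second keepalive timer and a 30-second firewall timer the check fails, which is the situation described above: the firewall forgets the session long before the first probe is sent.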

 

A second reason this may occur is if your network topology has changed and you end up with asymmetric routing, whereby traffic is sent via one gateway but received via another. Because neither gateway sees traffic flowing in both directions, it can decide the traffic is invalid and close the session. Running a traceroute from each endpoint to the other is a great way to detect this.

 

I’m also making the assumption here that you’re not using NAT between these endpoints (that is, if one side is 10.0.0.1 and the other is 10.0.1.1, each sees the other’s real IP address rather than one translated to a router’s or firewall’s address).

 


We’re going down a very specific rabbit hole here and it may be completely the wrong thread to pull at, so I’d suggest following the advice of the others as well and supplying some logs. Do you have Veeam Support? If so, they’ll be able to spend some time with you going through such common issues.

 

Good Luck! :relaxed:

Userlevel 2
Badge
[27.07.2021 15:16:18] <15> Warning      All agents reconnection processes are disabled. Session stop event is signaled.
[27.07.2021 15:17:11] <27> Info Task session 'bcfee333-77b3-479b-8391-8e7f39555db3' has been completed, status: 'Failed', '55,172,923,392' of '55,172,923,392' bytes, '7' of '7' objects, details: ''
[27.07.2021 15:17:11] <22> Info Task session '6ac96c36-832f-4242-9f9e-40da1157aaba' has been completed, status: 'Failed', '56,037,998,592' of '56,037,998,592' bytes, '8' of '8' objects, details: ''
[27.07.2021 15:17:14] <01> Info [Session] Id 'c0f33055-9859-45c6-8d90-07347b90f6a2', State 'Postprocessing'.
[27.07.2021 15:17:14] <01> Info Retrieving space info for repository : Id [ec1be8ef-e813-451e-8c56-dc5e4e29e921]
[27.07.2021 15:17:14] <01> Info Fixing credentials to down-level format
[27.07.2021 15:17:14] <01> Error Could not resolve IP [IP Address of the repository]
[27.07.2021 15:17:14] <01> Error No such host is known (System.Net.Sockets.SocketException)
[27.07.2021 15:17:14] <01> Error at System.Net.Dns.InternalGetHostByAddress(IPAddress address, Boolean includeIPv6)
[27.07.2021 15:17:14] <01> Error at System.Net.Dns.GetHostEntry(IPAddress address)
[27.07.2021 15:17:14] <01> Error at Veeam.Backup.Model.SNetworkAddressResolver.ResolveDnsNameSafe(IPAddress hostIpAddress)
[27.07.2021 15:17:14] <01> Info Resolved by NTLM strategy ip addresses and host names: 10.1.9.103
[27.07.2021 15:17:14] <01> Info [CProxyRpcInvoker] RpcInvoker [28349038] has been created. Host: [IP Address of the repository:6160]

@Link State initially it was an incremental backup job; I then tried a full backup as well, but both failed. The failing job has 2 VMs. I tried taking the backup after separating the VMs into new jobs, but nothing changed. We have E1000E network cards on the proxy servers, but the repository is a separate physical server with an Intel adapter.

Userlevel 7
Badge +20

One thing with E1000E: on Windows Server 2012 it is known to cause network drops. Also, this NIC tops out at 1 Gbps, whereas the recommended VMXNET3 supports 10 Gbps. As noted, I suggest changing the proxy server NICs and testing, or even deploying a new proxy with VMXNET3 to test with.

Userlevel 7
Badge +7

From the log, the repository server is not being resolved as expected. Check it with nslookup, including the reverse lookup, which is what fails here. To test, add the repository’s IP and FQDN to the hosts file and retry the backup.

[27.07.2021 15:17:14] <01> Error        Could not resolve IP [IP Address of the repository]
[27.07.2021 15:17:14] <01> Error        No such host is known (System.Net.Sockets.SocketException)
[27.07.2021 15:17:14] <01> Error           at System.Net.Dns.InternalGetHostByAddress(IPAddress address, Boolean includeIPv6)
[27.07.2021 15:17:14] <01> Error           at System.Net.Dns.GetHostEntry(IPAddress address)
[27.07.2021 15:17:14] <01> Error           at Veeam.Backup.Model.SNetworkAddressResolver.ResolveDnsNameSafe(IPAddress hostIpAddress)
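
The forward and reverse checks suggested above can be sketched in Python using only the standard `socket` module. This is a rough equivalent of running nslookup in both directions, not what Veeam does internally; the function names are made up for illustration, and you would pass in the repository's real IP address.

```python
import socket


def reverse_lookup(ip):
    """Return the host name the resolver gives for `ip`, or None if the lookup fails."""
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None


def forward_lookup(name):
    """Return the sorted list of addresses `name` resolves to (empty if it does not resolve)."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(name, None)})
    except OSError:
        return []


def check_dns(ip):
    """Reverse-resolve `ip`, then forward-resolve the result and check that it leads back to `ip`."""
    host = reverse_lookup(ip)
    forward = forward_lookup(host) if host else []
    return {"ip": ip, "ptr": host, "forward": forward, "consistent": ip in forward}


if __name__ == "__main__":
    print(check_dns("127.0.0.1"))
```

A `ptr` of None corresponds to the "No such host is known" failure in the log: the address has no reverse record that the resolver can find. Run the check from the backup server, since that is where the lookup in the log was made.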

 

A tip: replace the E1000 with VMXNET3.

 

VMXNET3 vs E1000E and E1000 – part 2 – RICKARD NOBEL AB

Userlevel 2
Badge

I want to thank all of you for your support, especially @MicoolPaul. The issue is resolved: we found a limit in our firewall’s IDS rule. After extending it, the backup ran without problems.

Userlevel 7
Badge +20

Thank you for the kind words and for confirming what the fault was. It helps the next person looking for the same issue! :grinning:
