
An X8M-2 Exadata system hosting an Oracle database environment has an issue with some of its database backups.

During backups of very large databases, after many hours of running, the RMAN plugin detects a socket timeout on the connection between the client and Veeam's Backup and Replication (VBR) server, on its management port 10006, and throws the following exception:


RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch1 channel at 09/06/2024 14:25:53
ORA-19506: failed to create sequential file, name="28045ae9-146a-42fd-837a-d32b1ac4a4bc/RMAN_2044054831_TPIIP_20240906_pv34bjb7_1_1.vab", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
   Failed to connect to the endpoint [172.26.173.14:10006]. Connection timed out
--tr:Failed to connect to target endpoint.
--tr:Failed to establish reconnectable connection to VBR

When this happens, the Veeam plugin process remains active and continues to run backups, but all concurrent and pending backups fail because they all use the same active process and its threads, all of which are now in this unrecoverable state. The only solution is to forcibly kill all active Veeam and related processes on the client server.
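For anyone wanting to rule out plain network reachability while a backup is in this state, a minimal sketch of a TCP probe against the VBR management port might look like the following. The address and port are taken from the error stack above; the function name `probe_endpoint` and the 5-second timeout are illustrative assumptions, and a successful probe only shows that the port accepts connections, not that the plugin's own session is healthy.

```python
import socket
import time


def probe_endpoint(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Attempt a single TCP connection and report whether it succeeded."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.3f}s"
    except OSError as exc:
        # OSError covers timeouts, refused connections and unreachable hosts
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Address and port as they appear in the ORA-19511 error text
    ok, detail = probe_endpoint("172.26.173.14", 10006)
    print("reachable" if ok else "unreachable", "-", detail)
```

Running this from the client server while backups are failing would help distinguish a network-level block from a problem inside the plugin process itself.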

Our analysis has revealed the following things.

Firstly, only very large, terabyte-size database backups are affected; smaller databases are fine. This probably means the issue is a function of execution time rather than database size as such. After a long period of time, the connection fails for some reason, which breaks the plugin's entire network-related stack right up to the parent process level, and it does not recover.

Secondly, we pinpointed the moment this error started happening, which was our upgrade from Veeam version 11 to 12.1 last year. All version 11 backups prior to the upgrade were successful.

Hi ​@geofreyr,

I suggest that you open a Veeam support ticket for the very specific error message you are receiving. Since we are “just” a community forum, we can’t provide support or details at a very deep level, so contacting support would be the easiest and quickest way for you.

 

Anyway, please feel free to keep us posted once you have a solution!

 

Best

Lukas


Thanks @lukas.k, we have actually had an active case open for quite some time. I’ll keep you posted about that.


Hi everyone,

It turned out to be a bug in the v12 RMAN plugin: one of the VBR connection threads was not reconnecting after a timeout, as it did in v11. Veeam found the cause and released a fix, which resolved the issue for us.
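To illustrate the general behaviour that was missing, here is a generic sketch of reconnect-on-timeout logic. This is not Veeam's code and says nothing about how the plugin is implemented; `call_with_reconnect`, its parameters and the exponential backoff are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

C = TypeVar("C")
T = TypeVar("T")


def call_with_reconnect(
    connect: Callable[[], C],
    operation: Callable[[C], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(conn), opening a fresh connection after each network error."""
    conn: Optional[C] = None
    last_error: Optional[OSError] = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()  # (re)establish the connection
            return operation(conn)
        except OSError as exc:  # includes TimeoutError and ConnectionError
            last_error = exc
            conn = None  # discard the broken connection instead of reusing it
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # back off before retrying
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

The key point is the `conn = None` line: a connection that has timed out is thrown away and rebuilt, rather than being left in place for every later caller to trip over.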

