Solved

Random failures on new setup


Userlevel 2

So we are deploying Veeam for ourselves and I have been fighting with it for a few days.  Our setup is:

 

Veeam server - Non-Domain joined

Veeam Proxy - Linux with XFS volume attached via RDM in VMware

A Scale-out Backup Repository backed by the XFS volume on the Linux machine
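(Editor's note: for Veeam to use fast clone / block cloning on a Linux XFS repository, the volume has to be formatted with reflink support. A minimal sketch of how such a volume is typically prepared — the device name `/dev/sdb` and mount point are placeholders, not taken from this setup:)

```shell
# Format with reflink and CRC enabled so Veeam can use XFS fast clone
# for synthetic fulls. WARNING: this destroys any existing data on the device.
mkfs.xfs -b size=4096 -m reflink=1,crc=1 /dev/sdb

# Mount it and persist the mount across reboots
mkdir -p /mnt/veeam-repo
mount /dev/sdb /mnt/veeam-repo
echo '/dev/sdb  /mnt/veeam-repo  xfs  defaults  0 0' >> /etc/fstab
```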

Our primary Altera arrays are also added into Veeam under the Storage Infrastructure, and I am able to browse the snapshots on them and perform restores.

 

Both servers are located in the same site/network.  The storage is coming from a LUN on an Altera Hybrid array that is presented to both hosts in the ESXi cluster.

We have a mix of domain-joined and non-domain-joined Windows machines, along with some random Linux boxes.  I just have a simple nightly backup job created that has all of our servers included in it.  I have selected the Linux proxy as the backup proxy in the job, which then pulls in the Scale-out Backup Repo.

In the advanced settings everything is default, with the exception of changing the compression level to Dedupe-friendly, and the Integration is set to Enable backup from storage snapshots with failover to standard backup.  I also have application-aware processing unticked, as well as guest file system indexing.

 

So now, onto the issue…

When the job kicks off I have a ton of failures on random machines.  At first I thought it was domain-joined machines, but those are failing as well.  The errors that I get are:

Processing “MachineName” Error: File does not exist. File: [Machine_Name.vm-73D2024-02-18T220019_D888.vib]. Failed to open storage for read access. Storage: [Machine_Name.vm-73D2024-02-18T220019_D888.vib]. Failed to download disk 'Machine_Name-flat.vmdk'. Reconnectable protocol device was closed. Failed to upload disk '>' Agent failed to process method {DataTransfer.SyncDisk}. (00:53)

Error: File does not exist. File: [Machine_Name.vm-73D2024-02-18T220019_D888.vib]. Failed to open storage for read access. Storage: [Machine_Name.vm-73D2024-02-18T220019_D888.vib]. Failed to download disk 'Machine_Name-flat.vmdk'. Reconnectable protocol device was closed. Failed to upload disk '>' Agent failed to process method {DataTransfer.SyncDisk}.
 

The other thing that I am seeing is that Veeam is using snapshots in ESXi instead of the storage array snapshots.

 

I have tried tweaking a bunch of things, multiple jobs with a single server, etc., and nothing seems to work.  Does anyone have any ideas?


Best answer by Chris.Childerhose 6 March 2024, 02:33


13 comments

Userlevel 7
Badge +20

There is definitely something within the configuration causing this; it's just hard to pinpoint where.  What versions of OS and Veeam are you using?

Userlevel 2

The Veeam server is on Windows Server 2022 Standard.  Veeam is 12.1.1.56 Enterprise Plus.  VMware is 7.0 U3.  The Linux proxy box is the latest Ubuntu build.  We were running the proxy via a Windows box with a ReFS-formatted datastore but wanted to get off of that due to the issues with ReFS.  While using that proxy it was working, so I am wondering if it is something with the Linux proxy more than anything.

Userlevel 2

So just for grins I built a test job with the same settings and added in a server that failed from the main job, and it is working fine…  Maybe something about too many processes at a time?

Userlevel 7
Badge +20

Could be resource contention, but it would depend on the proxy settings for CPU/RAM.  One CPU core per concurrent task for processing, as per best practices.

Userlevel 2

The Linux box has 12 CPUs and the proxy settings are set at 10 max concurrent tasks.

Userlevel 7
Badge +20

Seems good with those specs. Hmmm

Userlevel 2

For grins I put the concurrent tasks down to 5 and re-ran the main backup job and had the same failed servers.

Userlevel 7
Badge +20

Might suggest a support case at this point.

Userlevel 2

What do you make of it taking snaps via vCenter vs using the storage snaps?  Is there somewhere I can poke at that to see why?

Userlevel 7
Badge +20

If you have it configured in Veeam with the plugin it should work as expected which is the puzzling part.

Userlevel 2

I think I just stumbled on something.  When the Linux proxy was set up, no NICs were added for iSCSI.  Doing a rescan of the Altera SAN, I get an error saying that it cannot find a suitable proxy for retrieving data.
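(Editor's note: that rescan error usually means the proxy has no path to the array at all. A quick way to confirm from the Linux proxy itself — these diagnostics assume the open-iscsi tools are installed:)

```shell
# Any active iSCSI sessions? Empty output means no SAN path exists yet.
iscsiadm -m session

# Once a session is up, the array LUNs should appear as block devices.
lsblk
```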

Userlevel 7
Badge +20

I think I just stumbled on something.  When the Linux proxy was set up, no NICs were added for iSCSI.  Doing a rescan of the Altera SAN, I get an error saying that it cannot find a suitable proxy for retrieving data.

That might be the fix.

Userlevel 7
Badge +17

What do you make of it taking snaps via vcenter vs using the storage snaps?  Is there somewhere i can poke at that to see why?

Hi @hamel01 - Even when using Storage Integration, vCenter still takes a snapshot to be able to ‘snap’ the storage Volume the VMs are on. The snap process is just significantly briefer than normal.

I think I just stumbled on something.  When the Linux proxy was set up, no NICs were added for iSCSI.  Doing a rescan of the Altera SAN, I get an error saying that it cannot find a suitable proxy for retrieving data.

Did you configure your Linux Proxy for Storage Integration or DirectSAN? I share how to do so using Ubuntu 22.04 in a post I did a few months back. You can check it out below:

In your Backup job, you can test if your job would work with ‘basic’ settings by going into the Storage section > Advanced button > Integration tab and selecting ‘Failover to standard backup’ (it’s not selected by default). This will cause your job to run via Network (nbd) method. My guess is it would work.

Basically, what I’m seeing is you need to grant access on the SAN Volumes to the Proxy. I can’t tell if you’re using iSCSI or, as you shared initially, have connected a Volume(s) to your Proxy as RDM. Personally, I would stay away from RDMs. If possible, I’d create a Volume on your Array, then connect it to your Repository server via iSCSI connections. If your Repo server is a VM, you need to add a NIC (or 2) connected to the Port Group (vmnics) you use for storage.

It appears your Proxy VM may be a ‘combo box’ of both Proxy and Repo? Do you have a physical server to create your Proxy on if possible? Even if not, if you still want to use a VM, you just need to add a vnic or 2, then configure them for iSCSI (shown in my post), then connect those vnics to your storage Port Group. Then, grant ACL access on the datastore Volumes your VMs are on that you want to back up. If using Storage Integration, you only need ‘snapshot only’ access; if DirectSAN, you need both ‘volume and snapshot’ access.
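(Editor's note: once the vnics are in place, the Ubuntu side of the iSCSI setup looks roughly like this. The portal IP is a placeholder — use your array's iSCSI portal, and grant the initiator IQN access on the array ACL as described above:)

```shell
# Install the iSCSI initiator on the Ubuntu proxy
sudo apt-get install -y open-iscsi

# Note the initiator IQN -- this is the identity you grant access to
# in the array's ACL for the datastore volumes/snapshots
cat /etc/iscsi/initiatorname.iscsi

# Discover targets on the array's iSCSI portal (placeholder IP)
sudo iscsiadm -m discovery -t sendtargets -p 192.168.100.10

# Log in to the discovered targets and make the login persistent
sudo iscsiadm -m node --login
sudo iscsiadm -m node --op update -n node.startup -v automatic
```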

For Linux Repos, I also did a post for that, which you can review as well.

 

Skim through them to see where you may have skipped a step and let us know.

Hopefully those help. 
