Solved

Failed to create MachineMutex

March 31, 2023

spider32
Influencer

I'm seeing the error "Failed to process VM: Failed to create MachineMutex" when running a storage snapshot job. The first try fails but the second try succeeds, and the issue comes and goes.

Failed to create snapshots on primary storage: Failed to wait mutex sandiscover7c39d713-bdbb-40f1-ac8d-af87774de299_b72e5bb7-87e2-4802-9043-2d6a76f9052d: timeout 600 sec exceeded
Failed to create storage snapshot for datastore abcdef: Failed to wait mutex sandiscover7c39d713-bdbb-40f1-ac8d-af87774de299_b72e5bb7-87e2-4802-9043-2d6a76f9052d: timeout 600 sec exceeded

Has anyone seen this issue, and do you know the cause or a resolution?

Best answer by MicoolPaul

I’ve just sat down and gone through this again and here’s what we can discern:

Terminology: a mutex (“mutual exclusion”) is a lock around a section of code, so that only one processing thread is allowed to access that section at a time. With this in mind, we can focus on the other words around it. This is a storage snapshot job, and it is a SAN discovery process that times out (the mutex name in the error starts with sandiscover). I believe what is happening is:

Veeam requests that a storage snapshot is created.

Veeam then performs a SAN discovery process to find the storage snapshot so it can mount it. This process is allowed to take up to 10 minutes (the 600 sec in the error message), and then times out.
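
To make the "Failed to wait mutex ... timeout exceeded" wording concrete, here is a minimal sketch of a mutex wait with a timeout. It is only an illustration of the general pattern, not Veeam's implementation: the function names, the lock name `sandiscover-demo`, and the short timings are all hypothetical, and a plain in-process `threading.Lock` stands in for whatever machine-wide mutex the product actually uses.

```python
import threading
import time


def wait_mutex(lock: threading.Lock, name: str, timeout_sec: float) -> None:
    """Acquire the lock, or raise TimeoutError if it is still held after timeout_sec."""
    if not lock.acquire(timeout=timeout_sec):
        raise TimeoutError(
            f"Failed to wait mutex {name}: timeout {timeout_sec:g} sec exceeded"
        )


def demo() -> list[str]:
    """Simulate one job holding the mutex too long while a second job waits for it."""
    lock = threading.Lock()
    holding = threading.Event()
    events: list[str] = []

    def slow_discovery() -> None:
        # Hypothetical long-running step that holds the mutex (stands in for a slow scan).
        with lock:
            holding.set()
            time.sleep(0.5)

    holder = threading.Thread(target=slow_discovery)
    holder.start()
    holding.wait()  # make sure the other job really owns the mutex first

    # First try: the wait budget (0.2 s) is shorter than the time the mutex is held.
    try:
        wait_mutex(lock, "sandiscover-demo", timeout_sec=0.2)
        lock.release()
        events.append("first try: acquired")
    except TimeoutError as exc:
        events.append(f"first try: {exc}")

    # Second try: the slow job has finished, so the mutex is free again.
    holder.join()
    wait_mutex(lock, "sandiscover-demo", timeout_sec=0.2)
    lock.release()
    events.append("second try: acquired")
    return events


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The sketch reproduces the "first try fails, second try succeeds" pattern: the waiter only fails when whoever holds the mutex is slower than the wait budget, which is why a temporarily slow discovery step would produce an intermittent error.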

Is the NetApp presenting many other shares such as SMB/NFS shares, or anything else that could be putting the SAN under heavy stress?

My gut says this is one of the following:

SAN under stress
Software bug in NetApp or Veeam
Network connectivity issues between Veeam and the NetApp management interface.

Hope this helps.


MicoolPaul
March 31, 2023

Hi,

As the issue is a timeout, is the storage under stress?

Can you provide the following details:

Version of Veeam
Storage Vendor, Model & Firmware version
vSphere environment version (vCenter & ESXi)

Are there many virtual machines on the datastore? Is it always the same datastore? Let’s start with this.

Michael Paul - Opinions are my own and do not necessarily reflect the opinion of Veeam | https://micoolpaul.com | Mastodon: @micoolpaul@masto.nu | Bluesky: @micoolpaul.com

spider32
Author
Influencer
March 31, 2023

Appreciate the reply!

Version of Veeam - v11a
Storage Vendor, Model & Firmware version - NetApp / AFF - v9.10.1P10
vSphere environment version (vCenter & ESXi) 7.0.3

Are there many virtual machines on the datastore? There are 3 VMs on 2 volumes. Two of the VMs are huge (45TB). And yes, this issue is always on the same datastores.


JMeixner
On the path to Greatness
March 31, 2023

Is it the datastore with the huge VMs?

spider32
Author
Influencer
March 31, 2023

Both datastores have one huge VM. It happens a lot on one of the datastores, but I also see it happening on both at the same time. One datastore has 2 VMs (one of them 45 TB) and the other datastore has 1 VM (45 TB). These 2 huge VMs are part of a SQL Always On setup.

Ralf
Comes here often
April 1, 2023

I've had this a lot with IBM SVC. We received a private fix IIRC, but I think it was only really fixed after we updated the storage hardware.


MicoolPaul
April 1, 2023

How is the storage presented to the hosts and Veeam? iSCSI/FC/NFS? Are they in separate igroups?

To take an educated guess, as it’s an intermittent timeout, I’d look at the NetApp logs from when this happens and compare them to when it works. Can you share them here?


spider32
Author
Influencer
April 4, 2023

I’ll work with our storage admin to get the necessary logs and will post them. Hopefully we can find relevant info to share. The volumes are NFS.


MicoolPaul
Answer
April 4, 2023

I’ve just sat down and gone through this again and here’s what we can discern:

Veeam requests that a storage snapshot is created.

Veeam then performs a SAN discovery process to find the storage snapshot so it can mount it. This process is allowed to take up to 10 minutes (the 600 sec in the error message), and then times out.

Is the NetApp presenting many other shares such as SMB/NFS shares, or anything else that could be putting the SAN under heavy stress?

My gut says this is one of the following:

SAN under stress
Software bug in NetApp or Veeam
Network connectivity issues between Veeam and the NetApp management interface.
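
For the connectivity possibility, a quick way to rule out basic reachability problems is to time a plain TCP connect from the backup server to the storage management address. The sketch below is only a starting point under assumptions: the host name `netapp-mgmt.example.com` is a placeholder, and port 443 assumes the management interface is reached over HTTPS in your environment. A successful connect says nothing about API health or load on the array.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout_sec: float = 3.0) -> tuple[bool, float]:
    """Try one TCP connection and return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_sec):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        reachable = False
    return reachable, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder target: replace with your own management LIF and port.
    ok, elapsed = probe_tcp("netapp-mgmt.example.com", 443)
    print(f"reachable={ok} elapsed={elapsed:.3f}s")
```

Running the probe repeatedly around the backup window, and noting whether failures or slow connects line up with the failed jobs, would help separate a network problem from storage load.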

Hope this helps.


spider32
Author
Influencer
April 13, 2023

Still trying to get to the bottom of the issue. It's quite difficult because the issue is intermittent. I’ll open a ticket with support and will share if we come up with the cause/solution. I would tend to agree that the SAN may be under temporary stress due to the size of the VMs.

David Barber
New Here
April 20, 2023

I'm seeing the same issue and am interested in whether there is a patch or fix for this. Thank you.

spider32
Author
Influencer
May 26, 2023

I didn’t get a chance to open a ticket at the time, but we have recently been getting a lot of errors which I think are related to this one. I don’t see the mutex error anymore, but it has been replaced with a lot of the error below. I have now opened a ticket with Veeam and will share the resolution.

Error: Failed to prepare VM for processing: [MachineSemaphore] Failed to wait for semaphore Global\PREPARING_SAN_VM_757ef7ad-998a-4b3a-9b21-243b68fa5ea0: timeout 10800000 ms exceeded
12:24:45 AM :: Error: Failed to prepare VM for processing: [MachineSemaphore] Failed to wait for semaphore Global\PREPARING_SAN_VM_757ef7ad-998a-4b3a-9b21-243b68fa5ea0: timeout 10800000 ms exceeded
