Solved

Failed to create MachineMutex


Userlevel 6
Badge +1

I'm seeing the error "Failed to process VM: Failed to create MachineMutex" when doing a storage snapshot. The first try fails but the second try succeeds, and the issue usually comes and goes.

 

Failed to create snapshots on primary storage: Failed to wait mutex sandiscover7c39d713-bdbb-40f1-ac8d-af87774de299_b72e5bb7-87e2-4802-9043-2d6a76f9052d: timeout 600 sec exceeded
Failed to create storage snapshot for datastore abcdef: Failed to wait mutex sandiscover7c39d713-bdbb-40f1-ac8d-af87774de299_b72e5bb7-87e2-4802-9043-2d6a76f9052d: timeout 600 sec exceeded

 

Who has seen this issue and cause/resolve info?

 

Best answer by MicoolPaul 4 April 2023, 19:45

11 comments

Userlevel 7
Badge +20

Hi,

 

As the issue is a timeout, is the storage under stress?

 

Can you provide the following details:

 

  • Version of Veeam
  • Storage Vendor, Model & Firmware version
  • vSphere environment version (vCenter & ESXi)

Are there many virtual machines on the datastore? Is it always the same datastore? Let’s start with this.

Userlevel 6
Badge +1

Appreciate the reply!

  • Version of Veeam - v11a
  • Storage Vendor, Model & Firmware version - NetApp AFF - v9.10.1P10
  • vSphere environment version (vCenter & ESXi) - 7.0.3

Are there many virtual machines on the datastore? There are 3 VMs on 2 volumes. Two of the VMs are huge (45TB). And yes, this issue is always on the same datastores.

Userlevel 7
Badge +17

Is it the datastore with the huge VMs?

Userlevel 6
Badge +1

Both datastores have one huge VM. It happens a lot on one of the datastores, but I also see it happening to both at the same time. One datastore has 2 VMs (one of them 45 TB) and the other datastore has 1 VM (45 TB). These two huge VMs are part of SQL Always On.

Userlevel 6
Badge +1

I've had this a lot with IBM SVC. We received a private fix, IIRC, but I think it was only really fixed after we updated the storage hardware.

Userlevel 7
Badge +20

How is the storage presented to the hosts and to Veeam? iSCSI/FC/NFS? Are they in separate igroups?

 

To take an educated guess, as it's an intermittent timeout, I'd look at the NetApp logs from when this happens and compare them with a run that works. Can you share them here?
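
Once you have them, even a quick script that pulls the log lines around the failure window (and around a successful run, for comparison) makes this much easier. A rough sketch - the file name, timestamp format, and times are assumptions, adjust to however your admin exports the event log:

    from datetime import datetime, timedelta

    LOG_FILE = "netapp_events.log"                 # assumed plain-text export
    CENTRE_TIME = datetime(2023, 1, 1, 0, 0)       # replace with the failure (or success) time
    WINDOW = timedelta(minutes=15)

    def lines_around(path, centre, window, ts_format="%Y-%m-%d %H:%M:%S"):
        """Print log lines whose leading timestamp falls within +/- window of centre."""
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                try:
                    ts = datetime.strptime(line[:19], ts_format)
                except ValueError:
                    continue  # no leading timestamp on this line, skip it
                if abs(ts - centre) <= window:
                    print(line.rstrip())

    lines_around(LOG_FILE, CENTRE_TIME, WINDOW)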

Userlevel 6
Badge +1

I’ll work with our storage admin to get the necessary logs and will post them. Hopefully we can find relevant info to share. The volumes are NFS.

 

Userlevel 7
Badge +20

I’ve just sat down and gone through this again and here’s what we can discern:

Terminology: a mutex (“mutual exclusion” lock) guards a section of code so that only one processing thread is allowed to run it at a time. With this in mind, we can focus on the other words around it: this is a storage snapshot job, and it is the SAN discover step that times out while waiting for that mutex. I believe what is happening is:

Veeam requests that a snapshot is created on the storage array (the storage snapshot).

Veeam then performs a SAN discovery process to find the storage snapshot so it can mount it. This process is allowed up to 10 minutes (the 600-second timeout in your error), after which it gives up.
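
To make the mechanics concrete, here is a minimal Python sketch (not Veeam’s actual code; the per-datastore lock and all names are my assumptions) of waiting on such a mutex with a timeout. A second job that needs the same datastore has to wait for the first one to finish its discovery; if that takes longer than the allowed window the waiter fails with exactly this kind of timeout, while a retry once the lock is free succeeds - which matches the “first try fails, second try works” pattern you describe.

    import threading, time

    # Hypothetical illustration only: one lock per datastore, serialising the
    # SAN discovery step so that only one job rescans that datastore at a time.
    san_discover_mutex = threading.Lock()

    MUTEX_TIMEOUT_SEC = 600  # same 600-second limit shown in the error message

    def run_san_discovery(job_name, work_seconds):
        # Wait for the mutex; if the current holder keeps it longer than the
        # timeout, give up - this is what surfaces as "Failed to wait mutex".
        if not san_discover_mutex.acquire(timeout=MUTEX_TIMEOUT_SEC):
            raise TimeoutError(f"{job_name}: timeout {MUTEX_TIMEOUT_SEC} sec exceeded")
        try:
            print(f"{job_name}: discovering and mounting the storage snapshot...")
            time.sleep(work_seconds)  # stands in for a slow rescan on a stressed SAN
        finally:
            san_discover_mutex.release()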

 

Is the NetApp presenting many other shares, such as SMB/NFS, or anything else that could be putting the SAN under heavy stress?

 

My gut says this is one of the following:

  • SAN under stress
  • Software bug in NetApp or Veeam
  • Network connectivity issues between Veeam & the NetApp management interface (a quick reachability check is sketched below).
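
For the last point, a repeated TCP connect from the Veeam server to the NetApp management interface, run while a job is failing, can quickly rule the network in or out. A rough Python sketch (the hostname and port are placeholders; ONTAP management is usually HTTPS on 443, but check your environment):

    import socket, time

    NETAPP_MGMT = "netapp-cluster-mgmt.example.local"  # placeholder - your management LIF
    PORT = 443  # ONTAP management is typically HTTPS; adjust if yours differs

    def check_reachability(host, port, attempts=5, timeout=5):
        """Try a TCP connection several times and report latency or failures."""
        for i in range(attempts):
            start = time.monotonic()
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    print(f"attempt {i + 1}: connected in {time.monotonic() - start:.2f}s")
            except OSError as exc:
                print(f"attempt {i + 1}: FAILED after {time.monotonic() - start:.2f}s: {exc}")
            time.sleep(1)

    check_reachability(NETAPP_MGMT, PORT)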

Hope this helps.

Userlevel 6
Badge +1

Still trying to get to the bottom of the issue right now. It's quite difficult because the issue is intermittent. I’ll open a ticket with support and will share if we come up with the cause/solution. I would tend to agree that the SAN may be under temporary stress due to the size of the VMs.

I can see the same issues and am interested if there is a patch or fix for this. Thank you.

Userlevel 6
Badge +1

Didn’t get a chance to open a ticket, but we have recently been getting a lot of errors which I think are related to this one. I don’t see the MUTEX error anymore, but it got replaced with a lot of the following error (a semaphore wait that times out after 10,800,000 ms, i.e. 3 hours). I have opened a ticket with Veeam and will share the resolution.

 

12:24:45 AM :: Error: Failed to prepare VM for processing: [MachineSemaphore] Failed to wait for semaphore Global\PREPARING_SAN_VM_757ef7ad-998a-4b3a-9b21-243b68fa5ea0: timeout 10800000 ms exceeded
