vSphere 7U1 Standard vSwitch/Port Group Connectivity Issue



Hello Community!
This is a non-Veeam post today, but I thought it worthwhile to share a vSphere network issue I ran into recently, and the steps I took to resolve it, in case any of you hit something similar.

 

PROBLEM

I had one ESXi Host in a Cluster where VMs in a certain subnet had no network connectivity when connected to a segregated vSS and PG. No pings. No network connection of any kind. I use vSS and not vDS, so one might rightfully suspect a configuration issue. But I use PowerCLI to configure my Hosts, and all of them were configured with the same PCLI script, so the chances of that being the case were virtually zero. And on the other Hosts in this Cluster, VMs connected to the same subnet (vSS & PG) had connectivity just fine. The following are the troubleshooting steps I took to narrow down the cause:

  • First, I checked whether the vmnic was set to ‘active’ in the vSS Teaming Policy. Believe it or not, I have had a vmnic not be in the active state even after being configured with my script (which does set vmnics active when assigning them to vSS PGs). It was all good there.

     
  • I then SSH’d to the Host to check the link state, and that was all good; all showed ‘up’ and ‘up’.

     
  • Then things started looking a little “buggy”. Looking at the nic stats, data was being received but nothing was being transmitted.

     
  • The physical switch port is configured with a specific VLAN but is not trunked, which means no VLAN ID is required in the vSS Port Group. I went ahead and configured a VLAN ID anyway, but still no connectivity.
  • I disconnected the network cable from the Host, connected it to my laptop, configured an IP on my laptop in the same subnet, and was able to reach the subnet fine, which ruled out the cable and the physical switch port. The problem had to be either the nic port itself, which I thought very unlikely since (1) this Host is fairly new and (2) it was the 2nd port of a dual-port embedded nic card, or something going on in software (i.e. vSphere).
  • I looked at one last area to verify whether the vmnic in question was actually seen by vSphere as “used” in the vSS Teaming Policy. As noted in my first troubleshooting bullet above, the nic is configured for the active state, but when going to the vSS area and just viewing the vSS settings, the vmnic showed as ‘unused’. BINGO!
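
The checks in the list above boil down to three comparisons: is the link up, do the nic counters show traffic in both directions, and do the two places that display the teaming role agree. A minimal Python sketch of that logic follows. It is an illustration only, not something that was run during the troubleshooting; `UplinkView`, its fields, and `diagnose` are hypothetical names, and the values are assumed to be copied in by hand from the Host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UplinkView:
    """Hypothetical snapshot of one vmnic, filled in by hand from the Host."""
    name: str
    link_up: bool          # link state as reported on the Host
    rx_packets: int        # packets received, from the nic stats
    tx_packets: int        # packets transmitted, from the nic stats
    configured_role: str   # role set in the Teaming Policy editor
    displayed_role: str    # role shown when viewing the vSS settings


def diagnose(uplink: UplinkView) -> List[str]:
    """Return a list of findings; an empty list means nothing looked wrong."""
    findings = []
    if not uplink.link_up:
        findings.append("link down")
    # Receiving but never transmitting was the first "buggy" sign in the post.
    if uplink.rx_packets > 0 and uplink.tx_packets == 0:
        findings.append("receive-only traffic")
    # The root cause in the post: configured as active, displayed as unused.
    if uplink.configured_role != uplink.displayed_role:
        findings.append(
            f"teaming mismatch: configured {uplink.configured_role}, "
            f"displayed {uplink.displayed_role}"
        )
    return findings


if __name__ == "__main__":
    # Made-up numbers that mirror the symptoms described in the post.
    suspect = UplinkView("vmnic1", True, 1000, 0, "active", "unused")
    for finding in diagnose(suspect):
        print(f"{suspect.name}: {finding}")
```

With the made-up values above, the receive-only check and the teaming mismatch check both fire, which matches the two symptoms that pointed at software rather than cabling.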


RESOLUTION

OK, so there is a disconnect between what the software shows and what is actually configured. I tried removing the vmnic from the vSS and re-adding it as active, but that did no good; it still showed as ‘unused’. What I ended up having to do to fully resolve this was remove the vmnic, then blow away (delete) the PG and vSS. I then re-created the vSS & PG, added the vmnic back, and voilà...all good. I now have network connectivity for this subnet on this Host.
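
The post does not say which tool was used for the rebuild. As one possible way to do it from an SSH session, the sketch below prints the equivalent `esxcli network vswitch standard` commands in the same order as the steps above. The helper name `rebuild_commands` and the names `vSwitch1`, `Isolated-PG`, and `vmnic1` are placeholders, not values from the post, and the final command (explicitly setting the uplink active) is an added assumption based on the teaming symptom.

```python
import shlex
from typing import List


def rebuild_commands(vswitch: str, portgroup: str, vmnic: str) -> List[str]:
    """Build, in order, the esxcli commands to tear down and re-create a vSS.

    Nothing is executed here; the strings are meant to be reviewed and then
    pasted into an SSH session on the affected Host.
    """
    base = "esxcli network vswitch standard"
    # Quote the names so a Port Group name containing spaces stays one argument.
    v, p, n = (shlex.quote(x) for x in (vswitch, portgroup, vmnic))
    return [
        # 1. Remove the vmnic, then delete the PG and the vSS.
        f"{base} uplink remove --uplink-name={n} --vswitch-name={v}",
        f"{base} portgroup remove --portgroup-name={p} --vswitch-name={v}",
        f"{base} remove --vswitch-name={v}",
        # 2. Re-create the vSS and PG, then add the vmnic back.
        f"{base} add --vswitch-name={v}",
        f"{base} portgroup add --portgroup-name={p} --vswitch-name={v}",
        f"{base} uplink add --uplink-name={n} --vswitch-name={v}",
        # 3. Assumption: explicitly mark the uplink active in the teaming policy.
        f"{base} policy failover set --active-uplinks={n} --vswitch-name={v}",
    ]


if __name__ == "__main__":
    for cmd in rebuild_commands("vSwitch1", "Isolated-PG", "vmnic1"):
        print(cmd)
```

Printing the commands instead of running them keeps the sketch safe to try anywhere and leaves the destructive steps as a deliberate, manual action.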

Hopefully this helps anyone else who may encounter this issue.

Cheers!


12 comments

Thanks for sharing! Good to see the QA in vSphere is improving 😉 adding that to my troubleshooting checklist!

Oh gosh @MicoolPaul  ...don't even get me started about VMW QA over the last 5-7 years. Horrible! Just horrible! 😕

@coolsport00 Good insight!

Not Veeam related but definitely useful in case others encounter it. Nice share Shane.

Thanks all. Glad to provide. 

Thanks for sharing

Thanks for sharing!

A few months ago, Intel had some serious firmware bugs in their NICs. I saw some behaviors that I wouldn't have believed existed if I hadn't seen them with my own eyes.

The DELLs I have use Broadcom. And, not knowing what Intel’s issues were, if I didn’t upgrade both the firmware and the vSphere VIBs for BC, the nics wouldn’t be stable. Never experienced that before. I shouldn’t confess this, but when I’d get new servers for vSphere, except for maybe the BIOS f/w, I’d rarely update anything else. I just assumed the h/w vendor shoulda taken care of that 😂
Shh...that’s just between us! haha

Yeah HPE servers seem to be the same with updating NIC firmware and vibs as well.

Yes, you shouldn’t confess 😁 Most of the time, hosts are not delivered with the latest FW, just like storage arrays.

Again, thx for sharing @coolsport00 

Yes sir...glad to. 
