I'm Sure You've Heard, But If Not... Watch Your SD Cards with vSphere 7U2!
For those who don’t go to the VMware forums (another ‘communities’-based vendor website) often, there was finally a VMware employee who somewhat unofficially acknowledged the issue of running ESXi on SD cards, specifically with vSphere 7U2. I say U2 because v7 and v7U1c, both of which I have run on DELL IDSDMs, run flawlessly, and have for the past year.
To read a little more on the issue, you can review the VMware Communities post here. The issue, it seems, has to do with their newly formatted boot partitions. But honestly, I think the mishap (Hosts disconnecting/hanging) is mostly due to a new vmkusb driver. VMW is currently working on a fix, though they’re really recommending orgs start using ‘high-performance’ storage (i.e. disks) for the boot device moving forward. All that to say: if you haven’t upgraded to 7U2, I recommend not to. If you’re running your boot on disk, I think you’ll be fine. Otherwise, hold off.
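If you want a quick way to see which of your hosts are already on a 7.0 U2 build before deciding, something like this in PowerCLI works (a minimal sketch; the vCenter name below is just a placeholder):
# Connect to vCenter (placeholder name) and list each host's version and build number
Connect-VIServer -Server vcenter.example.local
Get-VMHost | Select-Object Name, Version, Build | Sort-Object Name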
One of the Veeam Vanguards, Luciano Patrao, did a nice little post on the issue & his experience. You can read his article here.
Cheers!
I have also read from different sources that the SD card and/or the controller is causing problems. Besides the current problems, SD cards have also had other problems in the past, and certain types were not very reliable.
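One workaround that has been passed around is rescanning the USB/SD storage adapter so the host picks its boot device back up (the adapter is often, but not always, vmhba32; check your host first):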
esxcfg-rescan -d vmhba32
If this works then it's a real gem... I'm sure everyone has had the situation where the host lost its SD card, and if you could solve that with this command... woah 🥳
I think I heard that too @regnor, but I’m not completely convinced. I think most of the issue is the driver, and maybe even the new boot partition format. That rescan cmd doesn’t really work. It’s only a band-aid… it keeps the Host going for anywhere from 2-7 days, I think.
Yes that's true, but at least it revives the host for some time. The final solution will be to replace the SD cards with regular disks as those are way more reliable.
Very interesting article for sure.
Thank you for pointing out this problem! It seems the higher the ESXi version, the harder it is for an SD card to survive. We happened to discuss this topic internally recently. From my perspective, new hosts should be equipped with boot HDDs/SSDs. That will avoid these problems completely!
DELL has a newer SD module which, I was told today, should apparently be fine. And the ESXi Requirements (pg. 16 of the PDF) still show SD as “supported”. I guess VMW needs to either remove that supportability statement or at least list supported modules. Replacing boot devices in-line is not an inexpensive process, in both hardware cost and FTE time. For smaller orgs who don’t use 1. Tanzu (i.e. k8s), 2. vSAN, or 3. NSX, a minimal install device like an SD card should be fine. SD manufacturers need to toughen them up, I guess.
HPE sells some kind of RAID-1 SD card module. The thing here is, there is no way - at least none that I know of - to monitor this black box. Most of the time it works fine.
Good to know, if I ever go the HPE route again. Thanks!
I was also thinking the same. You get double the lifetime, but some day it will still break your host 🧨
Thx @coolsport00 for sharing. Didn’t know this and it's interesting to know. Almost all of our customers are running ESXi on SAS or SSD disks in RAID-1, but there are exceptions running on SD cards...
Interesting. Most folks I know run ESXi on SD. Though, I think enterprise orgs probably run it on HDDs/SSDs.
Good Article @coolsport00
Thank you sir. Really appreciate it.
Interesting that this issue is still ongoing. I just wanted to share some resources from the community regarding this.
Firstly, apparently VMware are planning a patch this month. (@coolsport00 originally shared this article in his post, but I don’t recall if the announcement of a patch this month was included at the time; re-sharing for awareness.)
Secondly, Dell are pulling support for ESXi 7.x on SD/USB storage; they recommend their Boot Optimized Storage Solution (BOSS) moving forwards. Link here
Finally, vExpert PRO Andrew Hancock has been doing a lot of investigation on his Twitter. Andrew managed to destroy a high-endurance SD card within 30 minutes by simply installing ESXi, changing a few settings, and rebooting: no VMs on the storage, no vCenter, nothing complex, just plain vanilla ESXi.
The Dell compatibility change is what’s most shocking; this is going to increase the cost of ESXi solutions, as we have to look at not just the migration to SSD/NVMe, but also the potential requirement of a RAID controller if anyone wants RAID-1 on their boot device, and the higher minimum capacities these storage solutions are sold with.
The patch V7U2a has this problem, too.
It hit us in one environment and destroyed several SD cards; as a result, several ESXi servers in one cluster were no longer accessible. vSAN had some problems afterwards.
Thank god for backup
Please? 🥺
Yes, ESXi config backup would be great…
(Probably known already.) As a workaround, use PowerCLI to back up the ESXi configuration:
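A minimal sketch of that (the vCenter name and destination folder are placeholders):
# Connect to vCenter (placeholder name)
Connect-VIServer -Server vcenter.example.local
# Write each host's configuration bundle to a local folder
Get-VMHost | Get-VMHostFirmware -BackupConfiguration -DestinationPath "C:\ESXi-ConfigBackups"
Restoring is then done with Set-VMHostFirmware -Restore (pointing -SourcePath at the saved bundle) while the reinstalled host is in maintenance mode.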
I know this is not a very popular option, but you can safely run ESXi on a single SSD as well. No RAID-1 → no RAID controller. Pricing sits between an SD RAID module and the boot-optimized device you mentioned.
@vNote42 love both of the points you raised here. As an MSP these are my problems with them as a universal solution:
PowerCLI means manual scripting, which is great when you’ve got singular environments, but when we manage a huge number of them it gets complex: keeping the module up to date, ensuring the config backups succeed, etc. Not impossible, of course, but it increases the maintenance required versus it being integrated into a product. In other words, complicated at distributed scale.
A single SSD, on the other hand, works great at scale, but not so well for smaller environments; some customers simply don’t have (or won’t allow) budget for spare hosts, instead seeking as much consolidation and component redundancy as possible. For others with larger clusters, this definitely becomes a possibility. But at least the BOSS card isn’t as expensive as a new host! It does make it more likely you’ll need point one, though, and keep a PowerCLI backup of your config!
Agree with you! A single SSD for booting is good for a larger number of hosts; with that many hosts, you save some money. And when you operate such an environment, a PowerCLI backup script is no problem either. On the other hand are the smaller customers/environments. There you do not want to manage scripts and the like; hosts should just run. When a disk fails, you should be able to replace it without further tasks. These customers have to understand why they have to pay for this new boot device.
As an MSP, we stopped using SD cards for installing ESXi a long time ago. We mostly use 2x SSD in RAID-1 (previously 2x HDD) for the ESXi OS. Often we also use software-defined storage (like DataCore) with local disks; then we install the DataCore VM on the local RAID-1, so we need it anyway.
It looks like the problems with USB/SD could be solved by the recent patch: