Have you ever seen a snapshot/checkpoint fail to merge? Maybe VMware vSphere was telling you that a virtual machine required consolidation, or Microsoft Hyper-V reported that the checkpoint couldn’t be merged. Whatever the reason, it’s a painful situation.
Or should I say, it’s a painful situation, ONCE you notice! And IF you don’t, you have a disaster approaching.
Why Snapshots/Checkpoints Are Bad in the Long Term
If you need bringing up to speed on why we don’t like snapshots/checkpoints sticking around, here are the key points worth knowing (not an exhaustive list):
- Snapshots/Checkpoints make the original virtual disk a read-only file, and all changes from the point of snapshot/checkpoint creation onwards are written to a ‘delta’ virtual disk. This delta grows as data changes, consuming more storage space.
- Snapshots/Checkpoints add IO overhead, as reads may have to be resolved through the delta disk(s) as well as the original disk, reducing performance.
- When deleting snapshots/checkpoints, the ‘delta’ disk(s) must be merged back into the source disk(s). This increases IOPS demand and, because the disks are being modified, raises the risk of corruption and data loss, especially if a crash or power loss occurs during the merge.
All of the above is amplified if we have multiple snapshots/checkpoints, or a branch in the snapshot/checkpoint tree: this requires yet more processing and IO, and brings a higher probability of encountering a problem.
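To make these points a little more concrete, here is a deliberately simplified Python model of a snapshot/checkpoint chain. It assumes each disk layer is just a map of block numbers to data; real VMDK and AVHDX delta formats are far more involved, so treat this as an illustration of the behaviour rather than of either hypervisor’s implementation. It shows writes landing in the newest delta, reads falling back through older layers, and snapshot deletion requiring a merge.

```python
# Simplified model of a snapshot/checkpoint chain (illustration only).
# Assumption: each disk layer is a dict of block number -> data; real
# VMDK/AVHDX delta formats are far more complex than this.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiskLayer:
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


class SnapshotChain:
    def __init__(self, base_blocks: dict[int, bytes]) -> None:
        # Without snapshots, there is a single writable base disk.
        self.layers: list[DiskLayer] = [DiskLayer("base", dict(base_blocks))]

    def take_snapshot(self, name: str) -> None:
        # Freeze every existing layer; new writes go to a fresh delta.
        self.layers.append(DiskLayer(name))

    def write(self, block: int, data: bytes) -> None:
        # Writes always land in the newest (top) layer, so deltas keep growing.
        self.layers[-1].blocks[block] = data

    def read(self, block: int) -> tuple[bytes | None, int]:
        """Return (data, layers_checked), searching the newest layer first."""
        checked = 0
        for layer in reversed(self.layers):
            checked += 1
            if block in layer.blocks:
                return layer.blocks[block], checked
        return None, checked

    def consolidate(self) -> None:
        # Deleting snapshots means merging every delta into the base, oldest
        # first, so that the newest write for each block wins.
        base = self.layers[0]
        for delta in self.layers[1:]:
            base.blocks.update(delta.blocks)
        self.layers = [base]


if __name__ == "__main__":
    chain = SnapshotChain({0: b"os", 1: b"data"})
    for i in range(24):  # e.g. one snapshot every 3 months for 6 years
        chain.take_snapshot(f"patch-{i}")
        chain.write(0, f"os-patched-{i}".encode())
    print(chain.read(1))  # untouched block: walks all 25 layers to the base
    print(chain.read(0))  # rewritten block: found in the top delta
    chain.consolidate()
    print(chain.read(1))  # after merging, only one layer to check
```

With one snapshot every 3 months for 6 years, as in the story that follows, a block that was never rewritten has to be looked up through 25 layers before it is found in the base disk. In this toy model consolidation is just a dictionary merge; on real storage every merged block is physical IO against a disk being modified in place, which is where the corruption risk described above comes in.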
Now, on to how we can solve this! With a story!
The Inspiration for this Blog Post
I had a customer who was using snapshots as backups, until I highlighted that snapshots aren’t backups (and neither are checkpoints, by the way!). The customer had been creating a snapshot every time they patched their platform, roughly every 3-6 months, for about 6 years.
Every attempt to remove the snapshots failed, and the customer was now very panicked. It turned out they didn’t have any support agreement in place either, so they couldn’t turn to VMware support.
At this point I suggested deploying Veeam Agent to protect the VMs from within their operating systems, bypassing the snapshot creation process at the hypervisor level.
Once we had a ‘full backup’ of each server, I agreed a scheduled window to stop the services running on these servers, perform a final backup, and shut them down. Once each VM was shut down, I performed an Instant Recovery of the Agent backups into the VMware platform and migrated them to production! At this point we had no data loss and no pesky snapshots, so we could target the new VMs as usual with a Veeam Backup & Replication backup job.
I will, however, point out that as these are new VMs from a vSphere/Hyper-V perspective, you’ll need to reapply any VMware tags etc. to ensure they are still covered by anything tag-based, such as Veeam Backup & Replication protection schedules.
Bonus Tip #1: Capturing Stale Snapshots/Checkpoints
There are many great tools out there to help identify and remediate stale snapshots/checkpoints. My favourite, unsurprisingly, is Veeam ONE, thanks to the wealth of reports & built-in monitoring it provides for understanding the underlying hypervisor platform. It’s also hypervisor-agnostic and works great with both Microsoft Hyper-V & VMware vSphere-based solutions.
That said, VMware & Microsoft both have some ‘out of the box’ options to highlight stale snapshots/checkpoints.
In the VMware world, you could use vCenter to configure alarms based on snapshot size, or simply on a snapshot existing. More complex validation could be handled via PowerCLI scripts, or by using VMware’s monitoring platform, VMware Aria Operations (formerly VMware vRealize Operations).
Over in the Microsoft world, you could use System Center Operations Manager integrated with System Center Virtual Machine Manager, or PowerShell-based alerting & reporting.
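Whichever tool supplies the raw inventory, the policy check itself is simple. Below is a minimal Python sketch, assuming you have already exported a list of snapshots/checkpoints (VM name, snapshot name, creation time, size) from vCenter, PowerCLI, PowerShell or Veeam ONE. The `SnapshotInfo` record, the `find_stale` function and the thresholds (3 days, 50 GB, 1 snapshot per VM) are hypothetical placeholders, not defaults of any product.

```python
# Minimal stale snapshot/checkpoint check (illustrative sketch).
# Assumption: the inventory has already been exported from vCenter, PowerCLI,
# PowerShell or Veeam ONE; SnapshotInfo is a hypothetical record type.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SnapshotInfo:
    vm: str
    name: str
    created: datetime  # timezone-aware creation time
    size_gb: float


# Hypothetical policy thresholds, not defaults of any product.
MAX_AGE = timedelta(days=3)
MAX_SIZE_GB = 50.0
MAX_PER_VM = 1


def find_stale(snapshots: list[SnapshotInfo], now: datetime | None = None) -> list[str]:
    """Return a warning string for every threshold a snapshot or VM breaches."""
    if now is None:
        now = datetime.now(timezone.utc)
    warnings: list[str] = []
    per_vm: dict[str, int] = {}
    for snap in snapshots:
        per_vm[snap.vm] = per_vm.get(snap.vm, 0) + 1
        age = now - snap.created
        if age > MAX_AGE:
            warnings.append(f"{snap.vm}/{snap.name}: {age.days} days old")
        if snap.size_gb > MAX_SIZE_GB:
            warnings.append(f"{snap.vm}/{snap.name}: {snap.size_gb:.1f} GB")
    # Flag VMs holding more snapshots than policy allows (long chains/trees).
    for vm, count in sorted(per_vm.items()):
        if count > MAX_PER_VM:
            warnings.append(f"{vm}: {count} snapshots")
    return warnings


if __name__ == "__main__":
    utc = timezone.utc
    inventory = [
        SnapshotInfo("app01", "pre-patch", datetime(2024, 1, 1, tzinfo=utc), 12.5),
        SnapshotInfo("app01", "pre-patch-2", datetime(2024, 1, 9, tzinfo=utc), 80.0),
        SnapshotInfo("db01", "temp", datetime(2024, 1, 9, 12, tzinfo=utc), 2.0),
    ]
    for line in find_stale(inventory, now=datetime(2024, 1, 10, tzinfo=utc)):
        print(line)
```

Run on a schedule and wired into email or your monitoring platform, an age/size/count policy like this catches a snapshot long before it becomes a years-old chain like the one in the story above.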
Bonus Tip #2: Alternative Conversion Tools
I suggested Veeam Backup & Replication with Veeam Agent functionality to break the stale-mate in my scenario, but it’s not the only tool available. If you aren’t licensed for Veeam Backup & Replication, or maybe you are still on socket-based licensing without sufficient Agent instances to carry this out, there are other tools available.
VMware vCenter Converter has long been a much-loved tool for many sysadmins, so VMware shocked the community when it announced the Converter’s removal. The announcement made clear that VMware was continuing to work on the tool, but that it needed a rework to adapt to modern requirements. Earlier this month, VMware announced the availability of the reworked tool, though it is still in beta at the time of writing, so I’d advise against using it for production workloads. Hopefully we’ll soon see a fully supported, generally available release of this tool once again.
Microsoft also provides tools for converting disks, though generally these don’t get much fanfare and feel less integrated into the overall experience. Microsoft offers ‘Disk2VHD’ as part of the Sysinternals tooling; it converts a disk to a VHD, but you still have to create the rest of the VM configuration yourself and then attach the VHD. Microsoft states that the tool can be used on Windows XP and higher, but the documentation doesn’t reference anything newer than Windows 7, which highlights the age of this tool. Unfortunately, Microsoft’s better-integrated solution, ‘Microsoft Virtual Machine Converter’, didn’t see support continue past Server 2012 R2 and has since been completely retired. At this point it’s hard to recommend any native Microsoft tooling over the simple yet powerful process Veeam offers within Backup & Replication.
Hey Michael, great article!
I’d like to share a humble solution that has saved my day a few times, when the snapshot mess was so huge that it was impossible to merge.
In this scenario there was a maintenance window, so I could perform this offline. I just created extra disks of the same size for the VM and then used Clonezilla to clone the old disks to the fresh disks. All the attention in the world is necessary to be sure you’re cloning in the right direction (or say bye-bye to the data).
After cloning is done, I remove all the old disks and attach the new ones in the same positions the original disks occupied. The VM itself remains the same, keeping its tags, UUID, etc.
I have used it both on vSphere and Hyper-V environments.
Thank you Michael for showing other ways to achieve the goal.
Thank you 😊
Taken from https://www.vmware.com/pdf/convsa_43_guide.pdf
I've had the same just recently and also first went with the Veeam Agent to create a backup. Snapshots should by default have a TTL after which they automatically get deleted, or at least cause a warning. Even vCenter only has an optional check for snapshot sizes.
If snapshot consolidation doesn't work, cloning could be the way to go, as long as the snapshots/disks aren't already causing problems on the hypervisor. You also need to have enough free disk space on your storage.
I’ve been following what @dloseke suggests: cloning is the (sole?) way to go...
@dloseke for the alternative suggestion there too via the VM cloning 😁 Provided vSphere doesn’t freak out, it’s handy, as the one issue I have with using Agent is that the VM must be powered on, so you’ll effectively have some form of RPO between final shutdown and the restored VM.
Unfortunately yes, there are security vulnerabilities in the vCenter Converter tool. It’s not the only tool with this problem either: the Disk2VHD tool just does a dumb search for named DLLs and lets Windows return them, so you could store an identically named, malicious DLL elsewhere on the system and get Disk2VHD to load it. As Disk2VHD requires running with high permissions, this can get very destructive, very quickly!
One way I’ve worked around VMs with too many/too large snapshots is to clone the VM. Offline is preferred, obviously. Once cloned, bring up the new clone and then delete the original. Of course, this creates a new VM and a new backup chain once you add the VM back into the backup server, so it does have its own caveats, but in really bad situations, it may have to be done.
I do still have the old vCenter Standalone Converter (it can be found online from non-VMware sources), and it’s very robust, but as noted, it was discontinued and a replacement is in beta. I wasn’t aware of it, but apparently there were some serious security vulnerabilities in the converter, so it should be avoided. Note, however, that the new converter only goes as far back as vCenter 6.5U3, so if you want to convert older machines like 6.0 and 5.x, you’ll need to find other means, such as the older converter or Veeam/Veeam Agent.
Really great article, Michael, as snapshot consolidation always seems to be an issue with backups. Nice to see ways to address this. 😎