Have you ever seen a snapshot/checkpoint fail to merge? Maybe VMware vSphere was telling you that a virtual machine required consolidation, or Microsoft Hyper-V reported that the checkpoint couldn’t be merged. Whatever the reason, it’s a painful situation.
Or should I say, it’s a painful situation, ONCE you notice! And IF you don’t, you have a disaster approaching.
Why Snapshots/Checkpoints are bad in the long term
If you need bringing up to speed on why we don’t like snapshots/checkpoints sticking around, here are the non-exhaustive, good-to-know key points:
- Snapshots/Checkpoints make the original virtual disk a read-only file, and all changes from the snapshot/checkpoint creation point forwards are saved to a ‘delta’ virtual disk. The delta disk grows as data changes, so it consumes additional storage space the longer it exists.
- Snapshots/Checkpoints cause increased IO penalties, as reads and writes have to be resolved across the delta disk(s) as well as the original disk, reducing performance.
- When deleting snapshots/checkpoints, we need to merge the ‘delta’ disk(s) into the source disk(s). This creates increased IOPS demands, and because the disks are being modified, it increases the risk of corruption and data loss, especially if a crash or power loss event were to occur.
All of the above becomes more chaotic and amplified if we have multiple snapshots/checkpoints, or a split in the snapshot/checkpoint tree. This requires yet more processing and IO, and carries a higher probability of encountering a problem.
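To make the points above a little more concrete, here is a minimal Python sketch of the idea. It is a toy model only, not how vSphere or Hyper-V actually implement their disk formats: the `SnapshotDisk` class and its methods are names I've made up for illustration. It shows the base disk being left untouched once a snapshot exists, reads having to walk the delta chain, and consolidation rewriting the base disk.

```python
class SnapshotDisk:
    """Toy model of a base virtual disk plus a chain of delta disks.

    Hypothetical illustration only; real hypervisor disk formats are
    far more involved than a dictionary of blocks.
    """

    def __init__(self, blocks):
        self.base = dict(blocks)  # treated as read-only once a snapshot exists
        self.deltas = []          # delta disks, oldest first

    def take_snapshot(self):
        # Every snapshot adds a new, initially empty delta disk to the chain.
        self.deltas.append({})

    def write(self, block, data):
        # Writes land in the newest delta; the base is only written to
        # when no snapshot exists.
        target = self.deltas[-1] if self.deltas else self.base
        target[block] = data

    def read(self, block):
        # Newest delta wins. Return the data and how many disks were checked.
        lookups = 0
        for delta in reversed(self.deltas):
            lookups += 1
            if block in delta:
                return delta[block], lookups
        return self.base.get(block), lookups + 1

    def consolidate(self):
        # Merge every delta into the base, oldest first, then drop the chain.
        # Returns the number of block writes the merge had to perform.
        merged = 0
        for delta in self.deltas:
            self.base.update(delta)
            merged += len(delta)
        self.deltas.clear()
        return merged


disk = SnapshotDisk({0: "a", 1: "b"})
disk.take_snapshot()
disk.write(0, "a2")
disk.take_snapshot()
disk.write(1, "b2")
print(disk.read(0))        # data plus the number of disks checked
print(disk.consolidate())  # block writes needed to merge the chain
```

The longer the chain and the more data that has changed, the more lookups each read needs and the more writes consolidation has to perform against the base disk, which is where the extra IO and the extra risk come from.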
Now, onto how we can solve this! With a story!
The Inspiration for this Blog Post
I had a customer that was creating snapshots for backups, until I highlighted that snapshots weren’t backups (and neither are checkpoints by the way!). The customer was creating a snapshot every time they patched their platform, which was every 3-6 months. This had been taking place for about 6 years.
The customer attempted to remove the snapshots, and every attempt failed. The customer was now very panicked. It turned out they didn’t have a support agreement in place either, so they couldn’t turn to VMware support.
The Solution
At this point I suggested deploying Veeam Agent to protect the VMs from within their Operating Systems, thus circumventing the snapshot creation process at the hypervisor level.
Once we had a ‘full backup’ of each server, I agreed a scheduled window to stop the services running on these servers, perform a final backup, and shut down. Once each VM was shut down, I performed an Instant Recovery of the Agent backups into the VMware platform, and migrated them to production! At this point we had no data loss, and no pesky snapshots, so we could now target the new VMs as usual for a Veeam Backup & Replication backup job.
I will, however, point out that as these are new VMs from the hypervisor’s perspective, you’ll need to reapply any VMware tags etc. to ensure they continue to abide by any tag-based elements such as Veeam Backup & Replication protection schedules.
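One way to keep track of this is to record the tags before the migration and compare them afterwards. Here's a rough Python sketch of that comparison. It assumes you've already exported the tag assignments to simple name-to-tags mappings by whatever means you prefer, and that the recovered VMs keep their original names; the function name, VM names and tags are all made up for illustration.

```python
def missing_tags(tags_before, tags_after):
    """Return the tags each VM had before migration but lacks afterwards.

    Both arguments map a VM name to an iterable of tag names. This is a
    hypothetical helper that assumes VM names are unchanged by the recovery.
    """
    plan = {}
    for vm, wanted in tags_before.items():
        missing = set(wanted) - set(tags_after.get(vm, ()))
        if missing:
            plan[vm] = sorted(missing)
    return plan


# Example data only: the tag names here are invented.
before = {"app01": ["Backup-Daily", "Tier-1"], "db01": ["Backup-Hourly"]}
after = {"app01": ["Tier-1"], "db01": []}

for vm, tags in missing_tags(before, after).items():
    print(f"{vm}: reapply {', '.join(tags)}")
```

Anything the comparison reports is a tag to reapply before you rely on tag-based backup job selection again.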
Bonus Tip #1: Capturing Stale Snapshots/Checkpoints
There are many great tools out there to assist in the identification and remediation of stale snapshots/checkpoints. My favourite, unsurprisingly, is Veeam ONE, due to the wealth of reports & built-in monitoring it provides to better understand the underlying hypervisor platform. It’s also hypervisor agnostic and works great with both Microsoft Hyper-V & VMware vSphere-based solutions.
However, VMware & Microsoft do have some ‘out of the box’ options available to highlight this.
Within the VMware world, you could leverage vCenter and configure alarms based on snapshot size, or on a snapshot simply existing. Otherwise, more complex validation could be handled via PowerCLI scripts, or by utilising VMware’s monitoring platform, VMware Aria Operations (formerly VMware vRealize Operations).
Over in the Microsoft world, you could choose to leverage System Center Operations Manager integrations with System Center Virtual Machine Manager, or leverage PowerShell-based alerting & reporting.
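Whichever tool gathers the data, the underlying check is the same: flag anything too old or too large. Below is a minimal Python sketch of that logic, assuming you've already exported the snapshot/checkpoint details from your platform. The `SnapshotRecord` fields, the `find_stale` function and the thresholds (3 days, 50 GB) are my own illustrative assumptions, not values recommended by VMware, Microsoft or Veeam.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotRecord:
    # Hypothetical record shape; map your own export onto these fields.
    vm_name: str
    snapshot_name: str
    created: datetime
    size_gb: float


def find_stale(records, now, max_age=timedelta(days=3), max_size_gb=50.0):
    """Return records older than max_age or larger than max_size_gb, oldest first.

    The default thresholds are illustrative assumptions; tune them to
    your own environment and policies.
    """
    stale = [
        r for r in records
        if now - r.created > max_age or r.size_gb > max_size_gb
    ]
    return sorted(stale, key=lambda r: r.created)


# Example data only.
now = datetime(2024, 1, 10)
records = [
    SnapshotRecord("app01", "pre-patch", datetime(2024, 1, 9), 1.0),
    SnapshotRecord("db01", "old-patch", datetime(2023, 1, 1), 5.0),
    SnapshotRecord("web01", "big", datetime(2024, 1, 9), 80.0),
]

for r in find_stale(records, now):
    age_days = (now - r.created).days
    print(f"{r.vm_name}: '{r.snapshot_name}' is {age_days} days old, {r.size_gb} GB")
```

Run on a schedule and wired up to email or your ticketing system, even something this simple would have caught my customer's six years of snapshots long before removal became a problem.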
Bonus Tip #2: Alternative Conversion Tools
I suggested Veeam Backup & Replication with Veeam Agent functionality to break the stalemate in my scenario, but it’s not the only tool available. If you aren’t licensed for Veeam Backup & Replication, or maybe you are still on socket-based licensing without sufficient Agent instances to carry this out, there are other tools available.
VMware vCenter Converter has long been a well-loved tool for many sysadmins, and VMware shocked the community when it announced the tool’s removal. In the announcement, VMware made clear that work on the tool would continue, and that it needed a rework to meet modern requirements. Earlier this month, VMware announced the availability of the reworked tool, though it is still in beta at the time of writing, so I’d advise against using it for production workloads. Hopefully we’ll soon see a fully supported, generally available release of this tool once again.
Microsoft also provide tools for converting disks, though generally speaking these don’t appear to get much fanfare, and feel less integrated into the overall experience. Microsoft offer ‘Disk2VHD’ as part of the Sysinternals tooling. This will convert a disk to a VHD, though you are still required to create the rest of the VM characteristics yourself and then attach the VHD. Microsoft claims that the tool can be used on Windows XP and higher versions, but the documentation makes no reference to anything newer than Windows 7, highlighting the age of this tool. Unfortunately, Microsoft’s better-integrated solution, ‘Microsoft Virtual Machine Converter’, didn’t see support continue past Server 2012 R2 and has been completely retired. It’s hard to recommend any native tooling by Microsoft at this point over the powerful and simple steps offered by Veeam within their Backup & Replication solutions.