Cue Superman standing on a hill with hands on his hips, wind blowing through his hair, and cape flapping in the wind 🦸🏻 Ok, not really...haha
The truth is, “sysadmin superhero-like” tactics would not have been needed if I had been a bit more meticulous in my storage configuration tasks.
Veeam Flexibility
One of the many benefits of using Veeam Backup & Replication is its flexibility in how you deploy it. Personally, I use the “Advanced” deployment method, creating separate servers for Veeam and my ancillary components, Proxy and Repository. But, to save on hardware and software (OS) costs, Veeam allows you to combine the Proxy and Repository roles. I use physical hardware for these components, and I wanted to push my physical compute (CPU/Memory) and network (10Gb NICs) resources to the limit, so I combined them on 1 server. At the time, I also chose to use the fastest backup method → DirectSAN.
One drawback of using the DirectSAN transport mode with Windows (my OS of choice at the time) is that Windows can end up with enough rights to wipe your production/source storage, since you need to present that storage to the Proxy for DirectSAN to work properly. As long as you keep your Proxy and Repository roles separate, the probability of wiping your storage is minimal. When you combine the roles, the probability increases significantly. Configuring your Windows “SAN Policy” as Offline Shared can also help mitigate this, which I neglected to do (and wasn’t aware of at the time, as it wasn’t documented anywhere by Veeam).
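For anyone who hasn’t set the SAN Policy before, here is a minimal sketch of how it can be checked and changed from within diskpart. This isn’t taken from Veeam’s documentation; it’s just the standard diskpart san command, so test it on a non-production box first.

```text
rem Show the SAN policy currently in effect
san
rem Keep newly discovered disks on a shared bus (FC, iSCSI, SAS) offline by default
san policy=OfflineShared
```

As far as I know, the policy only applies to disks Windows discovers after the change, so anything already online stays online and should be checked separately. It also means a new Repository volume shows up offline, and you have to bring it online on purpose, which is kind of the point.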
Catastrophe!
On one occasion I needed to add a Repository to my VBR environment for a new job I wanted to create. I had done this task and similar ones (e.g. increasing Repository storage) many times. I created a Volume on my SAN and presented it to my Proxy/Repo combo box. All good...so far.
When I partition/format my presented storage devices in Windows, I tend to use diskpart. I just like it better. Before doing so, I normally look in the Windows Disk Management tool first because sometimes the disk numbers Windows assigns are not in order. As it so happens: 1. for whatever reason, I didn’t verify my disk number in Disk Mgmt, and 2. as you probably guessed, my new disk was not assigned the next disk number in sequence. So when I went through my diskpart commands to select the device I wanted to configure (select disk, followed by its number) and then create my partition (create partition primary align=64), oopsies! Yep, the disk I used was a very critical source production VM Volume I use for a vSphere datastore. All my critical VMs on this Volume (datastore) were wiped by one swift diskpart and format command! Ouch!
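The verification step I skipped can also be done without leaving diskpart. A minimal sketch, assuming the new disk shows up as disk 5 (a made-up number for illustration, not the one from my environment):

```text
rem Disk 5 is a hypothetical number; use whatever list disk shows for your new Volume
list disk
select disk 5
rem Check the size, disk ID, and any existing volumes before touching anything
detail disk
list partition
```

If detail disk or list partition shows a size you don’t expect, or partitions on what should be a brand new empty Volume, stop there. Only once everything matches the Volume you just created on the SAN should you go on to create partition primary align=64.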
Veeam to the Rescue!
Thankfully, I had many ways within Veeam to recover this VM data. Not only do I take regular backups, but I integrate my storage arrays in Veeam and perform regular storage snapshots (i.e. Veeam Storage Snapshot Orchestration). To recover, I could’ve chosen to restore entire VMs one by one; perform instant recoveries for each; or...what I ended up doing...recover the data of my whole Volume (all VMs at once) back to vCenter from a storage snapshot Veeam helped me to create.
One thing about this recovery method: it can take quite a while, not because of the recovery process itself, but because of the process required to get vSphere ready to present your snapshot back as a datastore. To do so, you need to remove the original Volume datastore from each Host it is presented to. Unmounting/removing it can take quite a while (upwards of 30-60mins per Host?). Not entirely sure why, but it does. Just keep that in mind. Instant Recovery (IR) may have been a quicker alternative, even though I would’ve needed to do it for each VM individually.
Once I got my datastore Volume from my snapshot presented back to vCenter, I re-registered all my VMs and was up and going. Thankfully, I also backed up the data for these critical VMs often, and the time between the last restore point and when I wiped the data was only around 10mins. So I only had roughly 10mins of data loss. Whew!
Takeaways
What are some things I learned from this experience?…
- First thing... for me, I moved away from Windows and implemented Linux components. Though there’s still the chance to wipe your Volumes in Linux, it’s not as “seamless” to do.
- If you’re one who just likes using Windows, I still think it’s a fine choice, but I recommend separating out your Veeam roles to significantly reduce the probability of destroying source production VM data.
- If you do use Windows (or even Linux for that matter), and you just want to use Backup from Storage Snapshots (BfSS) instead of DirectSAN, you don’t actually have to fully present your source production VM storage to your Proxies. Many don’t know this. All you need to do is assign your production Volumes “Snapshot only” SAN access on your storage array.
- Again, if you do still choose Windows, don’t forget to configure your Windows “SAN Policy” appropriately.
- Always keep the 3-2-1-1-0 rule in mind. Specifically, having data “everywhere” you can. Don’t just have many copies on different media types, but also have the ability to recover data in different ways. Some recovery options are better than others for a given situation (bulk vs single); and, some recovery options are quicker than others (IR vs Entire VM).
- Always be mindful of the tasks you’re working on. Don’t skip steps, regardless of how many times you’ve done them in the past.
- Lastly, though I would rather have used a less disruptive tool (SureBackup, maybe?), I was at least able to see how successful my recovery plan was.
There you go. Hopefully my experience can help others not go through what I did.
Happy World Backup Day!