A Sysadmin Story: The cost of not updating your systems when advised, and when lives are at stake!

July 21, 2023

dloseke
Veeam Vanguard

I’ve posted this before in a Friday Fun comment a few months back, but I think it bears repeating, especially because Veeam really did save the day, and maybe a few lives as well! Details may be fuzzy, but it’s all true. It’s a long story, so maybe set aside some time for some light reading, or maybe use it as a 15 minute distraction from working so that you can reset your brain. And I hope you learn something too!

I’ve been having issues getting this to post, so as much as I hate to do it, I’m going to send you over to my blog to view the entire post. Seems like I may have gotten a bit wordy, but I’ll give you a preview below.

Setting the Stage

Long, long ago (okay, it was 2020) in a galaxy far, far away (otherwise known as a small rural community in centralish Nebraska), there resides a community hospital that shall remain nameless. We, an eastern Nebraska-based MSP, sourced this new client and it’s looking like the beginning of a fabulous relationship.

I arrive onsite on a hot June day as part of the initial sales call and to review what their infrastructure looks like. COVID is more or less just beginning to hit the area. We sit down in an overwhelmingly hot conference room to talk about the services we provide; the building AC has failed, and a large temporary, portable AC system outside is forcing lukewarm air into the building, but not into the part that we’re in. After we’ve talked about our services and how we can serve them, we take a look at their hardware and discover that they don’t know what everything is running on, what everything does, or if they even own the gear in their server rack. Perhaps a not-so-beautiful partnership, but too early to tell.

Read the rest of the story here.

But also, below are the lessons learned from this adventure.

Lessons Learned (and reinforced)

Anybody who listened to Aesop’s fables knows there’s a moral to the story. And here are a few from this one.

Listen to your MSP, Partners, and Service Providers. They’re probably not trying to extract as much money as possible from you. Any good provider is going to have your best interests at heart. Recommendations come for a reason, and if that flag is being waved frantically, take a moment to figure out why and address that issue.
Hardware has a lifespan. It’s probably longer than the vendor will support it, but why take that risk? Dell will warrant and support hardware for up to seven years. In my experience, you can probably get 8-10 years in the best case. There’s third-party support, but if you’re needing to buy support elsewhere, again, take a moment to find out why you’re doing that. It’s probably for the wrong reasons.
Make sure you’re monitoring your hardware and can receive alerts. I guarantee you that both controllers on the failed SAN didn’t fail at the same time. One was probably boot-looping for weeks or even months, but we had no idea until they had both failed.
Make sure you have a plan to recover from a disaster. Flying by the seat of your pants might work, but do you really want to? Figure out the best way to work yourself out of a bad scenario and be prepared for it.
Test your backups. It’s been said that your backups are only as good as the data you can restore from them. We were lucky that all VMs except one had been backed up the night before. The reason the single domain controller backup had failed? Turns out the repository was running out of space so Veeam refused to continue backing it up.
Monitor your backups. Make sure that if you have failures, you know about it and you address it.
Keep an offsite copy of your data. And make sure it’s encrypted. Seriously, an offsite copy, while not strictly necessary, is super helpful. It took a lot less time to begin restoring data because I had a copy of the configuration database, passwords for everything, etc. I didn’t have to completely reinvent the wheel. Just fix the hub/spokes and reattach the existing tire. For every one of my clients, their configuration database is stored somewhere offsite. It’s either at one of their secondary sites, it’s in a VCC repository that they have somewhere, or it’s in my VCC repo that I host with my service provider console if they have no other option.
Documentation. Keeping that configuration database backup and copies of backups offsite is only useful if you can access that data. If you lose your passwords and configuration, it’s going to be hard to access that data. Make sure your documentation is updated. It needs to be revised because the only thing that might be worse than no documentation is outdated, inaccurate documentation.
If you can’t use Veeam, at least use something. Generally speaking, a decent backup that isn’t great but has what you need is still a good backup. As previously mentioned, at least you have a copy of your data somewhere, right? Also make sure to remember that if you don’t need much, Veeam does have a Community Edition!
Use Veeam. Seriously! Sure, there are other products out there. But why would you use anything other than the best-in-class systems to protect your critical (and non-critical) business data? Plus, honestly, the peace of mind it brings helps you sleep better at night, and that tends to be worth it at any cost, but Veeam really isn’t that expensive for the return that you get on keeping your business running. What’s the cost of your business being down? What’s the cost of losing your business? Veeam has some free (Community Edition) options for backing up your own systems, but even paying for the product tends to be well worth the investment.
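Since the one failed backup in this story came down to a repository running out of space, here’s a rough sketch of the kind of check I mean when I say "monitor." To be clear, this is not anything Veeam ships and it doesn’t touch any Veeam API; it’s just a small Python script built on the standard library’s `shutil.disk_usage`. The function name, the `SpaceReport` class, and the 15% threshold are all made up for illustration, so adjust to taste.

```python
import shutil
import sys
from dataclasses import dataclass


@dataclass
class SpaceReport:
    """Result of a free-space check on one path (hypothetical helper type)."""
    path: str
    total_bytes: int
    free_bytes: int
    free_fraction: float
    ok: bool


def check_repository_space(path: str, min_free_fraction: float = 0.15) -> SpaceReport:
    """Report whether the volume holding `path` has at least
    `min_free_fraction` of its capacity free.

    The 15% default is an arbitrary illustration, not a vendor recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return SpaceReport(
        path=path,
        total_bytes=usage.total,
        free_bytes=usage.free,
        free_fraction=free_fraction,
        ok=free_fraction >= min_free_fraction,
    )


if __name__ == "__main__":
    # Path to the backup repository volume; defaults to the current directory.
    repo_path = sys.argv[1] if len(sys.argv) > 1 else "."
    report = check_repository_space(repo_path)
    status = "OK" if report.ok else "ALERT: low free space"
    print(f"{status}: {report.path} is {report.free_fraction:.1%} free")
    # A non-zero exit code lets a scheduler or monitoring agent flag the failure.
    sys.exit(0 if report.ok else 1)
```

Run something like this on a schedule and have whatever monitoring you already use alert on the non-zero exit code. The point isn’t the script itself; it’s that somebody gets told before the backup job starts failing.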

My apologies for the length of this story. If you made it this far (or to the blog and back), I appreciate you. Hopefully you found it entertaining, educational, and worth the time that I took to write (and rewrite) it and that you took to read it.