A SysAdmin (happy ending) story



Hi all,

For the upcoming "System Administrator Appreciation Day", I wanted to tell you about an amazing misadventure that happened to me a few years ago.

But first, a brief recap of my work life: I've been working in the IT world since 2012, starting in the Helpdesk of a local IT services company, and then moving to the Datacenter Team a few years later.

Over the years I've encountered a variety of problems, from the simplest and least blocking to the most complex and critical: servers that no longer boot, applications that don't work, stuck snapshots, deleted LUNs... but the one I'm about to tell you about is the most unusual problem I've ever faced (and unfortunately not the most critical).

Our company was a Service Provider, focused mainly on IaaS and other related services.

We managed a total of four datacenters: one at HQ, where there were a few racks with essential internal services, and three others located at an Italian supplier, where we delivered the bulk of the services for customers... about 20 racks in total.

A few years ago the unthinkable happened: a big storm, complete with a whirlwind, hit one of these datacenters, causing a real disaster for our services and several sleepless nights for our team.

These days we joke about it when we tell the story among ourselves, but we really did go through it: rain inside the datacenter! 😅

The building was obviously built to high standards, but that was not enough to prevent the worst... I still remember the morning I went in and saw the hallways still wet. What a shock!

When I entered the room where our cabinets were, I saw many sysadmins from other companies who were already working... in "panic mode"! 😵

A very, very senior colleague of mine and I began to take stock of the damage: a few servers that wouldn't start, burned-out power supplies, and one latest-generation storage system that was practically a write-off. The problem had not so much been the rain, which fortunately had not come directly into our server room, but the water that caused an electrical short in the power systems, which somehow triggered the sprinkler system as well and released a dust that spread over most of the hardware.

Only a couple of servers, a much older but resilient storage system and... the NAS with the backups had survived!

As soon as we recovered some computational resources and got the firewall and switches up and running again, I put on my green Veeam T-shirt (not figuratively speaking) and started restoring everything I could! 💚

Fortunately, that datacenter mainly provided DR services (as fate would have it) for us and our customers, but also a few other small production services.

The recovery work was long and exhausting, and I'll spare you the details... but what matters is that in the end we were up and running again within three or four days!

To be a SysAdmin you need a spirit of sacrifice, perseverance, a lot of patience, and most of all teamwork. Find colleagues you trust and a team you are comfortable with and work well with; it will be essential for the hard times that will come... and they will certainly come!

Long live the SysAdmins! 🌈

Marco


9 comments

Userlevel 7
Badge +20

Wow, now that is quite the story and recovery. 👍

Userlevel 7
Badge +17

Well done on the recovery efforts Marco!

Userlevel 7
Badge +7

Thank you guys! 🙏🏻

Userlevel 7
Badge +4

Great story, Marco!
I can only imagine the amount of time you guys spent restoring all the backups...

Userlevel 7
Badge +8

In my previous job I had a customer where the sprinklers went off, and let's just say they didn't have a "Datacenter" in mind when the building was built. Every PC in the building had a few inches of water in it, and many were sitting on the floor. We were able to get most of the data recovered, but water and electronics are a bad combo. Corrosion and rust on contacts are inevitable, so it's usually a large insurance claim.

Happy to hear you made it out the hero in this story!

Userlevel 2

Nice story, man. Glad you got out of it!

Userlevel 7
Badge +14

Rain inside the datacenter reminds me of a situation where the AC was blowing ice/snow flakes on the servers. Nothing happened, but it was still a very odd situation 😅

Glad that the story ended well for you!

Userlevel 7
Badge +17

A literal ‘natural disaster’! I think sometimes we ‘gloss’ over those words when talking about data recovery, but they are indeed real, as you shared here. Great job on the recovery!

Userlevel 7
Badge +6

Once water has entered the system, even if it's working, I'd highly recommend replacing it because corrosion will continue. It's like buying a car that has been flooded: you may not see all the damage, but there's going to be damage that will bite you as time goes on.

Reminds me of an issue one of our divisions had a few years ago. We have a low-voltage cabling division in my company that runs network cabling and the like, and they were near completion on rehabbing a building that had thousands of feet of CAT6A cabling. At some point someone triggered something, causing the fire sprinkler system to go off. While those cables were probably fine, Panduit (I believe) wouldn't warrant that cabling because water could have gotten into the lines. So out had to come all of the cabling that had been completed, and it all had to be recabled. To add to it, the project was only a few weeks from completion. We ended up pulling techs from other jobs and from other regions, and they were able to get the cables replaced in time. I shudder at the thought of how much cabling was never used and had to be sent off to the recyclers.
