Horror Story!


Userlevel 7
Badge +7

Hi There!

I've just shared this story in my VUG Group, Spain,

But I wanted to translate it to English and share it with all of you,
Its a Personal experience,

 

Hi,

I want to share with you my latest experience as the person responsible for IT Infrastructure, just a week before moving to a new company.

On August 10th, 2022, while enjoying my holidays and celebrating my birthday with my family, my mobile dinged and I saw the message:
“We’ve got a virus in our systems.”

I left everything I was doing, quickly went to the office to get a better understanding of what was going on, and executed the necessary actions to mitigate and solve the issue.

In my head I had the crazy idea:

“It’s just a joke, my colleagues are doing this to scare me and it’s a surprise”… I was wrong.

We confirmed that the virus was a ransomware called <DonkeyF*cker>; it hit our servers and spread like wildfire through them.

After a huge effort, we finally found the origin of the infection and encryption. We took it offline, formatted it with no regret, and then it was time to recover… “Surprise!!” Our backups had been hit and the repository was not accessible!

 

Calm down: we could recover all our servers from the attack, because in our design we were replicating all virtual machines between hosts, as a “last resort” plan in case we lost production backups and replicas.

We restored everything, thanks to our vision, our DR plan execution, and also our Veeam configuration and knowledge.

I’m sharing this as a horror story with a positive ending, but when you are in the middle of the “situation”, it’s a nightmare.


Please share your experiences and horror stories, so all of us can learn a bit from each other.

Take care.

Luis.


8 comments

Userlevel 7
Badge +20

Wow, it is amazing how you react when these things happen, but as you had planned this well with Veeam and a DR strategy, it was great to see you recover.

Userlevel 7
Badge +13

It’s interesting to know where the backups were and how they were hit.
Storage with an open vuln? Misconfiguration?
And one more: did that group exfiltrate data while encrypting?

But after all the questions, great work @HunterLAFR in recovering everything 🙂👏🏻

Userlevel 7
Badge +7

Like @marcofabbri, I would be interested to know how your backups were affected, and whether you have remediated the vuln.

 

Fortunately you were available during your vacation.

Userlevel 7
Badge +7

Thanks for sharing! I am glad the outcome was good!

 

I had a scary situation once over a Christmas holiday when trying to add a new IO board to a storage array. The procedure I used went bad, and the entire production array started shutting down and required a reboot! Nothing like hearing hundreds of disk drives spin down and turn off! It really makes you feel like you are going to faint! 😉
 

Fortunately, I had weeks’ worth of good Veeam backups to restore from and was able to get the array back in working order in a matter of hours!

Userlevel 7
Badge +7

It’s interesting to know where the backups were and how they were hit.
Storage with an open vuln? Misconfiguration?
And one more: did that group exfiltrate data while encrypting?

But after all the questions, great work @HunterLAFR in recovering everything 🙂👏🏻

Thanks for your words.

The backups and replicas were stored on a Synology DS1821+.

The OS partition went down at the same time as the infection… Very weird, right?

It took us about five retries to get it back online, with its configuration wiped clean.

 

I do believe in coincidences, but in IT, when something like this happens, it’s not a coincidence at all.

(in my personal opinion).

 Cheers.

Userlevel 7
Badge +7

Like @marcofabbri, I would be interested to know how your backups were affected, and whether you have remediated the vuln.

 

Fortunately you were available during your vacation.

The NAS was inaccessible, and after restoring the configuration everything looked good, but some files had been modified before it went down, so we treated it as compromised storage for security reasons.

Also, bringing it back online took us a few hours, and we spent that time first mitigating the ransomware and getting production back online.

Now, thinking back after the incident, it was cool to be able to recover from it, but I still feel the “fear” of losing everything!

Take care.

Userlevel 7
Badge +20

Thanks for sharing. I wish more people would be open to talking about their bad days, their lessons learned, etc. It helps us all process what happened, gain insights into alternative approaches and, ultimately, become better at our jobs.

 

My story isn’t a backup story, but it has an important lesson.

 

What you KNEW isn’t what you KNOW.

 

I was the head of an IT department in a previous job, and I designed & implemented the solutions the company required. We opened a new office and had site-to-site connectivity via VPLS, which enabled our IP phones to communicate with the PBX at the other location.

 

All was great for a month or so post-site opening. Then, one morning, I got the dreaded call that the phones weren’t working. I entered the office and confirmed that the phones had power but no IP addresses, and therefore couldn’t communicate with the PBX.

 

I checked the VLANs and noticed they weren’t right; the switch stack looked like it had lost its configuration. So I set the VLANs back in place, committed the config, then rebooted the switches to force the PoE to stop & start, resetting the phones and granting them their IP addresses.

 

This worked and the phones were responsive again. I went to carry on with my day, and then, after 5 minutes, the phones died again. As the phones still had IP addresses but had lost communication, I started looking at site-to-site connectivity, confirming all was functional there, and performing all sorts of diagnostics.

 

The business started to get irritated at the outage happening during a peak time. That was incredibly stressful: all eyes on you and YOUR design, questioning why it’s not working when everything is correct.

 

After 40 minutes, I stopped, reset myself, and made a list to go back to the beginning. This is when I found the root cause of the original outage… The switch had soft-rebooted into its backup config, dropping the VLANs.

 

I wasted 40 minutes believing I knew what it COULDN’T be.

 

So, remember: tech isn’t static, and assumptions can be costly. At this point in my career, I started becoming more granular about when things had last been checked, relative to the outage/impact.

 

The switches also got new firmware afterwards, which fixed the fault 🙂

Userlevel 7
Badge +7

@HunterLAFR Thanks for sharing. Fortunately, you chose Veeam for Backup & Disaster Recovery!
