When Veeam Saved the Day for Me



Rewind about 2 years. We were short-staffed (still are), behind on updates and upgrades, and trying our best to get rid of legacy and insecure infrastructure. After months of work, things were getting close to the finish line. Systems were running great and we were feeling quite safe and secure.

 

It all came to a sudden halt when someone said the service desk phone was ringing off the hook and no one could log in. I tried myself, and not only could I not log in, I couldn’t ping or look up most of the VMs on the network. I rebooted one of my workstations and received a 169.254.x.x (APIPA) address. I knew right away I was in trouble.

 

I could tell DHCP was having issues on my desktop, but on my second PC I could still ping most of the servers with static IPs. Since those servers were reachable by address while name lookups kept failing, this told me DNS was also down. Great, I thought, at 4 PM on a Friday.

 

We continued troubleshooting, and it turned out to be much worse. Someone had put a VERY large file in the SYSVOL folder to replicate it, without confirming there was enough free space. All 4 Domain Controllers went offline and, worse, ended up in a VERY bad state. After some more troubleshooting, it was time to do my first DC restore.

 

Veeam not being domain-joined was a lifesaver. I had no issues logging in. This led me to my next issue: the restores were failing each time. I think the stress of 6 people hovering over me and a few thousand not able to work had me rushing, which ended up wasting about 10 more minutes until I realized everything in Veeam was using hostnames and DNS.

 

Lucky for me, I always fear the worst and had an offline copy of the IP addresses for the physical servers, file servers, DCs, and other important infrastructure on a physical printout. After modifying the hosts file, I was able to restore the DCs. Veeam support was excellent, helping me get the first DC up and running and restoring the ability for users to start working again. After doing a bit more research, I found that a security guy had added a huge password file to SYSVOL for a password policy on the DCs. I mean a HUGE file. He learnt a valuable lesson that day.

 

I learnt something that day as well. Even when everything is done using best practices, things can always surprise you and be improved. The ultimate best practice is learning from something and preparing for the next time.

I now have a few things added to my DR/Backup planning that I’d recommend as part of a best practice solution to everyone. 

 

-Have a list of physical and other critical infrastructure IP addresses on a physical piece of paper.

That IPAM application doesn’t do well when you can’t log into the PC or you forget its IP address.

 

-Create hosts files listing all critical infrastructure. Save multiple copies on your Veeam servers to save significant time in an outage. I store a copy on the proxies, repos, and main Veeam server. You don’t have to use it all the time, but just having it is important. Make sure to include at least the following.

-ESXi hosts

-vCenter servers

-Veeam proxies

-Veeam server

-Veeam repos

-SQL servers used by Veeam

(Include both FQDNs and short hostnames if you are extra paranoid.)
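For anyone who wants to generate that file rather than type it by hand, here is a minimal sketch, not a tested production tool. Every name and address in `INVENTORY` is a placeholder (documentation-range IPs and a made-up `example.internal` domain), and `render_hosts` is just an illustrative helper name. It writes each system with both its FQDN and short hostname, grouped under a comment per role.

```python
from ipaddress import ip_address

# Hypothetical inventory: every name and address below is a placeholder.
INVENTORY = {
    "ESXi hosts": [("192.0.2.11", "esxi01.example.internal")],
    "vCenter servers": [("192.0.2.20", "vcenter01.example.internal")],
    "Veeam server": [("192.0.2.30", "veeam01.example.internal")],
    "Veeam proxies": [("192.0.2.31", "proxy01.example.internal")],
    "Veeam repos": [("192.0.2.40", "repo01.example.internal")],
    "SQL servers used by Veeam": [("192.0.2.50", "sql01.example.internal")],
}


def render_hosts(inventory):
    """Return hosts-file text with one 'IP FQDN shortname' line per system."""
    lines = []
    for role, systems in inventory.items():
        lines.append(f"# {role}")
        for ip, fqdn in systems:
            ip_address(ip)  # raises ValueError on a mistyped address
            short_name = fqdn.split(".", 1)[0]
            lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_hosts(INVENTORY), end="")
```

Print the result as well as saving it, so the paper copy and the hosts file come from the same list.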

 

Those last 2 steps will shave off 2-3 hours of wasted time if I need to do this again. Even if I have to put a static IP on my workstation to access Veeam, now I can. When you don’t know the IP of your IP management database, or anything else, things will not end well.

 

Thanks again to Veeam, and Veeam support. I came out of this one a hero.


9 comments


Nice to see Veeam saving the day. 👍🏼


Wow... Nice save! 👍🏼


I am glad you got the systems up and running again with Veeam support. 

> After doing a bit more research I found a security guy added a huge password file to sysvol for a password policy on the DC’s. I mean a HUGE file. He learnt a valuable lesson that day.

This seems to me like an unfinished story. By the way, it also seems there isn’t a great alignment between the Security Team and the Infrastructure (backup) Team 


Hosts file... great “best practice”!


> I am glad you got the systems up and running again with Veeam support.
>
> > After doing a bit more research I found a security guy added a huge password file to sysvol for a password policy on the DC’s. I mean a HUGE file. He learnt a valuable lesson that day.
>
> This seems to me like an unfinished story. By the way, it also seems there isn’t a great alignment between the Security Team and the Infrastructure (backup) Team

 

lol. I didn’t go into detail as that didn’t involve Veeam, or Veeam saving the day, which was the point of this story. Things happen. Our Security Team and Infra actually have an excellent relationship where I work. Never fill your SYSVOL folder and you won’t have to worry about this 😆. If you do plan on doing something like that, make sure to expand the drive first.

Everything worked out fine, password policy implemented, DC’s running great. 


> Host file..great “best practice”!

I actually just checked and it’s in there. Not sure if it was when I originally set up Veeam.  The trick is to have it on EVERY server, and find a way to keep it updated.

 

Once you have many proxies, if you are a company that does VMware host replacements often, finding a way to automate this or partially automate it is huge.

 

(For example: generate a list of IPs every so often and put it on the Veeam server, then run a PowerShell script from the Veeam server to push it out to proxies, repos, etc.)

 

In an emergency even spending some time to manually edit the hosts files beats creating them from scratch. 
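The push-out step could look roughly like the sketch below. It is shown in Python rather than PowerShell purely for illustration, and the marker strings, function names, and target paths are all assumptions, not anything Veeam provides. The idea is to own only a marked block inside each hosts file, so re-running the push replaces that block and leaves every other line alone.

```python
from pathlib import Path

# Hypothetical marker lines that fence off the block this script owns.
BEGIN = "# BEGIN veeam-dr-hosts"
END = "# END veeam-dr-hosts"


def merge_managed_block(existing, entries):
    """Replace (or append) the marked block in hosts-file text, keeping other lines."""
    kept, inside = [], False
    for line in existing.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    block = [BEGIN, *entries, END]
    return "\n".join(kept + block) + "\n"


def push(entries, targets):
    """Rewrite the managed block in every hosts file path listed in targets."""
    for target in targets:
        path = Path(target)
        current = path.read_text() if path.exists() else ""
        path.write_text(merge_managed_block(current, entries))
```

Here `targets` would be whatever paths reach each server's hosts file from the Veeam server (on Windows the file lives at `C:\Windows\System32\drivers\etc\hosts`). Reaching it over an admin share is an assumption about your environment and needs the matching rights. Running the push twice with the same entries leaves the files unchanged.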


> > Host file..great “best practice”!
>
> I actually just checked and it’s in there. Not sure if it was when I originally set up Veeam. The trick is to have it on EVERY server, and find a way to keep it updated.
>
> Once you have many proxies, if you are a company that does VMware host replacements often, finding a way to automate this or partially automate it is huge.
>
> (generate a list of IP’s every so often and put it on the Veeam server, run a PowerShell script from the Veeam server to push it out to proxies/repos/etc. )
>
> In an emergency even spending some time to manually edit the hosts files beats creating them from scratch.

An easier way is to use DNS, even in a “Workgroup” scenario. We have some domain-joined VBR servers (yes, I know, tsk tsk) and VCC is not, but we have separate domains in DNS for these, so they always resolve. Much easier than coordinating hosts files across many servers.


Yes, even in our datacenters we have recently moved all the Veeam infrastructure to a Workgroup and set up dedicated DNS on minimal Linux servers. Of course, planning for a DNS crash by also keeping a hosts file is good practice!


True. I guess when your environment is large a dedicated DNS server will work on the isolated network. 

 

I don’t even want it to touch my VMware hosts as I am planning for a total outage.
