Sysadmin story: Test your backups!



My sysadmin story is rather old; it dates back to my own beginnings in IT. It's not directly Veeam-related, but the message is valuable for every backup administrator.

The outage

There was a certain Linux VM running business-critical software for that company. Software updates were handled by the vendor itself, so there wasn't much for us to take care of, apart from including the VM in the backup process, of course.

At some point, an outage in the datacenter shut down all virtual machines. After the power had been restored, all virtual machines were started again without any issues; well, almost without any issues.

The Linux VM didn't come up and hung at the boot screen without any progress. After some debugging without finding a solution, we decided to restore the VM from the nightly backup, as it was still morning and no changes had occurred since then.

Unfortunately, the restored VM had exactly the same problem and refused to boot. The two-day-old restore point behaved the same way.

At that point we had to face the truth: the system had had issues long before the outage and would probably have died on the next reboot anyway. So we had to go the traditional way: we combined our Linux skills, maybe got some help from the vendor, and somehow the VM was repaired and the services were running again. Don't ask me what caused the issue or how it was resolved, as I really can't remember 😅

The message behind the story

Now comes the important part, the message behind this story: „Test your backups!“

Successful backups are great, but they are only half of the work. You also need to make sure that those backups are recoverable.

Just brainstorming a bit, here are some potential issues that could prevent a successful restore:

  • the OS or application has issues (like in my example)
  • OS updates crash the VM
    • I've seen this with Windows Update: all other restore points were in the same state and started installing the updates on the first boot
  • wrong backup selection (missing VM, disk, files, …)
  • software issues (remember VMware CBT bugs?)
  • hardware issues (broken backup files that other restore points in the chain depend on)

And these are just some examples.

You want reliable backups that can be restored in critical situations. This is also what the '0' in the 3-2-1-1-0 rule stands for: zero errors after verifying that your backups can actually be recovered. If you don't know this rule, take a look at this post from @Nico Losschaert

I'm pretty sure that if we had tested the backups of that VM, we would have noticed the OS issue before the outage and could have started the analysis at a quieter time.

So please don’t forget to regularly do backup tests.

As a Veeam customer, take a look at Veeam SureBackup. With SureBackup you can automatically test your backups in an isolated environment and let Veeam run custom checks. This way you can be sure not only that a VM boots with a working network connection but also that your software is working as expected.
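To give an idea of what such a custom check could look like, here is a minimal sketch of a script that tests whether an application port on the restored VM accepts TCP connections. It assumes that SureBackup treats a custom script's exit code 0 as success and anything else as failure (please double-check this against the Veeam documentation for your version). The script name, host and port are hypothetical placeholders, and a real check would ideally test the application itself, not just an open port.

```python
import socket
import sys


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (incl. timeouts) if the port is unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(argv: list) -> int:
    # Hypothetical usage: check_app_port.py <host> <port>
    if len(argv) != 3:
        print("usage: check_app_port.py <host> <port>", file=sys.stderr)
        return 2
    host, port = argv[1], int(argv[2])
    if check_tcp_port(host, port):
        print(f"OK: {host}:{port} is accepting connections")
        return 0  # assumed to be interpreted as "test passed"
    print(f"FAIL: {host}:{port} is not reachable", file=sys.stderr)
    return 1  # assumed to be interpreted as "test failed"


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A port check only proves that something is listening; for the VM in my story, a check that logs in or queries the application would have been far more meaningful.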

Check out this VUG UK session from @MicoolPaul if you want to see SureBackup in action:




Great story. Always good to test backups for sure and Veeam makes it so easy. 👍


Great storyline! The takeaway: „Test your backups!“ This will ensure that your backups are recoverable ✅


Good reminder Max. Sadly, I admit I don't test nearly as much as I should.



But at least you do test your backups 😉


Nice story @regnor and thx for mentioning my post 😍
