Happy Friday everyone!
Sorry it’s been a bit quiet from me on this front recently; let’s jump back into things with a new Fun Friday topic: Backup Designs That Sounded Better in Theory?
Firstly, what do I mean by this? I mean those designs where everything looks great on paper, or where a particular use case or limitation was believed to have been overcome, only to fall over for one reason or another.
My story is of an interaction I had with a DBA at a large company, who was about to learn that not all backups are created equal.
I was reviewing a company’s disaster recovery strategy from an application-level perspective, meaning I was ensuring the DR strategy could return the application and all of its dependencies to a production state.
One of the dependencies was a database, and I noticed that no backups of it were being generated by Veeam.
The DBA informed me that they backed it up using SQL Server maintenance plan-based backup jobs. I asked why they were choosing this over Veeam, as Veeam would still provide all of the integrations they needed. The DBA’s response was that they were always messing about with their backups and preferred to manage them themselves, as they couldn’t guarantee how swiftly a backup engineer would be able to assist them.
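For anyone unfamiliar, a maintenance plan backup ultimately just runs a native BACKUP DATABASE command on a schedule. A minimal sketch of the kind of job this DBA was running (the database name and share path here are made-up placeholders, not the real environment):

```sql
-- Roughly what a SQL Server maintenance plan backup boils down to.
-- [AppDB] and the UNC path are hypothetical, for illustration only.
BACKUP DATABASE [AppDB]
TO DISK = N'\\backupshare\sql\AppDB_Full.bak'
WITH COMPRESSION,   -- compress the backup on disk
     CHECKSUM,      -- validate page checksums as the backup is written
     INIT,          -- overwrite the previous backup set in this file
     NAME = N'AppDB nightly full backup';
```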
I asked where these backups were going and was informed they were being saved to a file share (thankfully not on the same server!). I probed further and asked if they knew whether, and when, the file share was being backed up. Some slight panic started to set in, as the DBA didn’t know, but I assured them it was okay: the file share IS being backed up. I then asked when they were backing up the required databases to the share, and was told midnight.
I told them we had a problem: the file share is backed up at 10pm. The DBA couldn’t see why that was a problem, so I explained: it’s midday right now, and if this server’s database were corrupted, you’d want to go back to a backup, correct?
The DBA nodded.
Me: So you’d want to go back to last night’s backup to minimize RPO, correct?
The DBA agreed.
Me: Well, what if the file share was corrupted too, such as if the whole company was hit by a ransomware attack?
DBA: Well, we’d just get the backup from the file share once that’s restored.
Me: But the file share is backed up two hours before you create your new backup, so I’d be handing you a backup from the day before. You’ve gone from losing 12 hours of data to 36.
DBA: *moment of realisation* 🤯
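To spell out the arithmetic (the day names and the midday incident are illustrative; the backup times are the schedules from the story):

- Mon 00:00: database backup A lands on the file share.
- Mon 22:00: the file share is backed up, capturing backup A.
- Tue 00:00: database backup B lands on the file share, two hours after the share’s own backup ran.
- Tue 12:00: ransomware hits both the server and the share; backup B is lost along with everything else.
- Restoring the file share gets you backup A from Mon 00:00, a recovery point roughly 36 hours old instead of the 12 hours the DBA expected.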
The immediate fix was to reschedule the database backups so they complete before the file share is backed up, and work then began on a strategy to leverage native backups instead.
So, that’s my story of the day. What’s yours?