Database consistency checks?

  • 6 July 2023
  • 1 comment
  • 118 views

Userlevel 7
Badge +14

Inspired by the post by @MicoolPaul:

I wanted to ask the community, how many of you are running MS SQL corruption checks using DBCC? How often do you do it? Can you share some examples of issues that was discovered using DBCC?


1 comment

Userlevel 7
Badge +20

Hi!

When I was in the MSP game I would have these scheduled for a minimum of a weekly occurrence on large databases. Daily if possible.

The justification was to minimize data loss in the event of corruption, and the only way to find out would be by checking!

For weekly backups the process was that if an error occurred, you could run DBCC against each instant recovery isolated copies of each day’s backup until you find the restore point with corrupted data.

 

As for issues with a database, this story is the one that made me impose this aggressively against my customers:

 

A soon to be customer had a ransomware attack that I ended up as first response for. It was simultaneous infection of all Exchange and web-facing IIS servers as customer hadn’t patched anything in 6-9 months, OS or application, no DMZ either, so everything got encrypted quickly.

 

I leveraged the Veeam backups (which thankfully weren’t encrypted due to workgroup isolation), to perform the recovery of the environment. Soon afterwards, we patched everything prior to putting back into production and creating a remedial plan to address the security issues, and some interim measures such as restricted public ingress on the firewall rules to prevent being compromised immediately again!

 

Then an interesting error was raised by the customer that a process wasn’t working intermittently, the CRM platform generating emails.

This created many queries because so much had been patched to bring the customer in line with latest OS and application versions. Eventually it was identified that there was an email plugin within the SQL database that would generate the emails and submit to Exchange, which was where the error occurred, so next we had to look into the data being fed into the plugin, as the developers were adamant the plugin was perfect.

Sure enough, I start looking into the errors and I find that SQL Server is unable to read some data due to corruption.

I went back to the oldest available backup, and DBCC still found corruption. At this point all I could suggest was that a call could be raised with Microsoft Support to investigate and provide a best path forwards, and that call ended with Microsoft stating without a good backup, the customer would have to accept the data loss option to resolve the corruption.

 

Then it got weirder!

 

All copies of the server, despite multiple offline attempts at DBCC repairing with data loss acceptable, would always still fail their checks with data loss detected.

The eventual root fix was to export the data out from the SQL Server that could export, and import into a new SQL database file on a new SQL Server instance (we opted for a new instance to enable us to deploy a new operating system and upgrade to a newer SQL Server edition at the same time, as the SQL Server version was in extended support)

 

The customer could never quantify how much data they lost by failing to validate their backups, and neither could we...

Comment