
Tales from the Architect's Crypt, Sassy Snapshots


Geoff Burke

Hey folks,

 

This week's architect horror story is about a company whose people were true believers in the now-debunked theory that snapshots are backups.

Snappy Sam was the system administrator at Silly Snippets, a photo-enhancing company. If you wanted to touch up any of the billions of useless photos that you take with your phone, you could send Silly Snippets the photo and they would professionally enhance it.
They had to store a large amount of data and, importantly, they had a very high data change rate due to constant writes and deletions.

 

Silly Snippets was a serious organization with managers who took themselves very seriously as well. They loved meetings where they could flex their corporate intellectual muscles, and everyone's favorite was the CCP (Change Control Panel). This is where the non-nerds of management could wreak their revenge on the techies, and they thoroughly enjoyed torturing the system administrators with long, drawn-out questions like "What are the potential effects of that move?" This was often taken to the extreme, as the chairperson of the panel, Slow Sting, would apply this mantra to every step of an IT procedure. For example, when a server had to be decommissioned, Slow Sting, with a cunning "I caught you" look, interrupted Snappy Sam to ask about the potential implications of removing the server from the rack, i.e. what if he dropped a screw? Snappy Sam was snappy not only because he loved snapshots but also because he would snap back during meetings. This time, however, he did not have time to answer, as his assistant Crazy Canuck Calvin asked Slow Sting what the implications would be of a lightning bolt striking him during the CCP meeting. Answer: the meeting would end happily.

 

Despite the bickering, work would eventually get done.

 

Once a month they would perform Windows updates, and the protocol was to manually take a snapshot of every single VM and then, once the updates were done, delete the snapshots. This was a long and dull procedure, and Crazy Canuck Calvin disliked it enormously. He had tried to push some sort of automation for the updates through the CCP meetings and had even proposed leveraging Veeam VBR backups instead. Snappy Sam, however, lived by snapshots and loved them dearly. All attempts at change were thwarted when Snappy Sam would look worriedly at the executives on the CCP and say, “This will take us into uncharted territory.” For a person of the Upper Management ilk, the word “uncharted” produced enormous anxiety. To them, the world runs on endless meetings and well-dressed presenters in suits armed with many diagrams and charts! In short, most changes put before the Change Control Panel were declined. As CEO Big Bad Billy would often say, “No change just for change's sake,” especially when discussing salary and benefit increases.


One fine day, the overworked and underpaid duo of Snappy Sam and Crazy Canuck Calvin made a mistake. During one of the update runs, a SQL server misbehaved and they wasted a few hours troubleshooting and then fixing the issue. As a result, Snappy Sam forgot to delete one of the snapshots. The server ran on the snapshot for a number of weeks until its performance started to suffer. What's worse, this server was vital to the company's operations and could not be taken offline without long notice, and any attempt to delete the snapshot might freeze the VM for a long time, since weeks of changes from a high-churn server would have to be merged back into the base disk.

What do you think they did? And what would you have done: first, to remove the snapshot with minimal downtime, and second, what changes would you propose so that this situation never occurs again?
