Cloud Failures, Clouds on Fire, Clouds… Underwater? A Reminder on the Importance of Data Protection in the Cloud


Userlevel 7
Badge +20

Well, this wasn’t my intended first content post for Cloud City, but it seems too much of a ‘live’ event to pass up.

 

Now, as you’re on this website you already know about the importance of data protection. You know that clouds can catch fire. You also know that data can be accidentally deleted just as easily as maliciously. But even you might be surprised that a Cloud has ended up with a water intrusion event. Even clouds have their rainy days, if you’ll pardon the pun.

That’s exactly what happened at GCP late yesterday/early today in their Paris region, though: a water intrusion event has caused multi-cluster shutdowns and rendered the region currently unavailable.

 

But let’s get back to talking about you. Now thankfully, your organisation has you to remind them that planning for a disaster after the disaster is no good. We need to expect the unexpected, and be confident we can continue to provide services to the business.

 

So, take this opportunity to remind them of violations of the 3-2-1-1-0 rule, remind them of applications and services that don’t have regional redundancy, and remind them that, thanks to Veeam’s focus on data freedom, even if your cloud goes down you can just restore to your own infrastructure or to another cloud. Your data, your choice.
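
If you want to turn that reminder into something checkable, here is a minimal Python sketch of a 3-2-1-1-0 audit: three copies of the data, on two different media, with one copy offsite, one copy offline or immutable, and zero errors from recovery verification. This is not a Veeam API. The `DataCopy` record, the `check_3_2_1_1_0` function and the location names are all hypothetical, and the sketch assumes the production data counts as one of the three copies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    """Hypothetical inventory record for one copy of the data."""
    name: str
    media: str                          # e.g. "disk", "object-storage", "tape"
    location: str                       # site or cloud region holding the copy
    offline_or_immutable: bool = False  # air-gapped, offline or immutable copy
    verify_errors: int = 0              # errors from the last recovery test


def check_3_2_1_1_0(production: DataCopy, backups: List[DataCopy]) -> List[str]:
    """Return a list of rule violations; an empty list means no violations."""
    all_copies = [production, *backups]
    violations = []

    # 3: at least three copies (assumption: production is one of them).
    if len(all_copies) < 3:
        violations.append("3: fewer than three copies of the data")

    # 2: at least two different media types across all copies.
    if len({c.media for c in all_copies}) < 2:
        violations.append("2: all copies are on the same media type")

    # 1: at least one backup stored away from the production location.
    if not any(b.location != production.location for b in backups):
        violations.append("1: no backup is stored offsite")

    # 1: at least one backup that is offline, air-gapped or immutable.
    if not any(b.offline_or_immutable for b in backups):
        violations.append("1: no backup is offline or immutable")

    # 0: every backup passed its last recovery verification without errors.
    if any(b.verify_errors > 0 for b in backups):
        violations.append("0: at least one backup failed recovery verification")

    return violations


# Example inventory with made-up names: everything lives in one location.
production = DataCopy("prod-vm", media="disk", location="region-a")
risky_backups = [
    DataCopy("backup-1", media="disk", location="region-a"),
    DataCopy("backup-2", media="disk", location="region-a"),
]
for problem in check_3_2_1_1_0(production, risky_backups):
    print(problem)
```

With everything on disk in the same location, the example reports the missing second media type, the missing offsite copy and the missing offline or immutable copy, which is exactly the situation a regional outage punishes.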

 

Oh, and remind them how clever you were to recommend Veeam to them in the first place! 👏👏👏


6 comments

Userlevel 7
Badge +20

Wow! Thanks for sharing this, very interesting.

Userlevel 7
Badge +10

So... the interesting thing for me is that a GCP “region” is a single point of failure. I know that the AWS definition of a “region” includes at least three availability zones.

And I will echo and paraphrase @MicoolPaul in saying “Nobody expects the Spanish Inquisition!”

Userlevel 7
Badge +22

But wait, AWS, Microsoft and Google, they back it up, right? :) 🤣😅🤑 Is it not the shared irresponsibility model?

It is funny, but after all of this time, and all of the info out there about O365 not being backed up, I actually heard someone say this the other day.

Userlevel 7
Badge +14

The GCP example shows that you shouldn't store your backups in the same data center or region. In the worst case you could lose both your production and your backup data.

Userlevel 7
Badge +10

The GCP example shows that you shouldn't store your backups in the same data center or region. In the worst case you could lose both your production and your backup data.

The GCP example also shows that it’s important to understand what each provider means by each term. I had no idea that for GCP, a “region” can mean all your data in that “region” is stored in a single building.

Userlevel 7
Badge +20

@HangTen416 makes some good points here.

 

Firstly, the architectural differences between AWS/Azure/GCP were described to me in the following sentence:

Amazon runs on AWS, Microsoft runs on Azure, GCP runs on Google.

And I don’t say that as an insult to GCP, but I think it provides valuable insight into the way design decisions will be made with their products in the future.

 

Furthermore, and this sounds ridiculous to say, you have to trust, or truly understand, your hosting/cloud provider’s architecture to get the best out of it.

Looking at OVH’s 2021 fire in Strasbourg, there were two technical design decisions that I feel are worth shining a light on:

  • Backups were stored within the same datacentre buildings, despite a contractual insinuation that they would be stored as multiple separate copies. I won’t provide too much detail here as this has already been taken to court, and a ruling was made against OVH based on the mismatch between the backup service they delivered, which kept backups locally, and the terms of service contracted with the customers. Link for details: https://www.datacenterdynamics.com/en/news/ovhcloud-ordered-to-pay-250k-to-two-customers-who-lost-data-in-strasbourg-data-center-fire/
  • The information on this is quite murky, but from my understanding OVH had four physical buildings in Strasbourg, SBG-1 through to SBG-4. OVH also offered an SBG-5, but this was a public cloud offering that was only a virtual datacentre, running across multiple racks within SBG-1 to SBG-4. This meant that people who took reassurance from their data not being in the impacted SBG-1 or SBG-2 regions then unexpectedly lost data they thought was unaffected. This is based on information I found on Twitter at the time of the event, but it’s harder to find details now that OVH have launched a physical SBG-5 to replace the destroyed SBG-2. The reason I mention this, though, is that if you expected to have multiple local copies within a region, such as in SBG-2 and SBG-5, you’d be frustrated to find out they were actually just rack neighbours and both copies were gone. Extra bad luck points if your third copy was in an impacted SBG-1 rack!
