World Backup Day. My story: Chaos on the Network!


Organizations usually remember to back up their data and applications, which is always great!

However, they often forget to protect the configurations of the components in their IT environments. This matters a great deal, because most corporations still have to deal with technology silos made up of widely varied solutions, standards, and manufacturers.

The ability to quickly restore these environments after any kind of failure is critical to ensuring business continuity!

Historically, one of the biggest challenges has been keeping network equipment backups up to date. After all, companies can have hundreds or even thousands of these devices in their IT environments.

Some time ago, before the concept of Software-Defined Networking existed, this was a manual, error-prone, and problematic task.

As a testimony, I once lived through a very critical situation caused by the lack of an up-to-date configuration backup in a large, mission-critical environment!

During a large customer's network migration to more modern technology, an edge switch needed to be replaced. Without realizing it, the technician responsible for the operation inserted the new switch with a bridge priority that made it the PVST (Per-VLAN Spanning Tree) root bridge for the entire network, overriding the spanning tree role of the core switches.

Of course, this edge switch did not have the Layer 2 connectivity needed to act as the root of the network, so the entire network stopped working, paralyzing this large company's operations.
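For readers less familiar with spanning tree: in each VLAN, the switches elect as root the bridge with the numerically lowest bridge ID, which is the priority followed by the MAC address as a tie-breaker. The sketch below uses hypothetical switch names, priorities, and MAC addresses (not the customer's actual values) to show how a new switch configured with a lower priority than the cores wins the election in every VLAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # configured bridge priority (0-61440, in steps of 4096)
    mac: str       # base MAC address used in the bridge ID

    def bridge_id(self, vlan: int) -> tuple[int, int]:
        # With the extended system ID (used by PVST+), the advertised priority
        # is the configured priority plus the VLAN ID; the MAC breaks ties.
        return (self.priority + vlan, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge], vlan: int) -> Bridge:
    # The bridge with the numerically lowest bridge ID becomes the root.
    return min(bridges, key=lambda b: b.bridge_id(vlan))


# Hypothetical network: two core switches and the replacement edge switch.
bridges = [
    Bridge("core-1", 24576, "00:1a:2b:00:00:01"),
    Bridge("core-2", 28672, "00:1a:2b:00:00:02"),
    Bridge("edge-new", 4096, "00:1a:2b:00:00:99"),
]

for vlan in (10, 20):
    print(vlan, elect_root(bridges, vlan).name)
```

Because PVST runs one election per VLAN, a single low-priority switch takes over as root in every VLAN at once, which is why the impact was network-wide.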

By coincidence, I was at the client's site that day for a meeting as a pre-sales engineer. Some time after the incident, since the problem had still not been resolved, one of the managers asked me to help figure out what was happening.

Unfortunately, the technicians could not identify the problem right away. After all, everything on the core switches was correct except the spanning tree configuration. Under pressure and in desperation, they removed the controllers and other interface modules from both core switches in an attempt to solve the problem.

This only made things worse: one of the controllers and some of the other network interface modules were damaged, so spare parts now had to be found for them.

Worse still, the configuration backup of the core switches was out of date. The only “good” copy was the one in the memory of the remaining working controller. If that controller failed, the chaos would be even greater.

In short, the customer's environment was down for approximately 6 hours. All migration steps were rolled back, and the cause was only found after many hours of tension.

Today, with the advent of SDNs (Software-Defined Networks), backing up and restoring network configurations has become more straightforward and more automated.

However, there is always room for improvement!

I have attached a suggested architecture for backing up and restoring a Cisco DNA Center environment. DNA Center supports scheduled, automatic backups of all its configurations to an external backup server running Ubuntu or CentOS with NFS enabled.

Reference information:

Cisco DNA Center Administrator Guide, Release 2.3.5 - Backup and Restore [Cisco Catalyst Center] - Cisco

Why not improve this architecture with the Veeam Data Platform? After all, the DNA Center's backup server is still a single point of failure.

We can install a Linux backup agent on this server or, if it is virtualized, simply protect it as a VM with a backup job in Veeam Backup & Replication.

With VBR's capabilities, we get another copy of this backup in an on-site repository plus an off-site copy, with immutability configured (ransomware-proof) and verified, zero-error recoverability. In other words, Veeam's 3-2-1-1-0 architecture.
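The 3-2-1-1-0 rule means at least 3 copies of the data, on 2 different types of media, with 1 copy off-site, 1 copy immutable or offline, and 0 errors after recovery verification. The small checker below is a hypothetical illustration of how the proposed DNA Center backup layout maps onto that rule; it is not a Veeam API, and the copy names and media labels are assumptions based on the architecture described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str                  # storage type label (hypothetical), e.g. "nfs-disk"
    offsite: bool               # stored outside the primary site
    immutable_or_offline: bool  # cannot be altered/deleted, or is air-gapped
    verified: bool              # a test restore completed with no errors


def check_3_2_1_1_0(copies: list[BackupCopy]) -> dict[str, bool]:
    # Evaluate each part of the 3-2-1-1-0 rule separately.
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 off-site": any(c.offsite for c in copies),
        "1 immutable/offline": any(c.immutable_or_offline for c in copies),
        "0 errors": all(c.verified for c in copies),
    }


# Hypothetical layout: the DNA Center NFS backup server plus two VBR copies.
plan = [
    BackupCopy("DNA Center NFS backup server", "nfs-disk", False, False, True),
    BackupCopy("VBR on-site repository", "block-disk", False, False, True),
    BackupCopy("VBR off-site repository", "object-storage", True, True, True),
]

result = check_3_2_1_1_0(plan)
for rule, ok in result.items():
    print(f"{rule}: {'OK' if ok else 'MISSING'}")
```

Dropping the off-site immutable copy from `plan` makes the "3 copies", "1 off-site", and "1 immutable/offline" checks fail, which is exactly the single-point-of-failure gap the extra VBR copies are meant to close.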

I hope you enjoyed the testimonial and suggestion! And I wish everyone a great World Backup Day! 😀👏🏻



That is a great story, Luiz, with lots of good information. Thanks for sharing.


It staggers me that this is still a widespread issue. Thanks for sharing this Luiz.