
Recently I had to configure a backup on a physical Linux server that stores over half a petabyte of traffic fine images collected from speed cameras on federal highways in Brazil.

 

server disks

 

 

The catch: this server runs critical client applications, and any migration to another server is extremely sensitive and requires careful planning before taking action. At the moment, a full migration wasn’t feasible.

Due to the urgency of the demand, it was necessary to implement a backup solution immediately, but the server cannot go down for even one minute. Any downtime would mean fines not being recorded, resulting in immediate financial loss.

 

envio_imagens

 

The Challenge of Backing Up a Mission-Critical Environment

 

The client required backup implementation for compliance reasons. I used Veeam Agent for Linux, configuring the backup job at the volume level. To avoid impacting production traffic, I limited the backup throughput to 50 Mbps using Veeam’s throttling feature.
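To put a throttle like this in perspective, here is a rough back-of-the-envelope sketch of how long a given amount of data takes at a fixed rate. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and ignores compression, protocol overhead, incremental runs, and any day/night rule changes, so treat the result as an order of magnitude only. The function name is just illustrative.

```python
def transfer_hours(size_bytes, rate_mbps):
    """Hours needed to move size_bytes at rate_mbps (megabits per second)."""
    bits = size_bytes * 8
    seconds = bits / (rate_mbps * 1_000_000)
    return seconds / 3600


ONE_TB = 10**12  # decimal terabyte, an assumption for this estimate

print(f"{transfer_hours(ONE_TB, 50):.1f} hours per TB at 50 Mbps")  # 44.4 hours per TB at 50 Mbps
```

At that rate each terabyte takes close to two days, which is worth keeping in mind when sizing the first full backup.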

 

network traffic rules

 

network day rule

 

network night rule

 

 

Everything seemed well planned, until I ran the first backup job.

 

The Problem: Snapshot task freezing SMB service

 

During execution, I noticed that the image transfers stopped every time Veeam tried to create a snapshot. Data would pile up in the queue, and the whole process took much longer than expected.

After monitoring the server for hours, I discovered the culprit: the SMB service froze exactly at the moment the snapshot was initiated. Without SMB, images couldn’t be transmitted, and Veeam couldn’t move forward either.

 

The Investigation and Key Discovery

 

To test my hypothesis, I manually ran systemctl stop smb. As soon as the service stopped, Veeam completed the snapshot with no issues. I started SMB again right afterward, and image transfers resumed normally.

In other words, snapshots and SMB simply could not operate simultaneously in this environment.

 

The Workaround

 

Since this server has no maintenance window and absolutely cannot go offline, fixing the root cause wasn’t an option at the moment.

 

The workaround I implemented was to create an automated monitoring script:

  • The image processing system generates continuous logs.
  • I set up a script that runs every minute and checks if there’s a delay greater than one minute in transfers.
  • If a delay is detected, the script automatically restarts the SMB service.
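The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script running on the server: the log path is hypothetical, the log file's last-modified time is used as a stand-in for "last successful transfer", and the service name smb is taken from the systemctl test described earlier. It would need root privileges to restart the service.

```python
import os
import subprocess
import sys
import time

LOG_PATH = "/var/log/image_transfer.log"  # hypothetical path, not from the post
SERVICE = "smb"                           # unit name used in the systemctl test
MAX_DELAY_SECONDS = 60                    # "delay greater than one minute"


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stalled(delay_seconds, threshold=MAX_DELAY_SECONDS):
    """Transfers count as stalled once the delay exceeds the threshold."""
    return delay_seconds > threshold


def restart_service(service=SERVICE):
    """Restart the unit via systemctl; raises CalledProcessError on failure."""
    subprocess.run(["systemctl", "restart", service], check=True)


def check_once():
    """Run a single check; meant to be invoked once per minute by a scheduler."""
    try:
        delay = seconds_since_last_write(LOG_PATH)
    except OSError as exc:
        print(f"cannot read {LOG_PATH}: {exc}", file=sys.stderr)
        return 1
    if is_stalled(delay):
        print(f"no log activity for {delay:.0f}s, restarting {SERVICE}")
        restart_service()
    return 0
```

Scheduling is left out on purpose: a cron entry or systemd timer that calls check_once() every minute would match the description above. If the real logs carry per-transfer timestamps, parsing the last line would be a more precise signal than the file's modification time.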

 

With this in place, Veeam Agent for Linux can successfully complete the snapshot, and image transfers continue with minimal impact: an interruption of just a few seconds.

 

veeam job tasks running successfully

 

 

Final Thoughts!

 

Of course, this isn’t the perfect solution. The ideal scenario would be to fix the root cause or even redesign the server architecture. But in critical environments, we often need to choose what’s possible right now rather than what would be theoretically ideal.

 

How would you handle a fix in a scenario like this? I’d love to see other perspectives and possible solutions for resolving this issue.

 

 

Nice to see you found a workaround to help with your issue. To me, this is definitely something that should be addressed properly by fixing the root cause.



 

Absolutely, fixing the root cause is ideal. In this case, due to the urgency and critical nature of the server, the workaround allowed us to implement the backup without downtime. It’s definitely a temporary solution until a proper fix can be applied.


@matheusgiovanini Really nice findings, Matheus. Congrats!

 

Matheus, I don’t know if you changed any default settings such as compression and storage optimization.

For image and audio files, it is good to disable compression, but for transferring over a WAN link it is good to have a higher compression level. I would suggest running some tests with compression disabled and with higher levels; the amount of compute resources needed depends heavily on the I/O profile.
Make some adjustments to the storage optimization as well.
 

 


Thank you for the suggestions, Andre!

I already set the compression level to none. That environment is really a headache 😓

 

 

