Recently I had to configure a backup on a physical Linux server that stores over half a petabyte of traffic fine images collected from speed cameras on federal highways in Brazil.

The catch: this server runs critical client applications, so any migration to another server is extremely sensitive and needs careful planning. A full migration wasn’t feasible at the time.
The request was urgent, so a backup solution had to be implemented immediately, yet the server could not go down for even one minute. Any downtime would mean fines going unrecorded and an immediate financial loss.

The Challenge of Backing Up a Mission-Critical Environment
The client required the backup for compliance reasons. I used Veeam Agent for Linux and configured the backup job at the volume level. To avoid impacting the main traffic, I limited the throughput to 50 Mbps using Veeam’s throttling feature.
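
For a sense of scale, here is a rough back-of-the-envelope calculation, not a measurement from this server. It assumes the cap really is 50 megabits per second, that exactly 0.5 PB (decimal) has to cross it uncompressed, and that nothing else limits the job. The function name is only for illustration. Compression, deduplication and incremental runs would all change the figure.

```python
def full_pass_days(data_bytes, rate_mbps):
    """Days needed to move data_bytes through a link capped at rate_mbps (megabits/s)."""
    bits = data_bytes * 8
    seconds = bits / (rate_mbps * 1_000_000)
    return seconds / 86_400  # seconds per day


# Assumed inputs: 0.5 PB = 0.5e15 bytes, 50 Mbps cap.
days = full_pass_days(0.5e15, 50)
print(f"{days:.0f} days")  # prints "926 days"
```

Under those assumptions a single full pass would take roughly two and a half years, so the throttle value is a trade-off between protecting live traffic and ever finishing the first backup.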



Everything seemed well planned, until I ran the first backup job.

The Problem: Snapshot Task Freezing the SMB Service
During execution, I noticed that the image transfers stopped every time Veeam tried to create a snapshot. Data would pile up in the queue, and the whole process took much longer than expected.
After monitoring the server for hours, I discovered the culprit: the SMB service froze exactly at the moment the snapshot was initiated. Without SMB, images couldn’t be transmitted, and Veeam couldn’t move forward either.

The Investigation and Key Discovery
To test my hypothesis, I manually ran systemctl stop smb. As soon as the service stopped, Veeam completed the snapshot with no issues. I started SMB again right afterward, and image transfers resumed normally.
In other words, snapshots and SMB simply could not operate simultaneously in this environment.
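
The manual test above can be sketched as a small Python helper. This is an illustration, not what I ran on the server: the helper name service_paused is made up, and the only real command it relies on is systemctl stop/start. The point of the try/finally is that SMB is started again even if something fails while it is stopped.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def service_paused(unit, run=subprocess.run):
    """Stop a systemd unit, yield to the caller, then always start it again."""
    run(["systemctl", "stop", unit], check=True)
    try:
        yield
    finally:
        # Runs even if the body raises, so the service is not left stopped.
        run(["systemctl", "start", unit], check=True)


# Example use (hypothetical, needs root):
# with service_paused("smb"):
#     input("Start the backup job, press Enter once the snapshot exists... ")
```

The run parameter exists only so the helper can be exercised with a fake command runner instead of touching a real service.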

The Workaround
Since this server has no maintenance window and absolutely cannot go offline, fixing the root cause wasn’t an option at the moment.
The workaround I implemented was to create an automated monitoring script:
- The image processing system generates continuous logs.
- I set up a script that runs every minute and checks those logs for a transfer delay greater than one minute.
- If a delay is detected, the script automatically restarts the SMB service.
With this in place, Veeam Agent for Linux can successfully complete the snapshot, and image transfers continue with minimal impact: an interruption of just a few seconds.
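
A minimal sketch of what such a watchdog could look like in Python. This is not the exact script from the server: the log path is hypothetical, and using the log file's last-modified time as the delay signal is an assumption about how "delay" is measured.

```python
import subprocess
import time
from pathlib import Path

# Hypothetical values; the real log location is not given in this post.
LOG_FILE = Path("/var/log/image-processing/transfer.log")
MAX_DELAY_SECONDS = 60
SMB_UNIT = "smb"


def seconds_since_last_write(log_file, now=None):
    """Seconds since the log was last modified, or None if the file is missing."""
    now = time.time() if now is None else now
    try:
        return now - log_file.stat().st_mtime
    except FileNotFoundError:
        return None


def transfers_stalled(age_seconds, max_delay=MAX_DELAY_SECONDS):
    """Stalled means the log exists but has been silent for longer than max_delay."""
    return age_seconds is not None and age_seconds > max_delay


def restart_smb(run=subprocess.run):
    """Restart the SMB unit; raises CalledProcessError if systemctl fails."""
    run(["systemctl", "restart", SMB_UNIT], check=True)


def main():
    age = seconds_since_last_write(LOG_FILE)
    if transfers_stalled(age):
        restart_smb()


if __name__ == "__main__":
    main()
```

Scheduled once a minute from cron or a systemd timer, a script like this gives the one-minute check described above. A missing log file is deliberately treated as "not stalled" so the script never restarts SMB on incomplete information.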


Final Thoughts!
Of course, this isn’t the perfect solution. The ideal scenario would be to fix the root cause or even redesign the server architecture. But in critical environments, we often need to choose what’s possible right now rather than what would be theoretically ideal.
How would you handle a fix in a scenario like this? I’d love to see other perspectives and possible solutions for resolving this issue.