Tell me if you’ve seen this situation before. You’ve got some servers & applications that need protecting, so you create backup jobs as necessary: different jobs for your various application-integration requirements, backup frequencies, and retention requirements. It’s starting to sprawl. Inevitably there’s some job scheduling overlap. Backup jobs are waiting for other backup jobs to complete, and the running tasks count on your Veeam Backup & Replication server is ticking upwards. No biggie, right? It’ll clear down eventually! Well… let’s explore that, shall we?
Background/Test Scenario
I wanted to see just how much having multiple running jobs would impact VBR when all but one of those jobs were in a pending state. I then wanted to compare this impact against scheduling the same backup jobs within chains.
I created a Windows VM, installed the OS, then powered it down, so I’d have a static test VM to back up. I configured my backup proxy and repository on a standalone server, with a task concurrency of one each. I then created a backup job with this single proxy and repository defined against the job, and no application-aware processing or other guest processing (as I said, the server is powered down). Since the proxy & repository are on a separate server, the VBR server is being used purely for backup job orchestration, nothing more. For completeness of detail: the VBR server also hosts its PostgreSQL instance, and it was under-spec’d due to lab restrictions, with 2 vCPUs and 10GB of RAM. It was running on super-duper fast (thanks Intel + VMware!) Intel Optane PCI-E storage, with the proxy & repository server on a completely separate Intel Optane PCI-E storage device.
Baseline Metrics: The Idle Server
I gathered some baseline metrics of the VBR server, idling without any jobs running, and saw the following (after the list, I’ve included a sketch of how figures like these could be collected):
Average CPU: 15% Consumption
Average CPU Threads: 1500-1600
Average CPU Processes: 160
Average RAM: 4.2GB
Average number of network ports open: 300
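For reference, here’s a minimal sketch of how comparable figures could be gathered on the server itself, assuming Python and the psutil library. The one-minute sampling window and metric names are my own choices, not how these figures were originally captured, and enumerating open connections may require elevated privileges on some platforms:

```python
# Minimal sketch: sample system-wide metrics once per second for a
# minute, then print the averages.
import psutil

samples = []
for _ in range(60):
    procs = list(psutil.process_iter(['num_threads']))
    samples.append({
        'cpu_pct': psutil.cpu_percent(interval=1),    # blocks for 1s
        'threads': sum(p.info['num_threads'] or 0 for p in procs),
        'processes': len(procs),
        'ram_gb': psutil.virtual_memory().used / 1024 ** 3,
        'open_ports': len(psutil.net_connections()),  # may need admin
    })

for key in samples[0]:
    avg = sum(s[key] for s in samples) / len(samples)
    print(f'Average {key}: {avg:.1f}')
```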
With a baseline established, we can now begin testing our scenario.
Test One: Simultaneous Backup Jobs
I cloned the backup job 19 times for a total of 20 backup jobs, all with identical configuration, and manually started them at the same time. As the jobs were setting up, the CPU immediately reached 100% utilisation and remained there whilst the jobs were being initiated. Once established, the CPU utilisation dropped, and I was presented with the following averages:
Average CPU: 25% Consumption
Average CPU Threads: 2600-2700
Average CPU Processes: 240
Average RAM: 6.4GB
Average number of network ports open: 600
I then repeated the test with only 10 backup jobs enabled, to see how the resources scaled at a lower level:
Average CPU: 23% Consumption
Average CPU Threads: 2000-2100
Average CPU Processes: 210
Average RAM: 5.2GB
Average number of network ports open: 550
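Incidentally, starting a batch of jobs "at the same time" is far more repeatable from a script than from the console. Below is a hedged sketch against the VBR REST API (available since v11); the port, endpoint paths, x-api-version value, and the SimultaneousBackupJob naming prefix are all my assumptions from memory, so verify them against the API reference for your version:

```python
# Hedged sketch: start every job matching a name prefix in parallel
# via the VBR REST API. Endpoint paths and the x-api-version value
# may differ on your version -- check the API reference.
import concurrent.futures
import requests

VBR = 'https://vbr-server:9419'          # default REST API port
HEADERS = {'x-api-version': '1.1-rev0'}  # version-specific; adjust

def get_token(user, password):
    r = requests.post(f'{VBR}/api/oauth2/token', headers=HEADERS,
                      data={'grant_type': 'password',
                            'username': user, 'password': password},
                      verify=False)  # lab only; use real certs in production
    r.raise_for_status()
    return r.json()['access_token']

def start_job(token, job_id):
    r = requests.post(f'{VBR}/api/v1/jobs/{job_id}/start',
                      headers={**HEADERS, 'Authorization': f'Bearer {token}'},
                      verify=False)
    r.raise_for_status()

token = get_token('administrator', 'your-password')
jobs = requests.get(f'{VBR}/api/v1/jobs',
                    headers={**HEADERS, 'Authorization': f'Bearer {token}'},
                    verify=False).json()['data']
# 'SimultaneousBackupJob' is a hypothetical naming prefix for this test.
targets = [j['id'] for j in jobs if j['name'].startswith('SimultaneousBackupJob')]

# Fire all the start requests at once so the jobs queue up together;
# list() forces the iterator so any request errors actually surface.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(lambda jid: start_job(token, jid), targets))
```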
Test Two: Sequential Backup Jobs
I then deleted my backups from the backup repository, to ensure these would be new chains again, starting with Active Full backups, keeping the tests identical. I then converted each job to be chained off its predecessor: SequentialBackupJob_2 followed SequentialBackupJob_1, and so on. I didn’t see 100% CPU utilisation during the start-up of any of the jobs; instead there was a brief spike towards 80-90% utilisation, and the system remained responsive. Once established, I saw the below figures:
Average CPU: 21% Consumption
Average CPU Threads: 1550-1650
Average CPU Processes: 170
Average RAM: 4.5GB
Average number of network ports open: 330
Just like the first test, I then conducted this test again with only 10 backup jobs enabled, to see how the resources scaled at a lower level:
Average CPU: 21% Consumption
Average CPU Threads: 1550-1650
Average CPU Processes: 170
Average RAM: 4.5GB
Average number of network ports open: 330
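Before moving on to the collated results, here’s a toy sketch (in Python, nothing Veeam-specific, all names hypothetical) of why the two models behave so differently: in the simultaneous model every job builds and holds its state up front while waiting for the single task slot, whereas in the chained model only one job’s worth of state exists at a time:

```python
# Toy model of the two scheduling approaches. The semaphore stands in
# for the proxy/repository pair with a task concurrency of one; the
# sleep stands in for the actual backup work.
import threading
import time

task_slot = threading.BoundedSemaphore(1)  # one concurrent task

def run_job(name):
    # In the real product, this is where job state (VM lists, proxy
    # assignments, guest settings) is enumerated and held in memory.
    print(f'{name}: state built, waiting for a task slot')
    with task_slot:
        time.sleep(0.2)  # the backup itself
    print(f'{name}: finished')

names = [f'Job_{i}' for i in range(1, 21)]

# Simultaneous: 20 jobs exist at once, 19 of them pending but live.
threads = [threading.Thread(target=run_job, args=(n,)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Chained: each job is only created once its predecessor completes,
# so at most one job's worth of state exists at any moment.
for n in names:
    run_job(n)
```

Running it, you’ll see all 20 "state built" lines print immediately in the first half, but one at a time in the second; that’s the RAM and thread behaviour from the tables below in miniature.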
Collated Results
To make things easier to digest, I’ve collated these into separate tables:
| Average CPU Utilisation | CPU Utilisation | vs Baseline – Delta | vs Baseline – % |
|---|---|---|---|
| Idle | 15% | N/A | N/A |
| 10 Simultaneous Backup Jobs | 23% | +8% | +53% |
| 20 Simultaneous Backup Jobs | 25% | +10% | +66% |
| 10 Sequential Backup Jobs | 21% | +6% | +40% |
| 20 Sequential Backup Jobs | 21% | +6% | +40% |
| Average CPU Threads | CPU Threads | vs Baseline – Delta | vs Baseline – % |
|---|---|---|---|
| Idle | 1500-1600 | N/A | N/A |
| 10 Simultaneous Backup Jobs | 2000-2100 | +500 | +32% |
| 20 Simultaneous Backup Jobs | 2600-2700 | +1100 | +71% |
| 10 Sequential Backup Jobs | 1550-1650 | +50 | +3% |
| 20 Sequential Backup Jobs | 1550-1650 | +50 | +3% |
| Average CPU Processes | CPU Processes | vs Baseline – Delta | vs Baseline – % |
|---|---|---|---|
| Idle | 160 | N/A | N/A |
| 10 Simultaneous Backup Jobs | 210 | +50 | +31% |
| 20 Simultaneous Backup Jobs | 240 | +80 | +50% |
| 10 Sequential Backup Jobs | 170 | +10 | +6% |
| 20 Sequential Backup Jobs | 170 | +10 | +6% |
| Average RAM Consumption | RAM Consumed | vs Baseline – Delta | vs Baseline – % |
|---|---|---|---|
| Idle | 4.2GB | N/A | N/A |
| 10 Simultaneous Backup Jobs | 5.2GB | +1000MB | +24% |
| 20 Simultaneous Backup Jobs | 6.4GB | +2200MB | +52% |
| 10 Sequential Backup Jobs | 4.5GB | +300MB | +7% |
| 20 Sequential Backup Jobs | 4.5GB | +300MB | +7% |
| Average Network Ports | Network Ports Open | vs Baseline – Delta | vs Baseline – % |
|---|---|---|---|
| Idle | 300 | N/A | N/A |
| 10 Simultaneous Backup Jobs | 550 | +250 | +83% |
| 20 Simultaneous Backup Jobs | 600 | +300 | +100% |
| 10 Sequential Backup Jobs | 330 | +30 | +10% |
| 20 Sequential Backup Jobs | 330 | +30 | +10% |
Conclusion
Now, this was just a couple of tests to evaluate some theories I had around backup job chaining vs simultaneous backup jobs, and the results trended as I expected. There were some unexpected values, such as the small network port delta between 10 and 20 simultaneous backup jobs, but the simultaneous backup job values mostly scaled proportionally with each other.
Were I to explore this subject further, I’d add extra data points for 5, 25, 30, 40, and 50 simultaneous vs sequential backup jobs. I’d also go further and capture exclusively the threads and processes related to the Veeam tasks, and the specific CPU, RAM, and network consumption of those processes, to better isolate any coincidental resource utilisation by other system processes, such as Windows Update checks or Windows Defender, as two typical examples.
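For what it’s worth, that per-process isolation could be sketched with the same psutil approach; matching on a "veeam" substring in the process name is my assumption about how the relevant services are named, so adjust the filter for your environment:

```python
# Hedged sketch: aggregate process count, threads, and RAM across
# just the Veeam processes, rather than system-wide.
import psutil

veeam = [p for p in psutil.process_iter(['name', 'num_threads', 'memory_info'])
         if 'veeam' in (p.info['name'] or '').lower()]

print('Veeam processes:', len(veeam))
print('Veeam threads:', sum(p.info['num_threads'] or 0 for p in veeam))
rss = sum(p.info['memory_info'].rss for p in veeam if p.info['memory_info'])
print(f'Veeam RAM: {rss / 1024 ** 2:.0f}MB')
```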
The findings are pretty clear, however: more jobs running simultaneously, even in a pending state, will demand more of your system. From a RAM perspective this makes perfect sense, as Veeam has had to enumerate all the data related to the backup processing, and that data then sits in memory awaiting the ability to execute the tasks it relates to.
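To put rough numbers on that: 20 simultaneous jobs added ~2.2GB of RAM over baseline, or roughly 110MB of held state per pending job, while 10 jobs added ~1GB, roughly 100MB per job. The chained configuration, by contrast, held steady at ~300MB over baseline regardless of how many jobs were in the chain.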
The above also highlights a potential downside to backup job chaining. When we process all the jobs simultaneously, we enumerate all backup data in advance (which VMs, which proxies, guest-processing settings, etc.), so when our time slot arrives, we can start running near-immediately. When using sequential backup jobs, however, this data isn’t collated until the backup job starts, increasing the idle time for your proxy & repository between backup jobs. This could have a negative effect on how long your backups run in total: consider a scenario where you have 4 proxies, 3 of them idle, but one still processing a final VM within your backup job; the next backup job won’t start until that single proxy finishes the final VM. On the flip side, however, backup job chaining can also make individual backup jobs start faster, as VBR doesn’t have to enumerate data for other backup jobs that it doesn’t yet have room to process any workloads for.
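As a worked example of that straggler effect (the figures here are illustrative, not measured): if the final VM takes 15 minutes on its one proxy, the other 3 proxies sit idle for those 15 minutes before the next chained job can begin. Across a 20-job chain, even a 5-minute tail per job adds over an hour and a half of dead time to the overall backup window.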
In summary, the above should give you a good idea of what Veeam has to do when creating and maintaining multiple simultaneous backup jobs vs backup job chains, which will help you create optimised backup schedules and get the most out of your environments.