SureBackup is Veeam's answer to the question every backup administrator eventually gets asked: how do you know your backups actually work? Completing a job without errors means the data was written to the repository. It doesn't mean the VM boots, the applications start, or the data is consistent. SureBackup tests all of that. It starts VMs directly from compressed, deduplicated backup files in an isolated virtual environment, runs real tests against live applications, and produces a verified result.
1. How SureBackup Works
SureBackup uses the Veeam vPower NFS Service to mount backup files as a datastore directly on an ESXi host. VMs are registered and started from those mounted backup files without copying or extracting them. All writes during verification go to redo log files on a separate datastore. When verification ends, Veeam deletes the redo logs and the backup files are untouched.
The Two Verification Modes
- Full recoverability testing: VMs boot from backup in the virtual lab and Veeam runs heartbeat tests, ping tests, application tests, and custom scripts. This is the mode that actually verifies recoverability.
- Backup verification and content scan only: Veeam performs a CRC check on the backup file and optionally runs a malware scan. No VM is booted. It proves the backup file is intact and clean, not that the VM can recover from it.
Don't confuse these two modes. Content scan only is useful and worth running, but it's not recoverability testing. If your SureBackup job doesn't boot VMs, you haven't proven your backups are recoverable.
2. Virtual Lab Design
Basic vs Advanced Single-Host Lab
- Basic single-host: VBR creates isolated networks automatically using VLAN offsets. Simple to set up. Works well for environments where all VMs are on the same host and network isolation via VLAN offset is sufficient.
- Advanced single-host: You specify the isolated networks manually. Required when VMs span multiple VLANs with complex routing, or when VLAN offsets would conflict with existing VLANs. The right choice for most production environments.
Configuring the Virtual Lab
- In VBR, go to Backup Infrastructure, then Virtual Labs, and click Add Virtual Lab.
- Select the ESXi host where the lab will run. Choose a host with enough resources to boot the VMs you plan to verify simultaneously.
- Select the datastore for redo logs. This should be a fast local datastore or SSD-backed LUN.
- Configure the proxy appliance settings. It needs an IP on the production management network so VBR can communicate with it.
- Configure isolated networks. For each production network your verified VMs use, create a corresponding isolated port group with no physical uplinks.
- Configure IP masquerading so VBR's test scripts can reach VMs in the isolated network.
The isolated network port group must have no physical uplinks assigned. This is the entire isolation mechanism. If you accidentally assign a physical uplink, the VMs in the virtual lab will be on the production network with their production IPs. VBR doesn't validate this. You have to verify it manually in vCenter after creating the lab.
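Because VBR doesn't validate the uplink configuration, a scripted check can back up the manual inspection in vCenter. The following is a minimal sketch, assuming VMware PowerCLI is installed and the lab port group sits on a standard vSwitch. The function name Test-PortGroupIsolated, the host name, and the port group name are hypothetical. It reads the Nic property that PowerCLI exposes on standard virtual switch objects to list physical adapters.

```powershell
# Returns $true when the virtual switch behind a port group has no physical NICs.
# Accepts any object shaped like a PowerCLI VirtualPortGroup (VirtualSwitch.Nic).
function Test-PortGroupIsolated {
    param([Parameter(Mandatory)]$PortGroup)
    # Drop empty entries so a null or blank Nic list counts as "no uplinks".
    $uplinks = @($PortGroup.VirtualSwitch.Nic | Where-Object { $_ })
    return ($uplinks.Count -eq 0)
}

# Usage against a live host (assumes Connect-VIServer has already been run;
# 'esx01.lab.local' and 'VeeamLab Isolated' are placeholder names):
# $pg = Get-VirtualPortGroup -VMHost 'esx01.lab.local' -Name 'VeeamLab Isolated'
# if (-not (Test-PortGroupIsolated -PortGroup $pg)) {
#     Write-Warning "Port group '$($pg.Name)' has uplinks: $($pg.VirtualSwitch.Nic -join ', ')"
# }
```

The check only looks at the switch-level uplink list, so treat a passing result as a first filter and still confirm the port group in vCenter.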
3. Application Groups
The application group is the ordered list of dependency VMs that must start before your verified VMs. If you're verifying a web application server, the application group contains the domain controller and the SQL Server it depends on. Veeam starts them in order, waits for each to reach a stabilized state, then starts the VMs under verification.
VM Roles and Startup Sequence
| Role | What It Tests | Startup Behavior |
| --- | --- | --- |
| Domain Controller | LDAP port 389 response | Starts in Non Authoritative mode by default |
| Global Catalog | Global Catalog port 3268 response | Same as Domain Controller role |
| SQL Server | SQL port 1433 response | Waits for SQL Server service to start and accept connections |
| Web Server | HTTP port 80 or HTTPS port 443 response | Waits for web server port to respond |
Application Initialization Timeout
Each VM in the application group has an Application initialization timeout setting, default 120 seconds. If the application doesn't respond within this window, the test fails. SQL Server on a VM with large databases can easily take 3 to 5 minutes. Increase this timeout before concluding the application doesn't start from backup. Count on boot times being noticeably longer than production, especially for SQL Server or VMs with large disks.
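One way to size the timeout from evidence rather than guesswork is to time how long the application port takes to start answering. This is a rough sketch, not a Veeam feature: the function name Measure-PortReadyTime and its parameters are hypothetical, and it only measures TCP readiness using the .NET TcpClient class.

```powershell
# Polls Address:Port until it accepts a TCP connection or MaxSeconds elapses.
# Returns the elapsed whole seconds on success, or $null if the port never answered.
function Measure-PortReadyTime {
    param(
        [Parameter(Mandatory)][string]$Address,
        [Parameter(Mandatory)][int]$Port,
        [int]$MaxSeconds = 600,
        [int]$PollSeconds = 5
    )
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    while ($timer.Elapsed.TotalSeconds -lt $MaxSeconds) {
        $client = New-Object System.Net.Sockets.TcpClient
        try {
            # Give each attempt two seconds before treating it as "not ready".
            if ($client.ConnectAsync($Address, $Port).Wait(2000) -and $client.Connected) {
                return [int][Math]::Ceiling($timer.Elapsed.TotalSeconds)
            }
        } catch {
            # Connection refused or host unreachable: the service is not up yet.
        } finally {
            $client.Close()
        }
        Start-Sleep -Seconds $PollSeconds
    }
    return $null
}

# Example (placeholder address): time SQL Server on port 1433.
# $seconds = Measure-PortReadyTime -Address '10.255.0.25' -Port 1433
```

Start the measurement as the VM powers on in the lab, then set the Application initialization timeout to the measured value plus generous headroom.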
4. Custom Test Scripts
Built-in role tests confirm that a port is open. Custom scripts let you test that the application is actually doing useful work, such as answering a query. Test scripts run on the VBR server and communicate with VMs in the virtual lab through the proxy appliance. VBR supplies the address and FQDN of the VM under test through the %vm_ip% and %vm_fqdn% variables, which you pass to the script as arguments.
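POWERSHELL: Custom SQL Server test script for SureBackup
# Pass %vm_ip% as the first script argument; $env:vm_ip is kept as a fallback.
param([string]$vmIP = $env:vm_ip)
if (-not $vmIP) { Write-Host "VM IP not provided"; exit 1 }
$conn = New-Object System.Data.SqlClient.SqlConnection
# Example credentials only: use a dedicated low-privilege test login.
$conn.ConnectionString = "Server=$vmIP;Database=ProductionDB;User Id=veeam_test;Password=Test123;"
try {
    $conn.Open()
    $cmd = $conn.CreateCommand()
    # Prove the data is queryable, not just that port 1433 answers.
    $cmd.CommandText = "SELECT COUNT(*) FROM Orders WHERE OrderDate > DATEADD(day,-30,GETDATE())"
    $result = $cmd.ExecuteScalar()
} catch {
    # Connection or query errors must surface as a non-zero exit code.
    Write-Host "SQL test FAILED: $($_.Exception.Message)"; exit 1
} finally {
    $conn.Close()
}
if ($result -gt 0) { Write-Host "SQL test PASSED: $result recent orders"; exit 0 } else { Write-Host "SQL test FAILED: 0 recent orders"; exit 1 }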
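The same idea carries over to the Web Server role: instead of accepting an open port, request a page and check its content. The sketch below is illustrative only. The function name Test-WebContent, the /health path, and the expected text are hypothetical, and it assumes the page is served over plain HTTP and returns text.

```powershell
# Returns $true when the page at Uri loads with HTTP 200 and contains ExpectedText.
function Test-WebContent {
    param(
        [Parameter(Mandatory)][string]$Uri,
        [Parameter(Mandatory)][string]$ExpectedText,
        [int]$TimeoutSec = 30
    )
    try {
        $resp = Invoke-WebRequest -Uri $Uri -UseBasicParsing -TimeoutSec $TimeoutSec
        return ($resp.StatusCode -eq 200 -and ([string]$resp.Content).Contains($ExpectedText))
    } catch {
        # Timeouts, refused connections, and HTTP error statuses all land here.
        Write-Host "Web test error: $($_.Exception.Message)"
        return $false
    }
}

# Usage inside a SureBackup test script ($vmIP comes from the %vm_ip% argument):
# if (Test-WebContent -Uri "http://$vmIP/health" -ExpectedText 'OK') { exit 0 } else { exit 1 }
```

Keeping the check in a function and mapping its result to an exit code on the last line makes the pass or fail decision easy to read.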
5. SureBackup Job Design
Linked Jobs vs Specific VMs
A SureBackup job can verify VMs from a linked backup job or from a specific list you select manually. Linking to a backup job is simpler: when VMs are added to the backup job, they're automatically included in SureBackup. For most environments, link SureBackup to your most critical backup jobs and run full recoverability testing on those VMs.
Schedule and Job Overlap
Configure SureBackup to run after the linked backup job completes rather than on a fixed schedule. SureBackup then always verifies the freshest restore point, and it can't start while the backup job is still running, which is the overlap a fixed schedule risks.
6. Common Failure Patterns
Timeout Errors on Application Initialization
The VM boots but the application doesn't respond within the initialization timeout. The fix is almost always increasing the timeout, not investigating the application. Start with 300 seconds for SQL Server and 180 seconds for web servers, then tune down if needed.
Test Scripts Completing with Exit Code 1
VBR interprets any non-zero exit code from a test script as a failure. If your script is connecting via the masquerade IP and the connection is refused, check the IP masquerading configuration in the virtual lab settings. That's the most common reason scripts work outside the lab but fail inside SureBackup.
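When debugging a refused connection, it helps to know which address the script should be reaching. The sketch below assumes the masquerade network replaces the network portion of the production address while the host portion is kept. Verify that assumption against your virtual lab settings. The function name ConvertTo-MasqueradeIp and the sample addresses are hypothetical.

```powershell
# Maps a production IP to its masquerade equivalent by swapping the network bits.
function ConvertTo-MasqueradeIp {
    param(
        [Parameter(Mandatory)][string]$ProductionIp,
        [Parameter(Mandatory)][string]$MasqueradeNetwork,
        [ValidateRange(0, 32)][int]$PrefixLength = 16
    )
    $ipBytes  = [System.Net.IPAddress]::Parse($ProductionIp).GetAddressBytes()
    $netBytes = [System.Net.IPAddress]::Parse($MasqueradeNetwork).GetAddressBytes()
    $out = New-Object byte[] 4
    for ($i = 0; $i -lt 4; $i++) {
        # Number of network bits that fall inside this octet (0 to 8).
        $bits = [Math]::Min(8, [Math]::Max(0, $PrefixLength - 8 * $i))
        $mask = (0xFF -shl (8 - $bits)) -band 0xFF
        $hostMask = (-bnot $mask) -band 0xFF
        # Network bits come from the masquerade network, host bits from the VM.
        $out[$i] = ($netBytes[$i] -band $mask) -bor ($ipBytes[$i] -band $hostMask)
    }
    return ([System.Net.IPAddress]::new($out)).ToString()
}

# Example with illustrative addresses:
# ConvertTo-MasqueradeIp -ProductionIp '172.16.1.13' -MasqueradeNetwork '172.18.0.0' -PrefixLength 16
# returns 172.18.1.13
```

If the address your script uses doesn't match the computed one, or the VBR server has no route to the masquerade network through the proxy appliance, the script will fail inside the lab even though it works against production.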
Application Group VM Fails to Find Restore Point
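The SureBackup job fails with 'unable to find valid restore point for [VM name]'. This means an application group VM has no usable restore point, usually because its backup job is failing or the VM is no longer part of any job. Fix the upstream backup job first, then verify the application group VM is in an active backup job with a recent successful restore point.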
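A pre-flight check can catch this before the SureBackup job runs. The following sketch is an assumption-laden illustration: the function name Get-VmWithoutRecentRestorePoint is hypothetical, and it expects restore point objects that expose Name (the VM name) and CreationTime properties. Confirm those property names against the output of the Veeam Get-VBRRestorePoint cmdlet in your version.

```powershell
# Returns the names of VMs that have no restore point newer than MaxAgeHours.
# RestorePoints: objects with Name (VM name) and CreationTime properties.
function Get-VmWithoutRecentRestorePoint {
    param(
        [Parameter(Mandatory)][string[]]$VmName,
        [AllowEmptyCollection()][object[]]$RestorePoints = @(),
        [int]$MaxAgeHours = 24,
        [datetime]$Now = (Get-Date)
    )
    $cutoff = $Now.AddHours(-$MaxAgeHours)
    foreach ($vm in $VmName) {
        $recent = @($RestorePoints | Where-Object { $_.Name -eq $vm -and $_.CreationTime -ge $cutoff })
        if ($recent.Count -eq 0) { $vm }   # emit VMs that would break the job
    }
}

# Usage on the VBR server (placeholder VM names; property names are assumptions):
# $points = Get-VBRRestorePoint
# Get-VmWithoutRecentRestorePoint -VmName 'DC01', 'SQL01' -RestorePoints $points
```

An empty result means every application group VM has a restore point inside the window. Any name returned points to the backup job that needs fixing first.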
Key Takeaways
- Full recoverability testing boots VMs from backup and tests live applications. Content scan only checks file integrity. You haven't verified recoverability unless VMs boot.
- The isolated network port group must have zero physical uplinks. VBR doesn't validate this. Confirm it manually in vCenter after creating the lab.
- Application group VMs must have valid restore points. If any application group VM has no restore point, the entire SureBackup job fails.
- Application initialization timeout defaults to 120 seconds. VMs started from backup boot significantly slower than production. SQL Server needs at least 300 seconds as a starting point.
- Custom test scripts receive %vm_ip% and %vm_fqdn%. Any non-zero exit code is a failure. Use scripts to run real application queries and content checks, not just port tests.
- Chain SureBackup after its linked backup job instead of scheduling them independently. This prevents overlap and ensures SureBackup always has the freshest restore point.
Full article: https://www.anystackarchitect.com/veeam-v13-surebackup-verification-for-vsphere/
