Hi,
I’ll attempt to answer all of your questions within a general context:
Re-IP won’t work on its own, as it’s orchestrated by VBR at DR time. So you’d need a surviving VBR server to invoke it; otherwise, correct, you’d be manually re-IPing the VMs (or migrating your subnets to the DR site).
I’d suggest, if this is purely an active/DR site pairing and not active/active, migrating your VBR to the DR site. You could also have a standby instance of VBR that you copy your VBR config backups to; that way you could rapidly load the config into the standby VBR instance to recover. Repository rescans may be required, as there may be backups/replicas from after your config backup took place. Hence the recommendation to keep VBR outside the site(s) you’re trying to protect, so it doesn’t sit in the same fault domain and is unlikely to be hit by the same issue impacting your production environment.
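If you go the standby route, the post-restore rescan part can be scripted. A minimal sketch using the Veeam Backup PowerShell module (v11 module name; run it on the standby server after loading the config backup):

```powershell
# Run on the standby VBR server after restoring the configuration backup.
# Rescanning picks up backups/replicas created after the config backup was taken.
Import-Module Veeam.Backup.PowerShell

# Rescan every repository so newer restore points are re-indexed
Get-VBRBackupRepository | ForEach-Object { Sync-VBRBackupRepository -Repository $_ }

# Rescan managed servers (vCenter, Windows/Linux servers) to refresh the inventory
Get-VBRServer | ForEach-Object { Rescan-VBREntity -Entity $_ -Wait }
```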
Hope this helps!
Hi Ravi, Michael,
I’m facing exactly the same scenario as Ravi’s, and I’d like more clarification about the actions needed to manage the failover to the DR site, and then the final failback from DR to the HQ site.
First I have to state that migrating the VBR server to the DR site is not an option, as the DR site could be shut down / unreachable for administrative reasons: during those periods the replication/backup copy jobs would be interrupted, but backup/backup copy jobs at HQ have to continue. If the VBR server were moved to the DR site, no backup jobs would be possible at HQ while DR is unreachable.
So my questions below relate to the scenario where a standby VBR server instance is used at the DR site. I’ll try to outline the various failover/failback steps and where those steps are unclear: my questions are marked with the ??» tag…
Step 0: Setup
Step 0.1: Back up the VBR server config to the DR site as often as possible.
??» We can do that directly from the VBR GUI, but the shortest interval is ‘daily’. That means the latest config backup wouldn’t have any trace of replicas made after it was taken, even a day before: does that matter for failing over at DR from the latest replica?? (A sketch for running config backups more often follows after this step block.)
Step 0.2: Configure replicas: the repository for replica metadata is placed at the HQ site, not at DR
??» Is this a problem/limitation when performing the failover with the HQ site unreachable?
??» there was no answer to Ravi’s question: “ I understand the backup repository in case of replication only holds the meta data of replication, so should this also be protected? Or if I have High/Available instance of Veeam Backup & Replication in DR site, will this also be addressed? ”
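On the Step 0.1 interval, if I’m not mistaken the GUI scheduler for the configuration backup tops out at daily, but the job can also be triggered ad hoc from PowerShell, so a Windows scheduled task could tighten the config backup interval. A minimal sketch (the server name is a placeholder):

```powershell
# Hypothetical scheduled-task payload to take configuration backups more often
# than the GUI's daily schedule allows.
Import-Module Veeam.Backup.PowerShell
Connect-VBRServer -Server "localhost"

# Runs the configuration backup job now, to its configured target repository
Start-VBRConfigurationBackupJob

Disconnect-VBRServer
```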
Step 2: Disaster happens: the HQ site is down: start the failover procedure at the DR site:
Step 2.1: Activate the standby VBR instance
Step 2.2: Import the latest config backup from HQ into the standby VBR instance
Step 2.3: Rescan the Veeam repositories available at DR
Step 2.4: Rescan replicas in order to make the VBR server aware of “replicas from after your config backup took place”
??» How can this action be done, as the replica metadata repository is at the HQ site and therefore unreachable??
Step 2.6: Disable backup and backup copy jobs relevant to the HQ site
Step 2.7: Activate the replicas’ failover (for each replica, or using failover plans; see the sketch after this list)
Step 2.8: Check results, test VMs; if needed, select an earlier replica point-in-time
Step 2.9: Go on with the other DR failover actions (VMware, networking, services, etc.)
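As a sketch, I imagine steps 2.6–2.7 could be driven from PowerShell on the standby server roughly like this (job and plan names are placeholders; please correct me if the cmdlets behave differently):

```powershell
Import-Module Veeam.Backup.PowerShell

# Step 2.6: disable the backup/backup copy jobs that target the now-unreachable HQ site
Get-VBRJob | Where-Object { $_.Name -like "HQ-*" } | Disable-VBRJob

# Step 2.7: fail over via a failover plan...
Get-VBRFailoverPlan -Name "HQ-to-DR" | Start-VBRFailoverPlan

# ...or per replica (the latest restore point is used when none is specified)
$replica = Get-VBRReplica -Name "MyVM_replica"
Start-VBRViReplicaFailover -Replica $replica
```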
Step 3: Disaster ended, HQ infrastructure recovered: start the failback procedure
Step 3.1: All protected VMs at HQ are shut down
Step 3.2: Back up the current DR VBR configuration and move it to HQ
Step 3.3: Shut down the DR VBR server
Step 3.4: Restart the Veeam infrastructure (VBR server / proxies / WAN accelerators / vCenter Server) at HQ
Step 3.5: Import the config backup made on the DR VBR server into the HQ VBR server
Step 3.6: Rescan repositories / vCenter / other Veeam-related resources?
??» Should we do that?
??» How should we do that in the correct way?
Step 3.7: Anything else to do at the HQ site or at the DR site before starting the VBR failback??
??» Is there something I forgot to add here?
Step 3.8: Perform per-VM failback (DR VMs are protected & shut down, HQ VMs are started back into production after a final sync with the DR VMs; see the sketch after this list)
Step 3.9: Check everything (VMware, networking, storage, security, services)
Step 3.10: Re-enable VBR jobs at the HQ site
Step 3.11: END
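And for step 3.8, a per-VM failback sketch from the recovered HQ VBR server (names are placeholders, and I’d verify the exact cmdlets and parameters against the PowerShell reference for my version):

```powershell
# Step 3.8: final sync from the DR replica, then bring the original HQ VM back
Import-Module Veeam.Backup.PowerShell
$replica = Get-VBRReplica -Name "MyVM_replica"

# Fail back to the original VM at HQ
Start-VBRViReplicaFailback -Replica $replica

# After checks (step 3.9), commit the failback so the replica returns to normal operation
Complete-VBRViReplicaFailback -Replica $replica
```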
I think the official documentation is somewhat lacking in depth for this VITAL feature, and I believe that your answer will be of the greatest help to all Veeam customers!
Best Regards
Flavio
Best practice is to run Veeam at the DR site.
Running a Veeam server at DR for replication, and a Veeam server at the main site for backups, could be an option.
Or run Veeam at DR and have proxies at both sites.
Hi Scott, thanks for the suggestion.
I know Veeam’s best practice about VBR server placement at the DR site, but, as I stated before:
“The migration of the VBR server to the DR site is not an option, as the DR site could be shut down / unreachable for administrative reasons: during those periods the replication/backup copy jobs would be interrupted, but backup/backup copy jobs at HQ have to continue: if the VBR server were moved to the DR site, no backup jobs would be possible at HQ while DR is unreachable.”
I would be more than happy to have the VBR server at DR, but I found no answer to my statement above: how do I keep backup/backup copy jobs running at HQ when the VBR server at DR is unavailable?
Do you have an answer to this?
Having two VBR servers (one at DR for replication, one at HQ for backup/backup copy) would indeed be a fine option, but it would require Veeam Enterprise Manager in order to distribute the license, unless we want to duplicate the licensing cost.
https://forums.veeam.com/vmware-vsphere-f24/managing-licenses-with-two-veeam-server-t76265.html
Now I have to check where to put Veeam EM, and whether it has to be protected when its site is down.
- Running Veeam in DR is recommended. I guess your DR site isn’t REALLY a DR site. Multi-site locations with true DR don’t get shut down for administrative reasons. If your DR site is down and you have a disaster, the boss will wonder why they are paying for DR.
- OK, so you plan on turning this site off and on; obviously you need Veeam to run at the main site. Yes, you need a server at the other site in case the main site goes down. Running the second Veeam server out there controlling your replication, while the other server does the backups, IS the best option in this case.
- Do you have to use EM? Pretty sure if you license the sockets you might be OK here (would need to confirm with Veeam). Also, if you use VUL, just use some VULs for backups and some VULs for replication. I think you don’t need EM. However, EM is a good idea for things like indexing on backups anyway.
- Seeing as you plan on shutting down the other site from time to time, install EM at the main site. To me it sounds like you shouldn’t run anything even remotely considered production out there if you plan on shutting it down.
- My boss would agree to recover to the latest available point-in-time, and the DR unavailability window is acceptable while backups remain available at the main site. Things could change in the future, with DR getting steadier. At the moment the DR site is mainly for planned failover and for improvement staging.
- I deeply appreciate your suggestion and we’ll surely go with it. You opened my mind!
- I think we’ll also go on with EM. We’re using VUL licensing, and backup and replication will protect the same workloads. Moreover, EM seems a good thing to have.
- I agree with you: EM at the main site is the best option.
Thank you for your answer: you helped a lot!
Regards
Flavio
If the boss is ok with that then it’s all good
Good luck.
Either way: a server at each site, and control replication from DR, so if the main site goes down you can flip the switch on that side and bring things up. Keep the storage / proxy / Veeam backup server at the main site.
EM is your choice if you use VUL as you need a separate VUL for replicas as opposed to backups. You could just have 2 totally separate servers with separate purposes if you want.
Also remember to size everything appropriately and you’ll need proxies at both sites for replication.
EM is your choice if you use VUL as you need a separate VUL for replicas as opposed to backups
@Scott
Replica and Backup jobs from the same VM only require a single VUL.
Because two backup servers are used, most likely with the same VUL license, an EM server is required to manage licensed workloads. It‘s somewhere in our licensing policy.
The EM server will track workloads across both VBR servers and recognize that the same VM was processed by replica and backup jobs. So only 1x instance will be used even if two backup servers are doing the jobs.
Thanks,
Fabian
That makes sense. I didn’t realize you could do both with one VUL. (I still use sockets.)
Thank you @Scott and @Mildur for your clarifications.
I’m going on with EM + VBR servers at the main and DR sites + the relevant proxies and WAN accelerators as needed.
We already had replica jobs working fine, managed by VBR-Main.
When all the infra is set up I’ll ‘migrate’ the replicas from VBR-Main to VBR-DR: I hope to do this without losing the replica points already set up. Fingerprints & WAN cache would have to be rebuilt.
I do have another clarification to ask.
In the case of replicas Main > DR managed by VBR-DR, where would the replicas’ metadata repository best be placed? At the Main site, near the source VMs? Or at the DR site, near VBR-DR and the replicas?
Regards
Flavio
When all the infra is set up I’ll ‘migrate’ the replicas from VBR-Main to VBR-DR: I hope to do this without losing the replica points already set up. Fingerprints & WAN cache would have to be rebuilt.
You can use replica mapping: create a new replica job and map the source VM to the existing replica VM on the target host.
I believe all previous replica points (snapshots) will be removed when you do the mapping.
https://helpcenter.veeam.com/docs/backup/vsphere/replica_seeding_vm.html?ver=110#configuring-replica-mapping
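A rough PowerShell sketch of the same idea. Add-VBRViReplicaJob is the real cmdlet, but treat the mapping parameters below as assumptions from memory and verify them with Get-Help Add-VBRViReplicaJob -Full for your version:

```powershell
Import-Module Veeam.Backup.PowerShell

$sourceVM   = Find-VBRViEntity -Name "MyVM"            # source VM at the Main site (placeholder)
$targetHost = Get-VBRServer -Name "esxi-dr.example"    # DR host (placeholder name)

# Map the source VM to the replica VM that already exists at DR, so the first
# run syncs the existing replica instead of transferring the whole VM.
# NOTE: -OriginalVM / -ReplicaVM are assumed parameter names - check Get-Help.
Add-VBRViReplicaJob -Name "MyVM Replication" -Server $targetHost -Entity $sourceVM `
    -OriginalVM $sourceVM -ReplicaVM (Find-VBRViEntity -Name "MyVM_replica" -Server $targetHost)
```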
In the case o f replicas Main>DR managed by VBR-DR, where should better be placed the Replicas’ Metadata repository? at Main site, near the src VMs? or at DR site, near VBR-DR and replicas?
Metadata is used by the source backup proxy.
For best replica performance, the metadata must be as close as possible to the backup proxy in your production site.
Best,
Fabian
Thank you Fabian
I’ll keep this post updated with my experience, in order to help anyone who ends up in this scenario.
Looking forward to your update. It sounds like you have a really well-thought-out plan.
UPDATE 1 - Veeam EM is up and running @MAIN
Set up notifications
Added the VBR-HQ server
Rescanned servers
VUL licenses are there, along with jobs, VMs, agents
A very fine piece of SW, like vCenter for ESXi! Will explore further.
NEXT STEP: VBR-DR with its own infrastructure
[…]
Hi,
I set up the new VBR-DR server with the relevant infrastructure: it’s linked to Veeam EM and got the VUL license from there.
To the infrastructure I added the HQ repository where VBR-HQ stores replica metadata.
Now I’m wondering about the best way to migrate the existing replica jobs from VBR-HQ to VBR-DR:
- I have 12 Replica jobs
- I’d like to test the procedure with a test VM and its replica first, and then migrate the other jobs one by one
- The migration procedure shouldn’t harm replicas still working on VBR-HQ
- Some VMs are very large, so I’d like to keep the new replica setup as light as possible, ideally re-using metadata / digests / fingerprints…
My first version would be:
- Disable the replica job on VBR-HQ
- Create the new replica job on VBR-DR:
  - same parameters as the job on VBR-HQ (name, source, destination, schedule, net/IP mangling...)
  - same replica metadata repository (I wonder if this could harm VBR-HQ operation?)
  - map the replica to the replicated VM already existing at the DR site
- ???? Delete the replica job on VBR-HQ ??? (wouldn’t that delete the metadata, and the replica points?)
  --- OR --- better to keep the job on VBR-HQ until the first run has been done on VBR-DR?
- Launch the replica job
As you can see, I’m concerned about VBR-DR using the same metadata repository used by VBR-HQ.
Furthermore, I’m wondering what would happen if a replica job launched by VBR-DR acted on a VM on which a backup job had been launched by VBR-HQ.
Or a more conservative procedure, which would rebuild the metadata/digests/fingerprints...:
- Delete the replica job on VBR-HQ (that should keep the replica VM at DR)
- Create the new replica job on VBR-DR:
  - same parameters as the job on VBR-HQ (name, source, destination, schedule, net/IP mangling...)
  - its own new replica metadata repository, located at HQ and accessed via a proxy running there
  - map the replica to the replicated VM already existing at the DR site
- Launch the replica job
Which one would you choose?
Thanks
Flavio
[update]
After careful thought I decided to ‘migrate’ the jobs using the conservative option, so (see the sketch after this list):
- Disable the job on VBR-HQ
- Create a new job on VBR-DR, with the same parameters as the HQ job, except:
  - use its own replica metadata repository, located at HQ, served by its own proxy
  - do not seed: map to the existing replicas (auto-map with detect is working fine)
  - set scheduling outside the VBR-HQ backup window (to reduce the risk of VBR-HQ and VBR-DR activity overlapping on the same VM)
- Start the job on VBR-DR:
  - the first run will reset CBT, recalculate digests, recalculate fingerprints
  - the first run will be longer than usual, I think due to the CBT reset
  - the first run will remove all the replica’s existing snapshots, keeping the latest one
- Delete the replica job from VBR-HQ
- Remove the replica records from VBR-HQ/Replicas (this won’t delete/touch the replica VM)
- [Optional] Redo the replica to test Veeam-related integrity
- [Optional] Test the replica with Failover/Undo Failover
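For reference, the disable/run handover per job could be scripted roughly like this (server and job names are placeholders; the mapped job itself was created in the wizard as above):

```powershell
Import-Module Veeam.Backup.PowerShell

# On VBR-HQ: disable the old job so the two servers never touch the same VM at once
Connect-VBRServer -Server "vbr-hq.example"
Get-VBRJob -Name "MyVM Replication" | Disable-VBRJob
Disconnect-VBRServer

# On VBR-DR: run the new, mapped job; the first run resets CBT and rebuilds digests
Connect-VBRServer -Server "vbr-dr.example"
Start-VBRJob -Job (Get-VBRJob -Name "MyVM Replication")
Disconnect-VBRServer
```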
The process is going one replica at a time…
Flavio
PS. I’m still wondering what would happen if a replica job launched by VBR-DR acted on a VM while a backup job launched by VBR-HQ was working on the same VM…
[update]
Hi
The replica job migration process ended positively, and we’ve now been running failover tests for two weeks.
Both Veeam B&R servers are working fine and we haven’t found any issue related to their concurrent operation.
I don’t know if this solution can be regarded as a ‘best practice’, but it has proven efficient and effective so far.
I’ll post an update if any issue arises.
Have a Good 2023!
Thanks for the update. Very cool information!