
# Restoring all domain controllers in a Windows domain (on VMware)

Userlevel 3

Hi all. There is a possibility that this weekend I might need to restore all of the domain controllers in our domain (hopefully this does not come to pass). It is not a very active AD domain, which should help, but I might have to go back about 3 weeks to get around a problem.

Can someone point me to documentation on how to do this from within Veeam with all the DCs down? This would mean that no DNS resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

Another hiccup is that for one of the DCs, I notice now that application-aware processing has not been turned on, so it is going to be restored as if it were just a member server, without any special processing for the AD database.

Thanks.

Best answer by regnor 31 August 2022, 07:53

Userlevel 6

Wow. Quite a comprehensive thread this has gotten. Very interesting topic.

To me, the most important part of the question in the first place was:

All (!) DCs have to be restored at once.

If you try that with VBR without further measures it will fail.

Not a single one of the DCs will bring up the domain. This is because VBR (ever since V2) is smart enough to always perform a non-authoritative restore. This brings up the DC in a state where it first wants to replicate the full domain from another DC that is still alive. You can recognize this process by the additional reboot a DC always does after being recovered by VBR.

Without this full sync of the domain DB your recovered DC will be a useless brick.

Problem is: If you recovered all of them, they will all look for a surviving DC - but there is none.

You then have to manually switch one of them to an authoritative restore afterwards, e.g. by setting “BurFlags”. Keep that process in your emergency runbook, just in case...
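If SYSVOL on the restored DCs is still replicated by FRS (domains never migrated to DFSR), BurFlags lives under the NtFrs registry key, with D4 marking a DC as authoritative and D2 as non-authoritative. Below is a minimal Python sketch that only builds and prints the `reg add` command line; the function names are made up for illustration, and the key path should be double-checked against Microsoft's FRS documentation before you rely on it.

```python
# Sketch only: builds (does not run) the reg.exe command that sets the FRS BurFlags value.
# Assumes SYSVOL is replicated by FRS; DFSR-replicated SYSVOL uses a different procedure.
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)
# 0xD4 = authoritative (this DC becomes the SYSVOL source)
# 0xD2 = non-authoritative (this DC pulls SYSVOL from a partner)
BURFLAGS = {"authoritative": 0xD4, "non-authoritative": 0xD2}


def burflags_command(mode: str) -> list[str]:
    """Return the reg.exe argument list for the requested restore mode."""
    if mode not in BURFLAGS:
        raise ValueError(f"unknown restore mode: {mode!r}")
    return [
        "reg", "add", BURFLAGS_KEY,
        "/v", "BurFlags",
        "/t", "REG_DWORD",
        "/d", hex(BURFLAGS[mode]),  # e.g. '0xd4'
        "/f",                       # overwrite without prompting
    ]


def as_command_line(args: list[str]) -> str:
    """Quote arguments containing spaces so the line can be pasted into cmd.exe."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


if __name__ == "__main__":
    # Run the printed command elevated on the chosen DC, then restart the NtFrs service.
    print(as_command_line(burflags_command("authoritative")))
```

Set D4 on exactly one DC and D2 on the others, which matches the one-authoritative rule described in the next paragraph.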

Veeam btw CAN do an authoritative restore. You can opt to do that in a SureBackup application group. Otherwise every SureBackup test would also come up with useless DCs. Keep in mind that if you have more than one DC in a SureBackup application group, only one should be brought up authoritative while the others must be set to non-authoritative. Then those will copy the domain from the authoritative one.
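The one-authoritative rule is easy to get wrong when an application group holds several DCs. Here's a minimal Python pre-flight check; the `LabDC` class and its `authoritative` flag are hypothetical stand-ins for the per-VM role you pick in the application group (authoritative vs non-authoritative DC), not a Veeam API.

```python
from dataclasses import dataclass


@dataclass
class LabDC:
    name: str
    authoritative: bool  # hypothetical: mirrors the restore role chosen for this VM


def authoritative_dc(group: list[LabDC]) -> str:
    """Return the single authoritative DC, or raise if the group breaks the one-authoritative rule."""
    auth = [dc.name for dc in group if dc.authoritative]
    if len(auth) != 1:
        raise ValueError(f"expected exactly 1 authoritative DC, found {len(auth)}: {auth}")
    return auth[0]


if __name__ == "__main__":
    group = [LabDC("DC1", True), LabDC("DC2", False), LabDC("DC3", False)]
    print("Authoritative:", authoritative_dc(group))  # DC2 and DC3 replicate from DC1
```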

SureBackup can even be used to automate the authoritative restore of a DC: you could simply move the DC from the lab into the “real” world, AKA the production network. Do it carefully. No other DC may be running, otherwise your domain will break by becoming inconsistent.

Have a secondary DNS without AD integration for your Veeam components for that. Otherwise, without DNS, VBR and SureBackup will not function as expected… We always put a secondary DNS on the VBR server, used only as a backup for the VI (virtual infrastructure) and Veeam. It’s just a service...
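If a secondary DNS server isn't in place yet, the hosts-file idea from the original question can serve as a stopgap on the Veeam server, as long as only a handful of names (vCenter, ESXi hosts, proxies, repositories) must resolve. A small Python sketch that renders hosts-file lines; the host names and addresses are hypothetical, and on Windows the file is `C:\Windows\System32\drivers\etc\hosts`.

```python
import ipaddress


def hosts_entries(records: dict[str, str]) -> str:
    """Render 'IP<TAB>FQDN<TAB>shortname' lines, sorted by FQDN, validating each address."""
    lines = []
    for fqdn, ip in sorted(records.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        short = fqdn.split(".", 1)[0]
        lines.append(f"{ip}\t{fqdn}\t{short}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical infrastructure names; substitute your own.
    print(hosts_entries({
        "vcenter.corp.example": "10.0.0.10",
        "esx01.corp.example": "10.0.0.11",
    }), end="")
```

Remove these entries once AD-integrated DNS is back, so stale addresses don't linger.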

Userlevel 7

How did you get on @MckITGuys? If there was any particular advice that you found the most helpful, please don’t forget to mark it as the answer to help others in your position in the future

Userlevel 4

Hi

With VMs, I have restored a full AD a couple of times. No issues, particularly if you are using Windows Server 2016 and above.

The “new” VM-Generation ID feature controls how AD handles changes when a DC is restored.

When does this ID change?

Scenario – VM-Generation ID changes?

• VMware vMotion/VMware vSphere Storage vMotion / Hyper-V Live Migration – No
• Virtual machine pause/resume – No
• Virtual machine reboot – No
• vSphere host reboot – No
• Delete VM Snapshot – No
• Import virtual machine – Yes
• Cold clone – Yes
• Hot clone – Yes
Note: Neither Microsoft nor VMware supports hot cloning of virtual domain controllers. Do not attempt hot cloning under any circumstances.
• New virtual machine from VMware Virtual Disk Development Kit (VMDK) copy – Yes
• Cold snapshot revert (while powered off or while running and not taking a memory snapshot) – Yes
• Hot snapshot revert (while powered on with a memory snapshot) – Yes
• Restore from virtual machine level backup – Yes
• Virtual machine replication (using both host-based and array-level replication) – Yes
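To make the list above easier to apply, here's a tiny Python lookup that encodes it; the scenario keys are shortened labels of my own. Per Microsoft's virtualized DC documentation, a Windows Server 2012 or later DC that sees a new VM-Generation ID applies safeguards (such as resetting its invocation ID and discarding its RID pool) so it doesn't replay stale changes into the domain. Treat this as a reading aid, not a substitute for that documentation.

```python
# True = the VM-Generation ID changes (AD DS safeguards kick in on 2012+ DCs).
# Keys are shortened labels for the scenarios listed above.
VM_GENID_CHANGES = {
    "vmotion/storage vmotion/live migration": False,
    "pause/resume": False,
    "vm reboot": False,
    "host reboot": False,
    "delete snapshot": False,
    "import vm": True,
    "cold clone": True,
    "hot clone": True,  # unsupported for DCs; never do this
    "new vm from vmdk copy": True,
    "cold snapshot revert": True,
    "hot snapshot revert": True,
    "restore from vm-level backup": True,
    "vm replication": True,
}


def genid_changes(scenario: str) -> bool:
    """Look up whether a scenario changes the VM-Generation ID (KeyError if unknown)."""
    return VM_GENID_CHANGES[scenario]


if __name__ == "__main__":
    for name, changes in VM_GENID_CHANGES.items():
        print(f"{name:40} {'Yes' if changes else 'No'}")
```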

Before restoring a broken domain controller, here are some DC restore best practices:

• If you are restoring a broken DC in a multi-DC environment, you can do a normal restore and follow the standard restore process.
• If you are restoring all DCs in your domain, you should first restore the one holding all the FSMO roles (you should have this information before you lose your DCs).
• If you are restoring the DC that held the FSMO roles and you want to keep it that way (meaning all the existing DCs will sync from your restored DC), then you need to follow Microsoft's procedure for an authoritative restore.

So when you need to restore domain controllers, always try to restore the DC with the most recent backup.
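The ordering advice in this post (FSMO holder first, then the most recently backed-up DCs) can be written down as a sort key. A Python sketch with hypothetical DC names and backup times; in practice the role holders would come from `netdom query fsmo` (recorded before the outage) and the times from your Veeam restore points.

```python
from dataclasses import dataclass, field
from datetime import datetime

FSMO_ROLES = {"Schema", "DomainNaming", "PDC", "RID", "Infrastructure"}


@dataclass
class DCBackup:
    name: str
    last_backup: datetime
    fsmo: set[str] = field(default_factory=set)


def restore_order(dcs: list[DCBackup]) -> list[str]:
    """Most FSMO roles first; ties broken by the most recent restore point."""
    ranked = sorted(
        dcs,
        key=lambda dc: (len(dc.fsmo & FSMO_ROLES), dc.last_backup),
        reverse=True,
    )
    return [dc.name for dc in ranked]


if __name__ == "__main__":
    # Hypothetical inventory.
    dcs = [
        DCBackup("DC2", datetime(2022, 8, 30, 22, 0)),
        DCBackup("DC1", datetime(2022, 8, 29, 22, 0), set(FSMO_ROLES)),
        DCBackup("DC3", datetime(2022, 8, 29, 23, 0)),
    ]
    print(restore_order(dcs))  # ['DC1', 'DC2', 'DC3']
```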

I have written a couple of articles regarding this procedure.

Userlevel 7

hi @JailBreak

MS best practices advise against restoring a Domain Controller via a hot snapshot.

I only used a cold VMware snapshot before performing an FFL\DFL upgrade.

I strongly advise against restoring a DC from snapshot.

Snapshot for Win2k12 domain controllers (microsoft.com)

Performing Domain Controller rollback via VM snapshot or Image backup? (microsoft.com)

Userlevel 7

Wondering how it worked out for you.

Don’t restore from a snapshot? So essentially you're saying don't use Veeam at all? I won’t be powering down my DCs for daily backup jobs.

Yes, I realize powering all of them down and taking a snapshot is ideal, but it's not realistic in most 24/7 environments. That's the same reason we have multiple DCs.

I found the ideal solution is to give one DC all the FSMO roles (unless your environment is HUGE and you want to split them up, but with under a few thousand users this is probably fine). Power off ALL of the DCs, and restore that specific VM as authoritative. The Veeam docs/support worked 100% for us in a critical situation with everyone down.

Userlevel 4

Using a backup tool for DC VMs, there is no other way. And honestly, I don't know anyone who will power off a DC to back it up, particularly when using application-aware processing.

And as Veeam says:
“Veeam supports Application Aware backup of Active Directory for Virtual Machine and Physical Servers.” and “... When possible, it’s recommended to backup the Domain Controller with most FSMO”

Like I said, I have done restores of anything from 1 or 2 DCs to all DCs in the environment, and did not find any issues with that.

With old Win2K12 we could have some issues, or restores were more difficult, but with 2016/2019 and later, those old rules no longer apply to virtual domain controllers. Again, as long as you follow the best practices when performing those backups.

Userlevel 3

Hi all,

First off, big thanks to everyone who jumped in with advice. I learned a lot through this exercise (not exactly my kind of exercise, though :-) I need to read back through the thread and mark a few more replies, but since I am short on time this morning, I will just give a summary of what happened. Note that in the end we did NOT need to use Veeam for restores, but in case someone hits this thread and wants to know the outcome, it might help them. Also a caveat: this might not have been the “best” way to do this (i.e. manually pruning AD), but it was the first step we opted to take instead of getting into restoring DCs.

Reminder: we had one bad DC in our environment that we could not log into in any way, so we could not demote it gracefully. This happened about 3 weeks ago and we were starting to experience replication problems (changes not being replicated), so we really needed to get it fixed. Steps taken:

-shut down the bad DC; before this fix, doing so would cause a significant problem on one of our servers that was using LDAP to this DC for authentication (could not get around this), even though this server was *hardcoded* for its LDAP settings (could not even get support for this product to figure out why).

-go into ADUC and delete the bad DC from the Domain Controllers OU

-go into DNS and see if this DC was removed from DNS: only one instance was removed correctly (at the root) but all lower instances of it in all kinds of nodes were not removed; manually go through the forward and reverse trees and manually delete all references to the old DC; note that some might think this is extreme but we were going to permanently get rid of this DC anyhow (when we build back another DC, it will have a different name)

-go into Sites and Services (SaS) and remove the bad DC from the pertinent site; remove any connection objects between the bad DC and other DCs

-at this point I stopped and fired up the server that would always “choke” because it was looking for LDAP services from the bad server; this time, it worked fine without it - it reverted back to the other DC at that site

-went into SaS again and fired the replication connections between the remaining DCs; at first I got an error message, but that eventually went away. Not sure if replication of the connection object finally happened enough to allow full replication and I just needed to wait, or if all my poking about finally got it going. I really did not know if I had to somehow fix things up for the sites or not, but finally it worked (if anyone has any insight into that, I would not mind knowing).

-at this point, it looked as though replication was working correctly via issuing repadmin /replsummary and dcdiag etc.

-all servers stopped having errors and nslookups and all tests seemed okay; newly created objects on the one DC finally showed up on the other DCs (and obviously changes as well);

-one interesting item was that just before the fix took place last week, a few users including myself, had our mapped network drives disappear to one of our servers and we could not re-create them at all; so somehow some trust relationship or something was starting to break down such that our mapped drives stopped working (if anyone has insight on this as well, would like to hear)

I still have a few dcdiag warnings to look at this morning - but when reading a bit on each, none seemed serious (more warnings for best practices etc.); I might post those as another post if they are related to this situation.

Thanks a lot everyone for the help.  I learned a lot more about what I might have needed to do for a Veeam restore of all DCs - glad I did not have to do it.  I also feel like I need to summarize some of the thread items here to have handy in case we need it.  Hopefully I can get to that before this gets too stale in my mind.

Albert (for McKITGuys)

Userlevel 7

Hi @JailBreak

Sorry, I was too general. To be precise: don't restore a DC directly from a VMware snapshot.

It is obvious that a backup of AD with Application Aware is consistent.

To be precise, I turned off the two DCs with FSMO roles before the FFL\DFL upgrade. At that time there was no rollback after an FFL\DFL upgrade from 2003 to 2008.

Here is my earlier post in this topic, which you may not have read :)

As mentioned above, there are some verifications before performing the restore:

- Verify that you have all DSRM passwords for each Domain Controller to be restored.

- All domain controllers with FSMO roles must be under backup.

- Perform a restore in a virtual lab, with domain-joined clients, to verify that the Forest\Domain is consistent after the restore.

- Verify which technology replicates SYSVOL: FRS or DFSR.

Backing Up and Restoring an FRS-Replicated SYSVOL Folder - Win32 apps | Microsoft Docs

- There are two restore cases:
- Non-authoritative restore allows you to restore individual domain objects, which Veeam does easily; you can also compare the object in production with the object to be restored, in case the object has changed.

- What is the Forest\Domain functional level?
- Since Windows Server 2008 R2, it is possible to enable the AD Recycle Bin for fast restoring of objects without using third-party software.

- Authoritative restore, to completely restore the Forest\Domain.

I've attached info on the Recycle Bin and Veeam AD application-aware processing.

- After an authoritative restore, verify that SYSVOL replicates correctly again.
- Verify PDC time sync.
- Verify GPO functionality.
- If you have a CA, verify that it works correctly.

My advice is to perform an authoritative restore of the server (or the two servers) holding all the FSMO roles.
- Then proceed with installing and promoting fresh new DCs.

- Also, since Windows Server 2012, it is possible to clone domain controllers with a simple procedure.

Virtualized Domain Controller Cloning Test Guidance for Application Vendors | Microsoft Docs

Virtual Domain Controller Cloning in Windows Server 2012 - Microsoft Tech Community

Step-by-Step Guide to clone a Domain Controller - Technical Blog | REBELADMIN

Obviously, by going 3 weeks back, besides reverting user password changes, you may also lose the computer/user accounts created during that time.
If you describe the situation and the network topology of the domain accurately, as @MicoolPaul said, we can make a more detailed plan.
Thank you

Userlevel 7

@MckITGuys thanks for the feedback.
Your case is a classic one for an AD architecture. Usually, when a single domain controller has a problem and you cannot restore all functionality properly, it is easier to perform a "force removal" of the corrupted DC.

Force removal guide:

- Force removal: in ADUC, locate the DC's computer object
- Delete the computer object

Proceed with the following checks:

- From the DNS console check & delete all entries pertaining to the removed Domain Controller.
- In ADSI Edit, open the Configuration partition and verify that the DC object has been properly removed
Delete the DC's entry from the Name Servers tab (DNS zone properties)
Delete the SRV records :
DNS: _msdcs.domainname.com -> CNAME (GUID - fqdn) of the DC
DNS: _msdcs.domainname.com -> dc -> _sites -> sitename -> _tcp -> SRV Records for Kerberos and LDAP
DNS: _msdcs.domainname.com -> domains -> domain guid -> _tcp -> LDAP SRV Records
DNS: _msdcs.domainname.com -> gc -> _sites and _tcp (if the DC was also GC)
DNS: _msdcs.domainname.com -> pdc -> _tcp -> LDAP SRV (if the DC was also PDC)
DNS: domainname.com -> A (IPv4) and AAAA (IPv6) records

- Open Active Directory Sites and Services and, in the relevant site, delete the DC's server object from the replication partners (this object is not deleted automatically).

- After the force removal, verify that all AD features work correctly, from replication to FSMO roles and AD time sync.
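The DNS records listed above follow a fixed naming pattern, so it can help to generate the exact names for the removed DC before clicking through the DNS console. Below is a Python sketch assuming a single-domain forest (forest root = domain); the domain, site, host name and GUIDs in the example are hypothetical, and the list is not exhaustive. Many of these names (for example `_ldap._tcp.dc._msdcs.<domain>`) also carry SRV records for the surviving DCs, so delete only the records whose target or data is the removed DC.

```python
def stale_dc_records(domain: str, site: str, dc_fqdn: str, dsa_guid: str,
                     domain_guid: str, is_gc: bool = False, is_pdc: bool = False) -> list[str]:
    """Record names to inspect after a forced DC removal (single-domain forest assumed)."""
    msdcs = f"_msdcs.{domain}"
    names = [
        f"{dsa_guid}.{msdcs}",                       # CNAME -> dc_fqdn
        f"_ldap._tcp.dc.{msdcs}",                    # SRV, shared by all DCs
        f"_kerberos._tcp.dc.{msdcs}",                # SRV, shared by all DCs
        f"_ldap._tcp.{site}._sites.dc.{msdcs}",      # site-specific SRV
        f"_kerberos._tcp.{site}._sites.dc.{msdcs}",
        f"_ldap._tcp.{domain_guid}.domains.{msdcs}",
        domain,                                      # "(same as parent folder)" A/AAAA
        dc_fqdn,                                     # host A/AAAA
    ]
    if is_gc:
        names += [f"_ldap._tcp.gc.{msdcs}", f"_ldap._tcp.{site}._sites.gc.{msdcs}"]
    if is_pdc:
        names.append(f"_ldap._tcp.pdc.{msdcs}")
    return names


if __name__ == "__main__":
    for name in stale_dc_records(
        "corp.example", "HQ", "dc3.corp.example",
        dsa_guid="11111111-2222-3333-4444-555555555555",     # hypothetical
        domain_guid="66666666-7777-8888-9999-000000000000",  # hypothetical
        is_gc=True,
    ):
        print(name)
```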

Here are a few commands for verification:

netdom query fsmo
nltest /dclist:yourdomain.local
nltest /dsgetdc:yourdomain.local /force /gc

How to Force Active Directory Replication.

This will do a pull replication, which means it will pull updates from DC2 to DC1.
If you want to push replication, use the /P switch. For example, if you make changes on DC1 and want to replicate them to the other DCs, use this command.
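One common way to force replication from the command line is `repadmin /syncall`, which pulls by default and pushes with `/P`. Below is a Python sketch that only builds the argument list (the function names are made up). The flags are `/A` (all naming contexts), `/d` (identify servers by distinguished name in messages), `/e` (enterprise-wide, across sites) and `/P` (push); check `repadmin /syncall /?` on your DC before relying on it.

```python
import subprocess


def syncall_command(dc: str, push: bool = False) -> list[str]:
    """Build a repadmin /syncall invocation for one DC.

    /A = all naming contexts, /d = identify servers by DN in messages,
    /e = enterprise-wide (cross site boundaries), /P = push instead of pull.
    """
    return ["repadmin", "/syncall", dc, "/Ade" + ("P" if push else "")]


def run(cmd: list[str]) -> None:
    # Only meaningful on a DC or a machine with the AD DS tools installed.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Push changes made on DC1 out to its partners (DC1 is a hypothetical name).
    print(" ".join(syncall_command("DC1", push=True)))
```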

DCDIAG

dcdiag /test:dns
dcdiag /test:netlogons
dfsrdiag ReplicationState /member:yourDCname
dcdiag /test:topology
dcdiag /test:replications
dcdiag /c /e /v >> c:\ADHealth.txt

DNSLint is a Microsoft Windows utility that helps you to diagnose common DNS name resolution issues.

Then run DCPROMO for a new DC to replace the decommissioned one.
Checks after DCPROMO:

Get-WinEvent -LogName "Directory Service" | ?{$_.Id -eq 1119} | FL
Get-WinEvent -LogName "DFS Replication" | ?{$_.Id -eq 4604} | FL

dcdiag /c /d /v

DCDiag (part of the Windows Server 2003 SP1 Support Tools) displays detailed information about the domain controller.
dcdiag.exe /V /C /D /E /s:#DomainControllerName# > c:\dcdiag.log

NetDiag provides information about specific network configuration for the local machine.
netdiag.exe /v > c:\netdiag.log

repadmin.exe /showrepl dc* /verbose /all /intersite > c:\repl.txt
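With many DCs, the per-link replication status is easier to scan as CSV. A Python sketch that lists failing links from `repadmin /showrepl * /csv > showrepl.csv`; it assumes the column names `Destination DSA`, `Source DSA`, `Naming Context` and `Number of Failures`, so check the header row of your own file before trusting it.

```python
import csv
import io


def failing_links(csv_text: str) -> list[tuple[str, str, str, int]]:
    """Return (destination DSA, source DSA, naming context, failures) for links with failures."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        failures = int((row.get("Number of Failures") or "0").strip())
        if failures > 0:
            bad.append((row["Destination DSA"], row["Source DSA"],
                        row["Naming Context"], failures))
    return bad
```

For example, `failing_links(open("showrepl.csv").read())` returns an empty list when every link is healthy.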

Regards

Userlevel 4

The list I shared in my first comment is about when the VM-Generation ID changes, not about backups. The ID is how the DC knows there was a change to the system.

So every time one of those actions happens, the VM-Generation ID changes.

Honestly, an updated blog post about backing up and restoring DCs on the latest Windows versions is on my list. But like many others on that list, it is still on the wish list :)

Userlevel 4

Interesting thread. We have the issue that after starting our DR site (which is built using a Veeam replication job from backup copies), the SYSVOL share is no longer replicating (we have 3 DCs). I can only get it to replicate again by adding the following registry keys to the DC that holds the FSMO roles:

reg add "HKLM\System\CurrentControlSet\Services\DFSR\Restore" /v SYSVOL /t REG_SZ /d "authoritative"
reg add "HKLM\SYSTEM\CurrentControlSet\Control\BackupRestore\SystemStateRestore" /v LastRestoreId /t REG_SZ /d "10000000-0000-0000-0000-000000000000"

After a reboot of the DC where I added these keys, SYSVOL is replicating again.

Is this also a known issue?

Userlevel 7

Yes, you performed an authoritative SYSVOL restore.
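For reference, Microsoft's documented procedure for an authoritative sync of DFSR-replicated SYSVOL uses ADSI Edit rather than the registry keys in the question above: it edits the `msDFSR-Enabled` and `msDFSR-options` attributes of each DC's `CN=SYSVOL Subscription` object. The order of operations matters, so here's a Python sketch that only returns the steps as text; DC names are hypothetical, and Microsoft's article on forcing authoritative and non-authoritative DFSR SYSVOL synchronization remains the source of truth.

```python
def dfsr_authoritative_plan(auth_dc: str, other_dcs: list[str]) -> list[str]:
    """Ordered steps for an authoritative DFSR SYSVOL sync (sketch: returns text only)."""
    if auth_dc in other_dcs:
        raise ValueError("the authoritative DC must not also appear in other_dcs")
    # Attributes live on CN=SYSVOL Subscription,CN=Domain System Volume,
    # CN=DFSR-LocalSettings,CN=<DC>,OU=Domain Controllers,... (edit with ADSI Edit).
    steps = ["all DCs: set DFSR startup type to Manual and stop the DFSR service"]
    steps.append(f"{auth_dc}: set msDFSR-Enabled=FALSE and msDFSR-options=1")
    steps += [f"{dc}: set msDFSR-Enabled=FALSE" for dc in other_dcs]
    steps.append("force AD replication to every DC and confirm it succeeded")
    steps.append(f"{auth_dc}: start the DFSR service")
    steps.append(f"{auth_dc}: set msDFSR-Enabled=TRUE, force AD replication, run dfsrdiag pollad")
    steps += [
        f"{dc}: start DFSR, set msDFSR-Enabled=TRUE, run dfsrdiag pollad"
        for dc in other_dcs
    ]
    steps.append("all DCs: set DFSR startup type back to Automatic")
    return steps


if __name__ == "__main__":
    # The FSMO holder is a natural choice for the authoritative copy.
    for i, step in enumerate(dfsr_authoritative_plan("DC1", ["DC2", "DC3"]), 1):
        print(f"{i}. {step}")
```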

Userlevel 7

I feel like there needs to be a MUCH clearer way of setting this up and performing it, from both Veeam and Microsoft. I’m talking fail-safe, step-by-step, hold-my-hand instructions.

I bet a lot of people are not set up for success in the event they need to restore their DCs.

Userlevel 3

Sorry for the super-long delay; after successfully doing the work on the server, I left a few days later for two weeks' holiday (not that I like to do that, I would rather be around for a bit, but it was booked and everything seemed okay!). In short, I was able to clean up AD for the “bad DC” instead of having to restore from Veeam. Even though the below is not a Veeam solution, I thought a post-op summary would be informative to anyone reading this thread, and I still had a few questions about what happened, more for my info than anything. I also still have some specific questions about being better prepared in case I have to restore from Veeam next time. But for now, here is a log of what I did for the cleanup.

>>>>>>>>>>>>>>>>>>>>>>>>>>

Reminder: we had one bad DC in our environment that we could not log into in any way so we could not demote it gracefully.  This happened about 3 weeks ago and we were starting to experience replication problems (changes not being replicated between DCs) so we really needed to get it fixed.

Steps taken:

-shut down the bad DC; before this fix, doing so would cause a significant problem on one of our servers that was using LDAP to this server for authentication (could not get around this), even though this server was *hardcoded* to use LDAP on another DC (not the “bad” DC) (could not even get support for this product to figure out why LDAP was not pointing to the correct server).

-go into ADUC (on the other DC at the same site) and delete the bad DC from the Domain Controllers OU

-go into DNS and see if the bad DC was removed from DNS; result: only one instance was removed correctly (at the root), but all lower instances of it in all kinds of nodes were not removed; manually go through the forward and reverse trees and manually delete all references to the bad DC; note that some might think this is extreme, but we were going to permanently get rid of this DC anyhow (when we build another DC, it will have a different name)

-go into Sites and Services (SaS) and remove the bad DC from the pertinent site; remove any connection objects between the bad DC and other DCs

-at this point I stopped and fired up the server that would always “choke” because it was looking for LDAP services from the bad DC; this time though, it worked fine without it - it reverted back to getting LDAP from the other DC at that site

-went into SaS again and fired the replication connections between the remaining DCs; at first I got an error message (sorry, did not record) but that eventually went away - not sure if somehow replication of the connection object finally happened enough to allow full replication to happen and I just needed to wait, or did all my poking about finally get it going.  I really did not know if I had to somehow fix things up for the sites or not - but finally it worked (if anyone has any insight into that, I would not mind knowing).

-at this point, replication seemed to be working correctly - tested by issuing repadmin /replsummary and dcdiag etc.

-all servers stopped having errors and nslookups and all tests seemed okay; newly created objects on the one DC finally showed up on the other DCs (and obviously changes as well);

-one interesting item was that just before the fix took place last week, a few users including myself, had our mapped network drives disappear to one of our servers and we could not re-create them at all; so somehow some trust relationship or something was starting to break down such that our mapped drives stopped working (if anyone has insight on this as well, would like to hear)

I still have a few dcdiag warnings to look at this morning - but when reading a bit on each, none seemed serious (more warnings for best practices etc.); I might post those as another post if they are related to this situation.

>>>>>>>>>>>>>>>>>>>>>

Thanks a lot everyone for the help.  I learned a lot more about what I might have needed to do for a Veeam restore of all DCs - glad I did not have to do it.  I also feel like I need to summarize some of the thread items here to have handy in case we need it.  Hopefully I can get to that before this gets too stale in my mind.

Albert (for McKITGuys)