Solved

Restoring all domain controllers in a Windows domain (on vmware)


Userlevel 3

Hi all.  There is the possibility that this weekend I might need to restore all of the domain controllers in our domain (hopefully this does not come to pass).  It is not a very active AD domain, but I might have to go back about 3 weeks to get around a problem.

Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

Thanks.


Best answer by regnor 31 August 2022, 07:53


41 comments

Userlevel 7
Badge +14

If application-aware processing (AAP) is enabled, a domain controller gets restored in non-authoritative mode. For a faster/better restore, the first DC should be placed in authoritative mode and all others in non-authoritative mode. For the one DC which has AAP disabled, I would manually boot into recovery mode and set it to non-authoritative, as otherwise this one could cause replication problems.
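As a rough planning aid, the ordering described above can be sketched in Python. This is only an illustration under stated assumptions: the `DomainController` class, its fields and the DC names are hypothetical, and picking the FSMO role holder as the "first" DC is an assumption taken from later replies in this thread, not something Veeam does for you.

```python
from dataclasses import dataclass


@dataclass
class DomainController:
    name: str
    holds_fsmo_roles: bool  # hypothetical flag: this DC holds the FSMO roles
    app_aware_backup: bool  # True if the backup job had AAP enabled


def plan_restore(dcs):
    """Return (name, mode, note) tuples in the order the DCs would be restored."""
    if not dcs:
        raise ValueError("no domain controllers given")
    # Assumption: the FSMO role holder goes first; otherwise use the first entry.
    first = next((dc for dc in dcs if dc.holds_fsmo_roles), dcs[0])
    plan = [(first.name, "authoritative", "restore first, then mark authoritative by hand")]
    for dc in dcs:
        if dc is first:
            continue
        if dc.app_aware_backup:
            note = "AAP backup: expected to come up non-authoritative"
        else:
            note = "no AAP: boot into recovery and set non-authoritative by hand"
        plan.append((dc.name, "non-authoritative", note))
    return plan


# Hypothetical three-DC domain, one of them backed up without AAP.
example = [
    DomainController("DC1", holds_fsmo_roles=True, app_aware_backup=True),
    DomainController("DC2", holds_fsmo_roles=False, app_aware_backup=True),
    DomainController("DC3", holds_fsmo_roles=False, app_aware_backup=False),
]
for name, mode, note in plan_restore(example):
    print(f"{name}: {mode} ({note})")
```

The sketch only produces a checklist; the actual mode switch still happens on the restored DC itself.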

The following article describes all the details: https://www.veeam.com/kb2119

 

Please keep in mind that restoring a 3-week-old backup could result in some problems, like changed passwords (user/computer).

Userlevel 7
Badge +9

Hi @MckITGuys 

Hi,
Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

- You can compile a hosts file and put it on all Veeam servers.
- Or add the vCSA, ESXi hosts, etc. to the Veeam B&R console by IP address.

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

I have a trick: if you have not enabled AAP for AD, you can open the backup with a file-level restore (FLR); from there you can also open the application restores even though AAP was not enabled. This applies to granular restore of AD objects.

As mentioned above, there are some verifications before performing the restore:

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-determine-how-to-recover 

- Verify that you have all DSRM passwords for each Domain Controller to be restored.

- Here is a procedure for resetting DSRM passwords: https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password 

- All domain controllers with FSMO roles must be under backup.

- Perform a Virtual lab restore with clients in domain to verify that the Forest\Domain is consistent after the reset.

- Verify which SYSVOL replication technology is in use: FRS or DFSR.

Backing Up and Restoring an FRS-Replicated SYSVOL Folder - Win32 apps | Microsoft Docs

- There are two restore cases:
  - Non-authoritative restore allows you to restore individual domain objects, which Veeam does easily; you can also compare the object in production with the object to be restored, in case the object has changed.

- What is the Forest\Domain Functional level?
- Since Windows Server 2008 R2, it is possible to enable the AD "Recycle Bin" for fast restoring of objects without using third-party software.

  - Authoritative restore completely restores the Forest\Domain.

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-perform-initial-recovery 

Here is some info on the Recycle Bin and Veeam AD awareness:
https://forums.veeam.com/veeam-backup-replication-f2/veeam-explorer-for-ad-and-ad-recycle-bin-enable-t29703.html  

- After an authoritative restore, verify that SYSVOL replicates correctly.
- Verify PDC time sync.
- Verify GPO functionality.
- If you have a CA, verify that it operates correctly.

My advice is to authoritatively restore the server (or the two servers) that holds the FSMO roles.
- Then proceed with the installation and fresh promotion of new DCs.

- Also, since Windows Server 2012, it is possible to clone domain controllers with a simple procedure.

Virtualized Domain Controller Cloning Test Guidance for Application Vendors | Microsoft Docs

Virtual Domain Controller Cloning in Windows Server 2012 - Microsoft Tech Community

Step-by-Step Guide to clone a Domain Controller - Technical Blog | REBELADMIN

Obviously, by going back 3 weeks, in addition to changed user passwords you may also lose the computer\user accounts created during that time.
If you describe the situation and the network topology of the domain accurately, as @MicoolPaul said, we can make a more detailed plan.
Thank you.

 

Userlevel 7
Badge +11

Recently I restored a single AD on my homelab without any problems.

However, it was not a production environment like yours.

 

Take a look at this article:

https://www.veeam.com/blog/active-directory-domain-controller-backup-recovery.html

 

Userlevel 7
Badge +13

Please keep in mind that restoring a 3-week-old backup could result in some problems, like changed passwords (user/computer).

Other than that, it could result in a more serious problem: devices in the network may lose the trust relationship with that domain.

Userlevel 7
Badge +8

I had to do it once, after a ransomware hit.

We restored the backup of the primary DC. The backup was only 12 hours old, but we still faced some trust and Kerberos authentication issues. Running gpupdate /force on most of the affected PCs fixed it; just one of them (out of 200 clients) needed to be removed from the domain and rejoined.

Once the restore was OK, we deployed a new secondary domain controller instead of restoring it from backup, to avoid authoritative or sync issues.

Hopefully this helps you.

Cheers.

Userlevel 7
Badge +20

Morning!

 

Veeam would be terrible backup software if you couldn’t recover your AD environment, so the answer is a definite YES!

 

Let’s go over a few basic parts to this and build a strategy for you:

 

Firstly, you’ve quite rightly highlighted your DNS will be impacted. We need to consider the impact from a few perspectives:

  • vSphere: Does vCenter manage the ESXi hosts via FQDN entries or IP addresses? If it’s via FQDN, you’d be best off not using vCenter within Veeam for your recovery, as you’d have to mess with your VCSA to avoid the DNS issues. We can just target an ESXi host via IP address within Veeam.
  • Veeam: Veeam needs to be able to talk to its components. Is your Veeam server an “all in one” box? If not, are your components/servers being targeted via FQDN or IP address? If FQDN, I would just edit the hosts file to ensure this continues to work.
  • AD: What DNS configuration are you using here? Are all the domain controllers pointing at themselves (relying on DNS replication) within their DNS client/network config, are they all pointing to a central DNS server, or are they all pointing at each other? This could impact how you want to recover.

Secondly, what OS are your domain controllers running? Windows Server 2012 added some virtualisation safeguards for AD to help if you restored a DC out of sequence.

 

Thirdly, check your retention policy and make sure the backups you’ll need aren’t going to be deleted due to retention this week. If in doubt, you could also export the backups you need for an extra layer of confidence.
 

Finally, are your DCs backed up at the same time or at different times? I’d want to use the latest backup available as my primary and reattach my older DC backups to that.
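To make the "latest backup as primary" idea concrete, here is a minimal Python sketch. The DC names and timestamps are made up for illustration, and `pick_primary` is a hypothetical helper, not part of any Veeam tooling; in practice the timestamps would be read off the restore points you intend to use.

```python
from datetime import datetime


def pick_primary(latest_backup):
    """Split DCs into the one with the newest restore point and the rest.

    latest_backup maps a DC name to the datetime of the restore point
    you plan to use for that DC.
    """
    if not latest_backup:
        raise ValueError("no restore points given")
    primary = max(latest_backup, key=latest_backup.get)
    others = sorted(name for name in latest_backup if name != primary)
    return primary, others


# Hypothetical restore points for three DCs backed up by different jobs.
restore_points = {
    "DC1": datetime(2022, 8, 8, 22, 0),
    "DC2": datetime(2022, 8, 9, 1, 30),
    "DC3": datetime(2022, 8, 8, 23, 15),
}
primary, others = pick_primary(restore_points)
print("primary:", primary, "| restore afterwards:", ", ".join(others))
```

Whether the newest copy or the FSMO role holder should go first is a judgment call that depends on the environment; the sketch only shows the bookkeeping.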

 

Once we’ve got some information here to play with, we can put together a more tailored plan.

 

I’d also suggest using virtual labs to test this in an isolated network before carrying it out in production, so you can be confident you’ve hit any and all snags before doing it “for real”. You can do this with the same backups you’ll actually want to recover from (as you said you could be going back a few weeks), so it will be an identical process.

Userlevel 7
Badge +20

You can follow along this blog post by Veeam - Recovering the Active Directory Domain Services - Best practices for AD administration (part 3) (veeam.com)

Never had to recover AD myself but with Veeam it should be manageable.

Also here is how to restore items only - Restore AD Items

Userlevel 3

Those articles look great for restoring a single object - but I might have to restore the entire AD database (for reasons too deep to go into here).  I wonder if I just restore the entire vm image?

Also, I have another potential problem: one of the 3 DCs did not have application aware processing turned on so it does not even show up in the restore wizard - turning that on tonight!  I think there were initially problems with that DC and then it somehow never got fixed, or at least never got turned on.

Albert

Userlevel 7
Badge +13

I had to do it once, after a ransomware hit.

We restored the backup of the primary DC. The backup was only 12 hours old, but we still faced some trust and Kerberos authentication issues. Running gpupdate /force on most of the affected PCs fixed it; just one of them (out of 200 clients) needed to be removed from the domain and rejoined.

Once the restore was OK, we deployed a new secondary domain controller instead of restoring it from backup, to avoid authoritative or sync issues.

Hopefully this helps you.

Cheers.

Exactly. If it’s restored to a point before the TGT was issued, the trust could be broken.
A few days may be OK; 3 weeks is a lot.

But if all DCs are lost, there’s no other way than to try...

Userlevel 7
Badge +8

Hi all.  There is the possibility that this weekend I might need to restore all of the domain controllers in our domain (hopefully this does not come to pass).  It is not a very active AD domain, but I might have to go back about 3 weeks to get around a problem.

Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

Thanks.

I did this recently, midday, in a HUGE environment after someone blew it up. Hundreds of VMs and file servers were not accessible and critical infrastructure was all down (stressful).

 

  1. YES, make hosts files. Have EVERY Veeam server or other server you will need in them: proxies, storage, Veeam server, SQL server, ESXi hosts, vCenter, file shares, etc. I now have an updated copy in multiple spots in case it happens again. Also keep a list of all your Veeam IPs on hand in case you ever lose DNS.
  2. Consider separating your DCs and DNS servers. They don’t need to stay together. This makes life much easier going forward. (Still keep a hosts file available in case you need it.)
  3. Call Veeam support for help if you need it.
  4. The easiest way for me would be to power down ALL of the DCs and restore the master as authoritative. I’d then restore the rest and let them sync. Having the roles on one DC going forward lets you know that that is your master and that you should always use it for restores.
  5. There is a procedure for this. Call Veeam and see what they say if your app-aware processing doesn’t work. You will be OK, but I don’t know if the procedure changes.
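For the hosts file advice in this list, a small Python sketch like the following could generate the entries from an inventory kept alongside your DR notes. All names and addresses here are placeholders, and `build_hosts_entries` is a hypothetical helper. On Windows the result would be appended to C:\Windows\System32\drivers\etc\hosts on each Veeam component.

```python
import ipaddress


def build_hosts_entries(components):
    """Render hosts-file lines from a mapping of FQDN -> IP address string.

    Each line carries the IP, the FQDN and (when different) the short name,
    so lookups keep working with or without the DNS suffix.
    """
    lines = []
    for fqdn, ip in sorted(components.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        short = fqdn.split(".")[0]
        names = fqdn if short == fqdn else f"{fqdn} {short}"
        lines.append(f"{ip}\t{names}")
    return "\n".join(lines) + "\n"


# Placeholder inventory: replace with your own proxies, repositories, hosts, etc.
inventory = {
    "veeam01.example.local": "10.0.0.10",
    "proxy01.example.local": "10.0.0.11",
    "esxi01.example.local": "10.0.0.21",
    "vcenter.example.local": "10.0.0.20",
}
print(build_hosts_entries(inventory), end="")
```

Keeping the inventory in a file outside the domain (and outside DNS) is the point; the script itself is incidental.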

 

 

Userlevel 7
Badge +20

Those articles look great for restoring a single object - but I might have to restore the entire AD database (for reasons too deep to go into here).  I wonder if I just restore the entire vm image?

Also, I have another potential problem: one of the 3 DCs did not have application aware processing turned on so it does not even show up in the restore wizard - turning that on tonight!  I think there were initially problems with that DC and then it somehow never got fixed, or at least never got turned on.

Albert

If the main DCs have application-aware processing on, restoring the whole VM would be ideal; you will then need to check AD with the built-in Windows tools.

Userlevel 7
Badge +6

Please keep in mind that restoring a 3-week-old backup could result in some problems, like changed passwords (user/computer).

Other than that, it could result in a more serious problem: devices in the network may lose the trust relationship with that domain.

 

Trust relationship failures are generally easy to resolve. That said, going 3 weeks back, I’m betting that, depending on the number of workstations (and servers), there may be a lot of them.  Better be prepped with one of these two commands.  There is also a way to issue this command remotely, but I’d be sure that you know the local admin password on everything first.  Best of luck with your authoritative restore.  I don’t think I’ve ever had to do one authoritatively, just non-authoritative restores.

 

Command Line:

netdom resetpwd /s:Domain-Controller /ud:DOMAIN\administrator /pd:*

 

PowerShell

Reset-ComputerMachinePassword -Server "EU-S01" -Credential Domain01\ShellAdmin

 

https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/reset-computermachinepassword?view=powershell-5.1

https://shellgeek.com/reset-computermachinepassword-in-powershell/

Userlevel 3

hopefully Veeam support is faster than the usual response - but maybe if this is severity 1 I will get someone right away - going to try this on a long weekend starting Friday

-so in short: I power down all 3 DCs, restore the one which I think has the best data, issue a command at the command line or change a registry setting to make this the authoritative server, then restore the other 2 servers, right?

Scott wrote:

  1. Call Veeam support for help if you need
  2. The easiest way for me would be to power down ALL of the DC’s and restore the master as authoritative. I’d then restore the other/rest and let them sync.

Userlevel 7
Badge +6

More questions from excellent discussion above:

-netdom resetpwd - so I have a server or user PC that has lost its trust relationship: am I going to be able to log into that server to perform this command?  and if I do, say my server is named DC1 and my domain admin account is SkinnyAdmin, would the command be:

netdom resetpwd /s:DC1 /ud:mydomain.com\SkinnyAdmin /pd:bigFatPassword

-so I assume from what I have read that, since this is a machine password, the machine gets a newly generated password from the DC and updates it both locally and in that computer’s object in AD? Just checking.

-and I assume that this resets the trust relationship at the same time

DSRM password: I have taken over admin of a client’s  network: although I have the administrator passwords, I have not found any DSRM passwords recorded anywhere; I can guess by a list of commonly used passwords but cannot be sure; is that going to prevent a Veeam restore or does it just use the domain admin passwords stored in its credentials setup?  I know someone pasted a link to resetting the DSRM pwd but I would assume that only applies going forward, not going backwards when I do not have access to it.

 

You have to log into the machine with a local account.  You MIGHT be able to log into the machine with cached domain credentials if the machine is disconnected from the network; then connect it to the network so that it can contact the DC.  The command then authenticates to the DC with your admin credentials and creates and syncs a new machine password with AD.  netdom and the Reset-ComputerMachinePassword PowerShell cmdlet perform the same basic function here.  Once the workstation and AD are in sync on the password, the trust relationship is valid again.  I’m not sure where the password is generated... I’ve always assumed that the workstation generates the password and tells AD what it is, using your AD creds.

Note that this is going to be for member servers and workstations.  DCs don’t have a local admin password and should always be able to communicate with the copy of AD they are hosting themselves.  If a DC becomes disconnected from the rest of AD due to sync issues (firewall, VPN, etc.) and it’s been too long, it can become tombstoned and can no longer safely sync with AD.

As for your command syntax, you can type in the /pd:bigFatPassword if you need to script things.  That said, to type it realtime (prompted) remove the bigFatPassword and replace it with *.  Using /pd:* will prompt you for the password so that you don’t have a domain admin password sitting on a script or note or whatever somewhere.  I don’t believe you have to use the /s:DC1 to specify a server name…..omitting it should cause it to connect to any DC it can talk to.

 

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked.  Of course, if that was the case, I assume you’d need to have valid domain access to the DC to do so.

Userlevel 7
Badge +6

Really nice topic which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure, but I’ll keep all this precious advice handy.

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throw-away these days, especially since you don’t have to do metadata cleanups anymore.

Userlevel 7
Badge +8

The restore of all 4 of our DCs, using one as authoritative and then restoring the other 3, went super smoothly when we used Veeam support to assist us.

 

The thing that took the longest was about 2 hours of fixing DNS problems we had. Granted, our DCs had a crazy issue and all crashed. Now DNS is separate, but those hosts files would have been nice to have beforehand.  Get your DNS working first and you are already ahead of the game for DR planning.

 

The restore took longer than expected for us as well. I can’t remember if it was something with networking, storage, or VMware but it took about an hour to restore the DC and I was expecting minutes. 

 

All in all it went great.  You also have the benefit of powering them all down, doing a snapshot or backup BEFORE you start while everything is in a consistent state so it’s pretty stress free.

 

 

Userlevel 3

Sorry for the super-long delay; after successfully doing the work on the server, I left a few days later for two weeks of holidays (not that I like to do that - I would rather be around for a bit, but they were booked and everything seemed okay!).  In short, I was able to clean up AD for the “bad DC” instead of having to restore from Veeam.  Even though the below is not a “Veeam” solution, I thought a post-op summary would be informative to anyone reading this thread - and I still have a few questions about what happened, more for my own info than anything.  I also still have some specific questions about being better prepared in case I have to restore from Veeam next time.  But for now, here is a log of what I did for the cleanup.

>>>>>>>>>>>>>>>>>>>>>


Reminder: we had one bad DC in our environment that we could not log into in any way so we could not demote it gracefully.  This happened about 3 weeks ago and we were starting to experience replication problems (changes not being replicated between DCs) so we really needed to get it fixed. 

Steps taken:

-shut down the bad DC; before this fix, doing so would cause a significant problem on one of our servers that was using LDAP against this DC for authentication, even though that server was *hardcoded* to use LDAP on another DC (not the “bad” DC); we could not get around this, and could not even get support for this product to figure out why LDAP was not pointing to the correct server.

-go into ADUC (on the other DC at the same site) and delete the bad DC from the Domain Controllers OU

-go into DNS and see if the bad DC was removed from DNS; result: only one instance was removed correctly (at the root) but all lower instances of it in all kinds of nodes were not removed; manually went through the forward and reverse trees and deleted all references to the bad DC; note that some might think this is extreme but we were going to permanently get rid of this DC anyhow (when we build another DC, it will have a different name)

-go into Sites and Services (SaS) and remove the bad DC from the pertinent site; remove any connection objects between the bad DC and other DCs

-at this point I stopped and fired up the server that would always “choke” because it was looking for LDAP services from the bad DC; this time though, it worked fine without it - it reverted back to getting LDAP from the other DC at that site

-went into SaS again and fired the replication connections between the remaining DCs; at first I got an error message (sorry, did not record) but that eventually went away - not sure if somehow replication of the connection object finally happened enough to allow full replication to happen and I just needed to wait, or did all my poking about finally get it going.  I really did not know if I had to somehow fix things up for the sites or not - but finally it worked (if anyone has any insight into that, I would not mind knowing).

-at this point, replication seemed to be working correctly - tested by issuing repadmin /replsummary and dcdiag etc.

-all servers stopped having errors and nslookups and all tests seemed okay; newly created objects on the one DC finally showed up on the other DCs (and obviously changes as well);

-one interesting item was that just before the fix took place last week, a few users including myself, had our mapped network drives disappear to one of our servers and we could not re-create them at all; so somehow some trust relationship or something was starting to break down such that our mapped drives stopped working (if anyone has insight on this as well, would like to hear)

I still have a few dcdiag warnings to look at this morning - but when reading a bit on each, none seemed serious (more warnings for best practices etc.); I might post those as another post if they are related to this situation.

>>>>>>>>>>>>>>>>>>>>>

Thanks a lot everyone for the help.  I learned a lot more about what I might have needed to do for a Veeam restore of all DCs - glad I did not have to do it.  I also feel like I need to summarize some of the thread items here to have handy in case we need it.  Hopefully I can get to that before this gets too stale in my mind.

Albert (for McKITGuys)

Userlevel 7
Badge +8

I feel like there needs to be a MUCH clearer way of setting this up and performing it, from both Veeam and Microsoft.   I’m talking fail-safe, step-by-step, hold-my-hand instructions.

 

I bet a lot of people are not set up for success in the event they need to restore their DCs. 

Userlevel 3

Trust relationship failures are generally easy to resolve, but that said, going 3 weeks back, I’m betting, depending on the number of workstations (and servers), there may be a lot of trust relationship failures.

 

In a smaller environment (35 workstations), with not a lot of new activity, what causes a trust relationship to get lost - are they renewed every few days or something?  Or would the only ones lost be those of any new PCs added to the domain?

Userlevel 7
Badge +12

Machine accounts change their passwords automatically every 30 days by default.

https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/domain-member-maximum-machine-account-password-age

 

If a computer changed its machine password in the last two weeks, and you restore AD to a date two weeks ago, this computer must be removed from and rejoined to Active Directory.
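To estimate how many machines this rule could affect, the comparison can be sketched in Python. This assumes you have exported each computer's last machine-password change time before the restore (for example from the pwdLastSet attribute of the computer objects); the function name and the sample data are hypothetical, and the result is a list of machines at risk, not a guarantee that each one will actually lose its trust.

```python
from datetime import datetime


def machines_at_risk(last_password_change, restore_point):
    """Names of computers whose machine password changed after the restore point.

    last_password_change maps computer name -> datetime of its last machine
    password change, exported BEFORE the restore (assumption).
    """
    return sorted(
        name
        for name, changed in last_password_change.items()
        if changed > restore_point
    )


# Hypothetical data: restoring to 10 August, three workstations.
restore_point = datetime(2022, 8, 10, 2, 0)
exported = {
    "WS-001": datetime(2022, 7, 28, 9, 15),   # changed before the restore point
    "WS-002": datetime(2022, 8, 19, 14, 40),  # changed after it
    "WS-003": datetime(2022, 8, 25, 8, 5),    # changed after it
}
print(machines_at_risk(exported, restore_point))
```

With a 30-day change interval and a 3-week-old restore point, a large share of machines may show up on such a list, so it is worth having local admin credentials for them ready beforehand.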

Userlevel 3

More questions from excellent discussion above:

-netdom resetpwd - so I have a server or user PC that has lost its trust relationship: am I going to be able to log into that server to perform this command?  and if I do, say my server is named DC1 and my domain admin account is SkinnyAdmin, would the command be:

netdom resetpwd /s:DC1 /ud:mydomain.com\SkinnyAdmin /pd:bigFatPassword

-so I assume from what I have read that, since this is a machine password, the machine gets a newly generated password from the DC and updates it both locally and in that computer’s object in AD? Just checking.

-and I assume that this resets the trust relationship at the same time

DSRM password: I have taken over admin of a client’s  network: although I have the administrator passwords, I have not found any DSRM passwords recorded anywhere; I can guess by a list of commonly used passwords but cannot be sure; is that going to prevent a Veeam restore or does it just use the domain admin passwords stored in its credentials setup?  I know someone pasted a link to resetting the DSRM pwd but I would assume that only applies going forward, not going backwards when I do not have access to it.

Userlevel 7
Badge +13

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked. 

https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password

Userlevel 7
Badge +7

Really nice topic which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure, but I’ll keep all this precious advice handy.

Userlevel 5
Badge +1

I’m restoring AD not because it’s broken, but because I need to test something outside of Production.  We have a single forest with two domains. Domain one has two DCs onsite and one at a remote office on a different network.  Domain two has two DCs: the “main” one at a remote office on a different network, and one onsite. I have restored all five DCs but have not yet turned them on. In each domain’s case, do I power on the DCs which hold the roles, put them in authoritative mode, then power on the others (I will probably have to do some creative stuff to get the remote DCs to talk to everything else)?

 

What are your takes here?

Userlevel 3

Really nice topic which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure, but I’ll keep all this precious advice handy.

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throw-away these days, especially since you don’t have to do metadata cleanups anymore.

I have thought of that too - just restoring one DC, cleaning it up and then building others from that.  The DC with most likely the latest changes on it (password changes, trust relationships) though does not hold the FSMO roles.  So if I started with this, I would have to seize all the FSMO roles.

If I start with the server with the FSMO roles, it is at their head office but because most of their users log into VDI desktops at their data center, most likely there will be more broken trust relationships.

Anyhow, hopefully it does not come to this.  Here is my plan (for what it’s worth); I started this thread to get an understanding about recovering via Veeam backups but I am actually just going to try to get AD working again across sites.  As a reminder, the problem is there was a new DC being built that has become corrupted (not accessible) and the computer object in AD looks “messed up”.  No way to get into the DC to demote it and clean up.  But it is running and if I down it, one of our servers stops working.  I won’t go into more details but suffice to say, I need to get that DC out of the domain, clean up AD and then promote a new one.  So here is my plan that if all goes well, there will be no restore from backup:

  • down the problem DC
  • delete all the references to it in AD manually
  • AD synching seems to be somewhat stopped - repadmin says the DCs can connect but replication stopped due to errors
  • issue commands to “force” replication between the good DCs
  • hopefully all is well

Question: having said the above, if I just do the above, will the bad DC object just come back from one of the other DCs?  Do I need to delete it on all DCs manually before forcing the replication?  Or should I maybe down all DCs except one, make the deletion, set that DC as authoritative (via the registry setting I have read about), reboot that DC (I assume I need a reboot for the setting to take effect), and then bring up the other 2 DCs and force a replication?  Comments?
