Solved

Restoring all domain controllers in a Windows domain (on vmware)


Userlevel 3

Hi all.  There is the possibility that this weekend I might need to restore all of the domain controllers in our domain (hopefully this does not come to pass).  It is not a very active AD domain, but I might have to go back about 3 weeks to get around a problem.

Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

Thanks.

Best answer by regnor 31 August 2022, 07:53

39 comments

Userlevel 7
Badge +6

Recently I restored a single AD domain controller in my homelab without any problems.

However, it was not a production environment like yours.

 

Take a look at this article:

https://www.veeam.com/blog/active-directory-domain-controller-backup-recovery.html

 

Userlevel 7
Badge +8

You can follow along with this blog post by Veeam: Recovering the Active Directory Domain Services - Best practices for AD administration (part 3) (veeam.com)

Never had to recover AD myself but with Veeam it should be manageable.

Also here is how to restore items only - Restore AD Items

Userlevel 3

Those articles look great for restoring a single object - but I might have to restore the entire AD database (for reasons too deep to go into here).  I wonder if I just restore the entire vm image?

Also, I have another potential problem: one of the 3 DCs did not have application aware processing turned on so it does not even show up in the restore wizard - turning that on tonight!  I think there were initially problems with that DC and somehow it then never got fixed, or at least never got turned on.

Albert

Userlevel 7
Badge +8

If the main DCs have application-aware processing enabled, restoring the whole VM would be ideal. Afterwards you will need to check AD health with the built-in Windows tools (for example, dcdiag and repadmin).

Userlevel 7
Badge +6

If application-aware processing is enabled, a domain controller gets restored in non-authoritative mode. For a faster and cleaner restore, the first DC should be placed in authoritative mode and all others in non-authoritative mode. For the one DC which has AAP disabled, I would manually go into recovery mode and set it to non-authoritative, as otherwise this one could cause replication problems.

The following article describes all the details: https://www.veeam.com/kb2119

 

Please keep in mind that restoring a 3-week-old backup could cause some problems, for example with passwords (user/computer) that were changed after the backup was taken.

Userlevel 7
Badge +6

Please keep in mind, that restoring a 3 week old backup could result in some problems, like changed passwords (user/computer).

Other than that, it could result in a more serious problem: devices in the network may lose the trust relationship with that domain.

Userlevel 7
Badge +2

I had to do it once, after a ransomware hit.

We restored the backup of the primary DC. The backup was only 12 hours old, but we still faced some trust and Kerberos authentication issues. Running gpupdate /force on most of the affected PCs fixed it; only one of the 200 clients had to be removed from the domain and rejoined.

Once the restore was OK, we deployed a new secondary domain controller rather than restoring it from backup, to avoid authoritative or sync issues.

Hope this helps.

cheers.

Userlevel 7
Badge +6

Exactly. If it’s restored to a point before the TGT release, the trust could be broken.
A few days may be OK, but 3 weeks is a lot.

But if all DCs are lost, there’s no other way than to try...

Userlevel 7
Badge +8

Morning!

 

Veeam would be terrible backup software if you couldn’t recover your AD environment, so the answer is a definite YES!

 

Let’s go over a few basic parts to this and build a strategy for you:

 

Firstly, you’ve quite rightly highlighted your DNS will be impacted. We need to consider the impact from a few perspectives:

  • vSphere: Does vCenter manage the ESXi hosts via FQDN entries or IP addresses? If it’s via FQDN, you’d be best off avoiding vCenter within Veeam for your recovery, as you’d have to mess with your VCSA to work around the DNS issues. We can just target an ESXi host via IP address within Veeam.
  • Veeam: Veeam needs to be able to talk to its components. Is your Veeam server an “all in one” box? If not, are your components/servers being targeted via FQDN or IP address? If FQDN, I would just edit the hosts file to ensure this continues to work.
  • AD: What DNS configuration are you using here? Are all the domain controllers pointing at themselves (relying on DNS replication) within their DNS client/network config, are they all pointing to a central DNS server, or are they all pointing at each other? This could impact how you want to recover.

Secondly, what OS are your domain controllers? Windows Server 2012 added some virtualisation safeguards for AD to help if you restored a DC out of sequence.

 

Thirdly, check your retention policy, make sure the backups you’ll need aren’t going to be deleted due to retention this week, if in doubt you could also export the backups you need for an extra layer of confidence.
 

Finally, are your DCs backed up at the same time or at different times? I’d want to use the latest backup available as my primary and reattach my older DC backups to that.

 

Once we’ve got some information here to play with, we can put together a more tailored plan.

 

I’d also suggest using virtual labs to test this in an isolated network before carrying it out in production, so you can be confident you’ve hit any and all snags before doing it “for real”. You can do this with the same backups you’ll actually want to recover from (as you said you could be going back a few weeks), so it will be an identical process.

Userlevel 7
Badge +4

Hi @MckITGuys 

Hi,
Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

- You can compile a hosts file to put on all Veeam servers.
- Or add the vCSA\ESXi hosts etc. to the Veeam B&R console by IP address.

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

I have a trick: if you have not enabled AA for AD, you can open the backup with a file-level restore (FLR), and from there you can also launch the application restores even though AA was not enabled. This applies to granular restore of AD objects.

As mentioned above, there are some verifications before performing the restore:

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-determine-how-to-recover 

- Verify that you have all DSRM passwords for each Domain Controller to be restored.

- Here is a procedure for resetting DSRM passwords: https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password

- All domain controllers with FSMO roles must be backed up.

- Perform a virtual lab restore with domain-joined clients to verify that the Forest\Domain is consistent after the restore.

- Verify which SYSVOL replication technology is in use: FRS or DFSR.

Backing Up and Restoring an FRS-Replicated SYSVOL Folder - Win32 apps | Microsoft Docs

- There are two restore cases:
  - Non-authoritative restore allows you to restore individual domain objects. Veeam does this easily, and you can also compare the object in production with the object to be restored to see what has changed.
  - Authoritative restore, to completely restore the Forest\Domain:
    https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-perform-initial-recovery

- What is the Forest\Domain functional level?
- Since Windows Server 2008 R2, it is possible to enable the AD Recycle Bin for fast restoring of objects without using third-party software. Here is some info on the Recycle Bin and Veeam Explorer for AD:
  https://forums.veeam.com/veeam-backup-replication-f2/veeam-explorer-for-ad-and-ad-recycle-bin-enable-t29703.html

- After an authoritative restore, verify that SYSVOL replicates correctly.
- Verify PDC time sync.
- Verify GPO functionality.
- If you have a CA, verify that it operates correctly.

My advice is to authoritatively restore the server (or the two servers) that holds the FSMO roles.
- Then proceed with the installation and fresh promotion of new DCs.

- Also, since Windows Server 2012, it is possible to clone domain controllers with a simple procedure.

Virtualized Domain Controller Cloning Test Guidance for Application Vendors | Microsoft Docs

Virtual Domain Controller Cloning in Windows Server 2012 - Microsoft Tech Community

Step-by-Step Guide to clone a Domain Controller - Technical Blog | REBELADMIN

Obviously, by going back 3 weeks, in addition to losing user password changes, you may also lose the computer\user accounts created during that time.
If you describe the situation and network topology of the domain accurately, as @MicoolPaul said, we can make a more detailed plan.
Thank you.

 

Userlevel 7
Badge +3

I did this recently, in the middle of the day, in a HUGE environment after someone blew it up. Hundreds of VMs and file servers were not accessible and critical infrastructure was all down (stressful).

 

  1. YES, make hosts files. Put EVERY Veeam server or other server you will need in them: proxies, storage, Veeam server, SQL server, ESXi hosts, vCenter, file shares etc. I now keep an updated copy in multiple spots in case it happens again. Also keep a list of all your Veeam IPs on hand in case you ever lose DNS.
  2. Consider separating your DCs and DNS servers. They don’t need to stay together, and this makes life much easier going forward. (Still keep a hosts file available in case you need it.)
  3. Call Veeam support for help if you need it.
  4. The easiest way for me would be to power down ALL of the DCs and restore the master as authoritative. I’d then restore the rest and let them sync. Having the roles on one DC going forward lets you know that it is your master and that you should always use it for restores.
  5. There is a procedure for this. Call Veeam and see what they say if your app-aware processing doesn’t work. You will be OK, but I don’t know if the procedure changes.
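
For the hosts file in point 1, something like the following could generate the entries from an inventory you keep offline. This is only a rough sketch: the host names and IP addresses are made-up placeholders, and `build_hosts_lines` is a hypothetical helper, not anything shipped with Veeam.

```python
from ipaddress import ip_address


def build_hosts_lines(components):
    """Return hosts-file lines (IP, FQDN, short name) for an FQDN -> IP mapping."""
    lines = []
    for fqdn, ip in sorted(components.items()):
        ip_address(ip)  # raises ValueError if the address is malformed
        short_name = fqdn.split(".")[0]
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return lines


# Hypothetical inventory: replace with your own Veeam/vSphere components.
components = {
    "veeam01.example.local": "10.0.0.10",
    "proxy01.example.local": "10.0.0.11",
    "esxi01.example.local": "10.0.0.21",
}

hosts_lines = build_hosts_lines(components)
print("\n".join(hosts_lines))
```

The printed lines can be appended to C:\Windows\System32\drivers\etc\hosts on each Veeam server ahead of time, so the file is already in place if DNS goes down.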

 

 

Userlevel 7
Badge +3

Please keep in mind, that restoring a 3 week old backup could result in some problems, like changed passwords (user/computer).

Other than that, it could result in a more serious problem: devices in the network may lose the trust relationship with that domain.

 

Trust relationship failures are generally easy to resolve, but that said, going 3 weeks back, I’m betting there may be a lot of them, depending on the number of workstations (and servers).  Better be prepped with one of the two commands below.  There is also a way to issue this command remotely, but I’d be sure that you know the local admin password on everything first.  Best of luck with your authoritative restore.  I don’t think I’ve ever had to do one authoritatively….just non-authoritative restores.

 

Command Line:

netdom resetpwd /s:Domain-Controller /ud:domain\administrator /pd:*

 

PowerShell

Reset-ComputerMachinePassword -Server "EU-S01" -Credential Domain01\ShellAdmin

 

https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/reset-computermachinepassword?view=powershell-5.1

https://shellgeek.com/reset-computermachinepassword-in-powershell/

Userlevel 3

Trust relationship failures are generally easy to resolve, but that said, going 3 weeks back, I’m betting, depending on the number of workstations (and servers), there may be a lot of trust relationship failures.

 

In a smaller environment (35 workstations), with not a lot of new activity, what causes a trust relationship to get lost - are they renewed every few days or something?  Or would the only ones lost be those of any new PCs added to the domain?

Userlevel 7
Badge +5

Machine accounts change their passwords automatically every 30 days by default.

https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/domain-member-maximum-machine-account-password-age

 

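If a computer changed its machine password in the last two weeks, and you restore AD to a date two weeks ago, the password stored on the computer no longer matches the one in AD. Its trust relationship then has to be repaired, either by resetting the machine password or by removing the computer from the domain and rejoining it.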
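
To get a feel for how many machines that could be, here is a small sketch of the arithmetic. It assumes the default 30-day maximum machine account password age and that password changes are spread evenly over time; the machine names and dates are invented for illustration, and both functions are hypothetical helpers.

```python
from datetime import date


def machines_at_risk(last_password_change, restore_point):
    """Names of machines whose password changed after the restore point."""
    return sorted(
        name
        for name, changed in last_password_change.items()
        if changed > restore_point
    )


def expected_fraction(rollback_days, max_age_days=30):
    """Rough share of machines that rotated their password within the rollback window."""
    return min(rollback_days / max_age_days, 1.0)


# Hypothetical inventory of last machine password changes.
last_change = {
    "PC-01": date(2022, 8, 20),
    "PC-02": date(2022, 7, 25),
    "VDI-07": date(2022, 8, 11),
}

at_risk = machines_at_risk(last_change, restore_point=date(2022, 8, 10))
print(at_risk)                # machines that changed after the restore point
print(expected_fraction(21))  # share expected for a 3-week rollback
```

Under those assumptions, a 21-day rollback would touch roughly 21/30 of the machines that were online during that time, so it is worth treating most workstations as candidates for a trust repair rather than only newly joined PCs.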

Userlevel 3

hopefully Veeam support is faster than the usual response - but maybe if this is severity 1 I will get someone right away - going to try this on a long weekend starting Friday

-so in short, I power down all 3 DCs, restore the one which I think has the best data, issue a command at the command line or change a registry setting to make this the authoritative server, then restore the other 2 servers, right?

Scott wrote:

  1. Call Veeam support for help if you need it
  2. The easiest way for me would be to power down ALL of the DC’s and restore the master as authoritative. I’d then restore the other/rest and let them sync.

Userlevel 3

More questions from excellent discussion above:

-netdom resetpwd - so I have a server or user PC that has lost its trust relationship: am I going to be able to log into that server to perform this command?  and if I do, say my server is named DC1 and my domain admin account is SkinnyAdmin, would the command be:

netdom resetpwd /s:DC1 /ud:mydomain.com\SkinnyAdmin /pd:bigFatPassword

-from what I have read, I assume that since this is a machine password, the computer gets a newly generated password from the DC and updates it both locally and in that computer’s object in AD? just checking

-and I assume that this resets the trust relationship at the same time

DSRM password: I have taken over admin of a client’s  network: although I have the administrator passwords, I have not found any DSRM passwords recorded anywhere; I can guess by a list of commonly used passwords but cannot be sure; is that going to prevent a Veeam restore or does it just use the domain admin passwords stored in its credentials setup?  I know someone pasted a link to resetting the DSRM pwd but I would assume that only applies going forward, not going backwards when I do not have access to it.

Userlevel 7
Badge +3

You have to log into the machine with a local password.  You MIGHT be able to log into the machine with a cached domain password if the machine is disconnected from the network.  Then connect it to the network so that it can contact the DC.  It then authenticates to the DC with your admin credentials and creates and syncs a new machine password with AD.  The netdom command and the Reset-ComputerMachinePassword PowerShell cmdlet perform the same basic function here.  Once both the workstation and AD are in sync on the password, the trust relationship is validated.  I’m not sure where the password is generated...I’ve always assumed that the workstation generates the password and tells AD what it is, using your AD creds.

Note that this is going to be for member servers and workstations.  DCs don’t have a local admin password and should always be able to communicate with the copy of AD they are hosting themselves.  If a DC becomes disconnected from the rest of AD due to sync issues (firewall, VPN, etc.), and it’s been too long, it can become tombstoned and can no longer safely sync with AD.

As for your command syntax, you can type in the /pd:bigFatPassword if you need to script things.  That said, to be prompted for it at run time, replace bigFatPassword with *.  Using /pd:* will prompt you for the password so that you don’t have a domain admin password sitting in a script or note or whatever somewhere.  I don’t believe you have to use the /s:DC1 to specify a server name…..omitting it should cause it to connect to any DC it can talk to.
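
If you end up scripting this across several machines, a tiny helper like the one below keeps the password out of the script by always using /pd:*. It is just a sketch that assembles the command line from the same three switches shown above (/s, /ud, /pd); `netdom_resetpwd_args` is a hypothetical name, and the server and account are the example values from your question. It keeps /s in place since I have not verified that it can be omitted.

```python
def netdom_resetpwd_args(server, admin_user):
    """Build the netdom resetpwd argument list; /pd:* makes netdom prompt for the password."""
    return [
        "netdom",
        "resetpwd",
        f"/s:{server}",
        f"/ud:{admin_user}",
        "/pd:*",
    ]


# Example values taken from the question above.
args = netdom_resetpwd_args("DC1", r"mydomain.com\SkinnyAdmin")
print(" ".join(args))
```

Because netdom prompts interactively with /pd:*, this is meant for printing or pasting the command rather than running it unattended.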

 

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked.  Of course, if that was the case, I assume you’d need to have valid domain access to the DC to do so.

Userlevel 7
Badge +6

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked. 

https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password

Userlevel 7
Badge +1

Really nice topic which mixes Veeam and AD knowledge. Luckily I have never had to restore all DCs of an infrastructure, but I’ll keep all this precious advice in mind.

Userlevel 7
Badge +3

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throwaway these days, especially since you don’t have to do metadata cleanups anymore.

Userlevel 3

I have thought of that too - just restoring one DC, cleaning it up and then building others from that.  The DC with most likely the latest changes on it (password changes, trust relationships) though does not hold the FSMO roles.  So if I started with this, I would have to seize all the FSMO roles.

If I start with the server with the FSMO roles, it is at their head office but because most of their users log into VDI desktops at their data center, most likely there will be more broken trust relationships.

Anyhow, hopefully it does not come to this.  Here is my plan (for what it’s worth); I started this thread to get an understanding about recovering via Veeam backups but I am actually just going to try to get AD working again across sites.  As a reminder, the problem is there was a new DC being built that has become corrupted (not accessible) and the computer object in AD looks “messed up”.  No way to get into the DC to demote it and clean up.  But it is running and if I down it, one of our servers stops working.  I won’t go into more details but suffice to say, I need to get that DC out of the domain, clean up AD and then promote a new one.  So here is my plan that if all goes well, there will be no restore from backup:

  • down the problem DC
  • delete all the references to it in AD manually
  • AD synching seems to be somewhat stopped - repadmin says the DCs can connect but replication stopped due to errors
  • issue commands to “force” replication between the good DCs
  • hopefully all is well

Question: having said the above, if I just do the above, will the bad DC object just come back from one of the other DCs?  Do I need to delete it on all DCs manually before forcing the replication?  Or should I maybe down all DCs except one, make the deletion, set that DC as authoritative (via the registry setting I have read about), reboot that DC (I assume I need a reboot for the setting to take effect) and then bring up the other 2 DCs and force a replication?  Comments?

Userlevel 7
Badge +3

I have thought of that too - just restoring one DC, cleaning it up and then building others from that.  The DC with most likely the latest changes on it (password changes, trust relationships) though does not hold the FSMO roles.  So if I started with this, I would have to seize all the FSMO roles.

If I start with the server with the FSMO roles, it is at their head office but because most of their users log into VDI desktops at their data center, most likely there will be more broken trust relationships.

 

 

AD should keep a copy of everything across all DCs.  If there are broken trusts, that should be replicated to all DCs in AD, so it should not matter which DC is brought online unless we’re talking about changes being made in a narrow window before they are replicated to the other DCs.  The exception would be if replication had already failed and different DCs had different copies of AD that were not being replicated amongst themselves.  As previously noted, if this has gone on for too long, then the DCs will have reached their tombstone lifetime and really are of little to no use anyway.  Seizing FSMO roles is trivial, as is cleaning up AD from other DCs that are no longer available.  Taking down all of the DCs no longer desired and restoring the most desirable DC authoritatively would likely be the way to go.  Seize FSMO, clean up AD from the unwanted DCs to prevent any further chances of replication, and then proceed with building new DCs.
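
On the tombstone point, a quick sanity check of backup age against the tombstone lifetime can be sketched like this. The 180-day value is an assumption (it is the usual default for newer forests, while older forests may use 60 days); the real value lives in your forest’s tombstoneLifetime attribute, and the dates and function name here are only illustrative.

```python
from datetime import date


def backup_within_tombstone_lifetime(backup_date, restore_date, tombstone_days=180):
    """True if the backup is younger than the tombstone lifetime on the restore date."""
    age_days = (restore_date - backup_date).days
    return 0 <= age_days < tombstone_days


# Hypothetical dates: a backup roughly three weeks old on the restore weekend.
usable = backup_within_tombstone_lifetime(date(2022, 8, 12), date(2022, 9, 2))
print(usable)
```

A three-week-old backup sits comfortably inside either 60 or 180 days, so tombstoning is more of a concern for DCs that have been failing to replicate for months than for the restore discussed here.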

 

Anyhow, hopefully it does not come to this.  Here is my plan (for what it’s worth); I started this thread to get an understanding about recovering via Veeam backups but I am actually just going to try to get AD working again across sites.  As a reminder, the problem is there was a new DC being built that has become corrupted (not accessible) and the computer object in AD looks “messed up”.  No way to get into the DC to demote it and clean up.

 

 

This seems unusual for sure.  But that said, if you have a failed/corrupted DC, power it off and blow it away, clean that DC out of AD and build a new one.  I’d typically suggest using a new name to prevent any possible rogue data from infiltrating back into things….probably not likely, but you’re already in a precarious position, so best not to add any possible complications.

 

  But it is running and if I down it, one of our servers stops working.  I won’t go into more details but suffice to say, I need to get that DC out of the domain, clean up AD and then promote a new one.  So here is my plan that if all goes well, there will be no restore from backup:

  • down the problem DC
  • delete all the references to it in AD manually
  • AD synching seems to be somewhat stopped - repadmin says the DCs can connect but replication stopped due to errors
  • issue commands to “force” replication between the good DCs
  • hopefully all is well

Question: having said the above, if I just do the above, will the bad DC object just come back from one of the other DCs?  Do I need to delete it on all DCs manually before forcing the replication?  Or should I maybe down all DCs except one, make the deletion, set that DC as authoritative (via the registry setting I have read about), reboot that DC (I assume I need a reboot for the setting to take effect) and then bring up the other 2 DCs and force a replication?  Comments?

 

Your plan looks very similar to what I would do.  But I don’t think I’d try to delete AD data from existing DCs and reuse them.  I’d just build new.  I suppose you could forcibly remove AD from a DC and reuse it, but why take the risk?  As previously noted, they’re practically throwaway anyway.

Obviously, I can’t make a good call without knowing details on the number of DCs you have, how many are local vs remote, how many remote sites there are, the root cause of the problem, etc.  And clearly we’ve gone well beyond the scope of restoring DCs in Veeam.  If you know the root cause of the problem and have resolved it, you know that the AD data is valid as of a certain date, and you know that the remote site DCs can be restored non-authoritatively and sync with the DC that is authoritatively restored, then I suppose you could reuse them.  But I’d be hesitant to do so in a more complex environment where changes from a rogue DC could cause wrinkles and force you to start the process over again.  I don’t know enough about your situation to make a call, but it sounds more complex than I’d want to take the risk for.  I would rather take a safe route and restore the most likely good DC, validate it (offline if I must), and then blow away the others and join new ones to this one.

Userlevel 7
Badge +3

The restore of all 4 of our DCs, using one as authoritative and then restoring the other 3, went super smoothly with Veeam support assisting us.

 

The thing that took the longest was about 2 hours of fixing DNS problems we had. Granted, our DCs had a crazy issue and all crashed. Now DNS is separate, but those hosts files would have been nice to have beforehand.  Get your DNS working before you start and you are already ahead of the game for DR planning.

 

The restore took longer than expected for us as well. I can’t remember if it was something with networking, storage, or VMware but it took about an hour to restore the DC and I was expecting minutes. 

 

All in all it went great.  You also have the benefit of powering them all down, doing a snapshot or backup BEFORE you start while everything is in a consistent state so it’s pretty stress free.

 

 

Userlevel 3

I am writing up a “playbook” now with a couple different scenarios so forgive me for random questions that may or may not be the “best plan” i.e. I just want to know more in case I have to go down different routes:

-if I keep one DC (the one with most of the newest changes), and just down and delete the other 2 DCs, I am going to have DNS entries for the other two DCs still in DNS (I assume that because the other 2 DCs are at a different site; even if I demote them and remove them from the domain, it is possible that their records will persist on the good DC)

-question from that: other than the A record in the root of the zone, do I have to find and remove every reference for each of the downed DCs (service records etc.) or is there one spot to delete it such that it cascade deletes to all service records?

Userlevel 7
Badge +3

Deleting the DC’s from AD will not remove their DNS entries as I recall.  You’ll likely need to manually delete those entries from DNS.  You should be able to do this from any DC that is replicating properly.  Once those changes have replicated, you shouldn’t see any more entries for those old DC’s.  That said, you’ll need to remove them from Active Directory Sites and Services to actually remove the DC’s from AD and again, those changes would be replicated to other valid DC’s in the domain.  A graceful demotion is always better than forcibly removing DC’s from a domain, but over the course of this conversation, it sounds as if that’s not an option.

I believe if you forcibly remove a DC from a domain, there will be several DNS records remaining, such as the parent record, A record, and if there are any other records that reference the removed DC, those as well.