Solved

Restoring all domain controllers in a Windows domain (on vmware)


Userlevel 3

Hi all.  There is the possibility that this weekend I might need to restore all of the domain controllers in our domain (hopefully this does not come to pass).  It is not a very active AD domain, but I might have to go back about 3 weeks to get around a problem.

Can someone point me to documentation as to how to do this from within Veeam with all the DCs down?  This would mean that no dns resolution will be happening (do I have to create a hosts file on the Veeam server to get around this?).

Another hiccup is that for one of the DCs, I notice now that application aware processing has not been turned on so it is going to be restored as just a member server without any special processing for the AD database.

Thanks.


Best answer by regnor 31 August 2022, 07:53


39 comments

Userlevel 3

Sorry for the super-long delay; after successfully doing the work on the server, I left a few days later for two weeks of holidays (not that I like to do that - I would rather be around for a bit, but they were booked and everything seemed okay!).  In short, I was able to clean up AD for the “bad DC” instead of having to restore from Veeam.  Even though the below is not a “Veeam” solution, I thought a post-op summary would be informative to anyone reading this thread - and I still have a few questions about what happened, more for my info than anything.  I also still have some specific questions about being better prepared in case next time I have to restore from Veeam.  But for now, here is a log of what I did for the cleanup.

>>>>>>>>>>>>>>>>>>>>>


Reminder: we had one bad DC in our environment that we could not log into in any way so we could not demote it gracefully.  This happened about 3 weeks ago and we were starting to experience replication problems (changes not being replicated between DCs) so we really needed to get it fixed. 

Steps taken:

-shut down the bad DC; before this fix, doing so would cause a significant problem on one of our servers that was using LDAP to this server for authentication (could not get around this) even though this server was *hardcoded* to use LDAP on another DC (not the “bad” DC) (could not even get support for this product to figure out why LDAP not pointing to the correct server).

-go into ADUC (on the other DC at the same site) and delete the bad DC from the Domain Controllers OU

-go into DNS and see if the bad DC was removed from DNS; result: only one instance was removed correctly (at the root) but all lower instances of it in all kinds of nodes were not removed; went through the forward and reverse trees and manually deleted all references to the bad DC; note that some might think this is extreme but we were going to permanently get rid of this DC anyhow (when we build back another DC, it will have a different name)

-go into Sites and Services (SaS) and remove the bad DC from the pertinent site; remove any connection objects between the bad DC and other DCs

-at this point I stopped and fired up the server that would always “choke” because it was looking for LDAP services from the bad DC; this time though, it worked fine without it - it reverted back to getting LDAP from the other DC at that site

-went into SaS again and fired the replication connections between the remaining DCs; at first I got an error message (sorry, did not record) but that eventually went away - not sure if somehow replication of the connection object finally happened enough to allow full replication to happen and I just needed to wait, or did all my poking about finally get it going.  I really did not know if I had to somehow fix things up for the sites or not - but finally it worked (if anyone has any insight into that, I would not mind knowing).

-at this point, replication seemed to be working correctly - tested by issuing repadmin /replsummary and dcdiag etc.

-all servers stopped having errors and nslookups and all tests seemed okay; newly created objects on the one DC finally showed up on the other DCs (and obviously changes as well);

-one interesting item was that just before the fix took place last week, a few users including myself, had our mapped network drives disappear to one of our servers and we could not re-create them at all; so somehow some trust relationship or something was starting to break down such that our mapped drives stopped working (if anyone has insight on this as well, would like to hear)

I still have a few dcdiag warnings to look at this morning - but when reading a bit on each, none seemed serious (more warnings for best practices etc.); I might post those as another post if they are related to this situation.

>>>>>>>>>>>>>>>>>>>>>

Thanks a lot everyone for the help.  I learned a lot more about what I might have needed to do for a Veeam restore of all DCs - glad I did not have to do it.  I also feel like I need to summarize some of the thread items here to have handy in case we need it.  Hopefully I can get to that before this gets too stale in my mind.

Albert (for McKITGuys)

Userlevel 7
Badge +8

I feel like there needs to be a MUCH clearer way of setting this up and performing it, from both Veeam and Microsoft.  I’m talking fail-safe, step-by-step, hold-my-hand instructions.

I bet a lot of people are not set up for success in the event they need to restore their DC’s.

Userlevel 7
Badge +7

Interesting thread. We have the issue that after starting our DR site (which is built using a Veeam replication job from backup copies), the SYSVOL share no longer replicates (we have 3 DC’s). I can only get it to replicate again by adding the following registry keys to the DC that holds the FSMO roles:

 

reg add "HKLM\System\CurrentControlSet\Services\DFSR\Restore" /v SYSVOL /t REG_SZ /d "authoritative"
reg add "HKLM\SYSTEM\CurrentControlSet\Control\BackupRestore\SystemStateRestore" /v LastRestoreId /t REG_SZ /d "10000000-0000-0000-0000-000000000000" 

After a reboot of the DC I added these keys to, SYSVOL is replicating again.

Is this also a known issue?

Yes, by setting those keys you performed an authoritative SYSVOL restore.
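For anyone who wants to keep those two registry values in a runbook script, here is a minimal Python sketch. It only builds the two `reg add` command lines quoted in the post and prints them; it does not execute anything. The function name `authoritative_sysvol_commands` is hypothetical, and the key paths, value names and data are copied from the post above rather than checked against Microsoft documentation.

```python
def authoritative_sysvol_commands():
    """Return the two `reg add` commands from the post as argument lists."""
    # Key paths and values are taken verbatim from the forum post.
    dfsr_key = r"HKLM\System\CurrentControlSet\Services\DFSR\Restore"
    restore_key = (
        r"HKLM\SYSTEM\CurrentControlSet\Control\BackupRestore\SystemStateRestore"
    )
    return [
        ["reg", "add", dfsr_key, "/v", "SYSVOL",
         "/t", "REG_SZ", "/d", "authoritative"],
        ["reg", "add", restore_key, "/v", "LastRestoreId",
         "/t", "REG_SZ", "/d", "10000000-0000-0000-0000-000000000000"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in authoritative_sysvol_commands():
        print(" ".join(cmd))
```

Printing rather than executing is deliberate: these values should only ever be set on the one DC chosen as the authoritative source.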

Userlevel 5
Badge

hi @JailBreak 

MS best practices advise against restoring a Domain Controller via a hot snapshot.

I only used a cold VMware snapshot before performing an FFL\DFL upgrade.

I strongly advise against restoring a DC from snapshot.

Snapshot for Win2k12 domain controllers (microsoft.com)

Performing Domain Controller rollback via VM snapshot or Image backup? (microsoft.com)

😁

 

Using a backup tool for DC VMs, there is no other way.  And honestly I don't know anyone that will power off any DC to backup, particularly when using Application Aware.

And as Veeam says:
“Veeam supports Application Aware backup of Active Directory for Virtual Machine and Physical Servers.” and “... When possible, it’s recommended to backup the Domain Controller with most FSMO”

Like I said, I have done restores of anything from 1 or 2 DCs to all DCs in the environment and did not find any issue with that.

With the old Win2K12 we could have some issues or a more difficult restore, but with 2016/2019 and later, those old rules no longer apply to virtual Domain Controllers. Again, as long as you follow the best practices for those backups.

Hi @JailBreak 

Sorry, I was too general. To be precise: don't restore a DC directly from a VMware snapshot.

It is obvious that a backup of AD with Application Aware is consistent.

To be precise, I turned off the two DCs with FSMO roles, before the FFL\DFL upgrade. At that time there was no rollback after FFL\DFL upgrade from 2003 to 2008.

I enclose my topic post that you may not have read: :)

As mentioned above, there are some verifications before performing the restore:

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-determine-how-to-recover 

- Verify that you have all DSRM passwords for each Domain Controller to be restored.

- I attach procedure for resetting DSRM passwords. https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password 

- All domain controllers with FSMO roles must be under backup.

- Perform a Virtual lab restore with clients in domain to verify that the Forest\Domain is consistent after the reset.

- Verify which SYSVOL replication technology is in use: FRS or DFSR.

Backing Up and Restoring an FRS-Replicated SYSVOL Folder - Win32 apps | Microsoft Docs

- There are two restore cases:
  - Non-authoritative restore lets you restore individual domain objects, which Veeam does easily; you can also compare the object in production with the object to be restored, in case the object has changed.
    - What is the Forest\Domain functional level?
    - Since Windows Server 2008 R2, it is possible to enable the "recycle bin" to restore objects quickly without third-party software.

  - Authoritative restore, to completely restore the Forest\Domain:

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-perform-initial-recovery 

I attach info recycle bin + Veeam AD Aware.
https://forums.veeam.com/veeam-backup-replication-f2/veeam-explorer-for-ad-and-ad-recycle-bin-enable-t29703.html  

- After the authoritative restore, verify that SYSVOL replicates correctly again
- Verify PDC time sync
- verify GPO functionality.
- If you have a CA verify correct operation.

The advice is to authoritatively restore the server (or the two servers) that holds all the FSMO roles.
- Then proceed with the installation and fresh promotion of new DCs.

- Also, since Windows Server 2012 it is possible to clone Domain Controllers with a simple procedure.

Virtualized Domain Controller Cloning Test Guidance for Application Vendors | Microsoft Docs

Virtual Domain Controller Cloning in Windows Server 2012 - Microsoft Tech Community

Step-by-Step Guide to clone a Domain Controller - Technical Blog | REBELADMIN

Obviously, by going back 3 weeks you lose not only the user password changes but possibly also the computer\user accounts created during that time.
If you describe the situation and the network topology of the domain accurately, as @MicoolPaul asked, we can make a more detailed plan.
Thank you

 

The list I shared in my first comment is about when the VM-Generation ID changes, not about backups. The ID changes whenever one of those operations is performed on the VM, and that is how the DC knows its state was changed underneath it.

So every time one of those actions happens, the VM-Generation ID changes.

Honestly, writing an updated blog post about backing up and restoring DCs on the latest Windows versions is on my list. But like many other items on that list, it is still on the wish list :)

Userlevel 7
Badge +7

@MckITGuys thanks for the feedback.
Your case is a classic one in AD environments. Usually, when a single Domain Controller has a problem and you cannot restore full functionality, it is easier to perform a "force removal" of the corrupted DC.

Force removal guide:

- Force-remove the DC's computer object from ADUC
- Delete the computer object

https://techcommunity.microsoft.com/t5/itops-talk-blog/step-by-step-manually-removing-a-domain-controller-server/ba-p/280564 
- Perform metadata cleanup

Proceed with the following checks:

- From the DNS console check & delete all entries pertaining to the removed Domain Controller.
- In ADSI Edit, open the Configuration partition and verify that the DC object has been properly removed
- Delete the entry from the Name Servers tab
- Delete the SRV records:
DNS: _msdcs.domainname.com -> CNAME (GUID - fqdn) of the DC
DNS: _msdcs.domainname.com -> dc -> _sites -> sitename -> _tcp -> SRV Records for Kerberos and LDAP
DNS: _msdcs.domainname.com -> domains -> domain guid -> _tcp -> LDAP SRV Records
DNS: _msdcs.domainname.com -> gc -> _sites and _tcp (if the DC was also GC)
DNS: _msdcs.domainname.com -> pdc -> _tcp -> LDAP SRV (if the DC was also PDC)
DNS: domainname.com -> A Record for IPv4 and IPv6

- Open Active Directory Sites and Services and, in the relevant site, delete the DC's server object from the replication partners (this object is not deleted automatically).

- After the force removal, verify that all AD functions work correctly, from replication to FSMO roles and AD time sync.
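The DNS locations listed above can be turned into a per-DC checklist. The following Python sketch only generates text for a human to tick off; it does not query or modify DNS. The function name `dns_cleanup_checklist` and its parameters are hypothetical, and the record locations simply mirror the list in this post.

```python
def dns_cleanup_checklist(domain, site, dc_fqdn, was_gc=False, was_pdc=False):
    """List the DNS locations to check after force-removing a DC.

    The locations follow the list in the post; nothing is looked up or deleted.
    """
    msdcs = f"_msdcs.{domain}"
    items = [
        f"{msdcs} -> CNAME (GUID) pointing to {dc_fqdn}",
        f"{msdcs} -> dc -> _sites -> {site} -> _tcp -> Kerberos and LDAP SRV records",
        f"{msdcs} -> domains -> <domain guid> -> _tcp -> LDAP SRV records",
    ]
    if was_gc:
        items.append(f"{msdcs} -> gc -> _sites and _tcp SRV records")
    if was_pdc:
        items.append(f"{msdcs} -> pdc -> _tcp -> LDAP SRV record")
    items.append(f"{domain} -> A (IPv4) and AAAA (IPv6) records for {dc_fqdn}")
    items.append(f"{domain} -> Name Servers tab entry for {dc_fqdn}")
    return items


if __name__ == "__main__":
    # Example values are placeholders, not taken from the thread.
    for line in dns_cleanup_checklist("example.local", "HQ", "dc3.example.local",
                                      was_gc=True):
        print("[ ]", line)
```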

I attach a few commands for verification:

netdom query fsmo
nltest /dclist:yourdomain.local
nltest /dsgetdc:mydomain.local /force /gc

repadmin /showrepl
repadmin /replsummary
repadmin /Queue

How to Force Active Directory Replication.

This will do a pull replication, which means it will pull updates from DC2 to DC1.
REPADMIN
REPADMIN
If you want to push replication you will use the /P switch. For example if you make changes on DC1 and want to replicate those to other DCs use this command.

repadmin /syncall dc1 /APeD

repadmin /replsummary > c:\it\replsummary.txt
repadmin /istg * /verbose
DCDIAG

dcdiag /test:dns                    
dcdiag /test:netlogons
dcdiag /test:advertising /v
dfsrdiag ReplicationState /member:yourDCname 

dcdiag /test:topology
dcdiag /test:replications
dcdiag /c /e /v >> c:\ADHealth.txt
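To collect the output of several of these checks in one pass, something like the following Python sketch could be used on a DC. The command list is a subset of the commands quoted above; the names `CHECKS` and `run_checks` are hypothetical. The sketch records each exit code without interpreting it, because how each tool signals failure is not something this thread establishes.

```python
import subprocess

# Subset of the verification commands quoted in the post.
CHECKS = [
    ["netdom", "query", "fsmo"],
    ["repadmin", "/showrepl"],
    ["repadmin", "/replsummary"],
    ["repadmin", "/queue"],
    ["dcdiag", "/test:dns"],
    ["dcdiag", "/test:replications"],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each command and return {command line: (exit code, stdout)}.

    A tool that is not installed is recorded as (None, "") instead of
    aborting the whole run.
    """
    results = {}
    for cmd in checks:
        key = " ".join(cmd)
        try:
            proc = runner(cmd, capture_output=True, text=True)
            results[key] = (proc.returncode, proc.stdout)
        except FileNotFoundError:
            results[key] = (None, "")
    return results
```

Calling `run_checks()` and writing each captured output to a dated file gives a baseline to compare against after a cleanup or restore.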

DNSLint is a Microsoft Windows utility that helps you to diagnose common DNS name resolution issues.
    dnslint /ad /s #IPAddressOfServer#

Proceed with promoting a new DC to replace the decommissioned one.
Checks after the DCPROMO:

repadmin /showrepl /repsto

Get-WinEvent -LogName "Directory Service" | ?{$_.Id -eq 1119} | FL

Get-WinEvent -LogName "DFS Replication" | ?{$_.Id -eq 4604} | FL

dcdiag /c /d /v

DCDiag (part of WS03 SP1 Support tools) displays diagnostic information about Domain Controllers.
dcdiag.exe /V /C /D /E /s:#DomainControllerName# > c:\dcdiag.log

NetDiag provides information about specific network configuration for the local machine.
netdiag.exe /v > c:\netnetdiag.log

RepAdmin helps diagnose AD replication issues with WS03 and WS08 DC's.
repadmin.exe /showrepl dc* /verbose /all /intersite > c:\repl.txt
repadmin /syncall -APed.
 
Regards

Userlevel 3

How did you get on @MckITGuys? If there was any particular advice that you found the most helpful, please don’t forget to mark it as the answer to help others in your position in the future 🙂

Hi all,

First off, big thanks to everyone who jumped in with advice.  Learned a lot through this exercise (not exactly my kind of exercise though :-)  I need to read back through the thread and mark a few more replies but since am short on time this morning, will just give a summary of what happened.  Note that the conclusion did NOT need to use Veeam for restores but in case someone hits this thread and wants to know the outcome, it might help them.  As well a caveat - this might not have been the “best” way to do this (i.e. manually pruning AD) but it was the first step we opted to take instead of getting into restoring DCs.

Reminder: we had one bad DC in our environment that we could not log into in any way, so we could not demote it gracefully.  This happened about 3 weeks ago and we were starting to experience replication problems (changes not being replicated) so we really needed to get it fixed.  Steps taken:

-shut down the bad DC; before this fix, doing so would cause a significant problem on one of our servers that was using LDAP to this server for authentication (could not get around this) even though this server was *hardcoded* for its LDAP settings (could not even get support for this product to figure out why).

-go into ADUC and delete the bad DC from the Domain Controllers OU

-go into DNS and see if this DC was removed from DNS: only one instance was removed correctly (at the root) but all lower instances of it in all kinds of nodes were not removed; manually go through the forward and reverse trees and manually delete all references to the old DC; note that some might think this is extreme but we were going to permanently get rid of this DC anyhow (when we build back another DC, it will have a different name)

-go into Sites and Services (SaS) and remove the bad DC from the pertinent site; remove any connection objects between the bad DC and other DCs

-at this point I stopped and fired up the server that would always “choke” because it was looking for LDAP services from the bad server; this time, it worked fine without it - it reverted back to the other DC at that site

-went into SaS again and fired the replication connections between the remaining DCs; at first I got an error message but that eventually went away - not sure if somehow replication of the connection object finally happened enough to allow full replication to happen and I just needed to wait, or did all my poking about finally get it going.  I really did not know if I had to somehow fix things up for the sites or not - but finally it worked (if anyone has any insight into that, I would not mind knowing).

-at this point, it looked as though replication was working correctly via issuing repadmin /replsummary and dcdiag etc.

-all servers stopped having errors and nslookups and all tests seemed okay; newly created objects on the one DC finally showed up on the other DCs (and obviously changes as well);

-one interesting item was that just before the fix took place last week, a few users including myself, had our mapped network drives disappear to one of our servers and we could not re-create them at all; so somehow some trust relationship or something was starting to break down such that our mapped drives stopped working (if anyone has insight on this as well, would like to hear)

I still have a few dcdiag warnings to look at this morning - but when reading a bit on each, none seemed serious (more warnings for best practices etc.); I might post those as another post if they are related to this situation.

Thanks a lot everyone for the help.  I learned a lot more about what I might have needed to do for a Veeam restore of all DCs - glad I did not have to do it.  I also feel like I need to summarize some of the thread items here to have handy in case we need it.  Hopefully I can get to that before this gets too stale in my mind.

Albert (for McKITGuys)

Userlevel 7
Badge +8

Wondering how it worked out for you. 

 

Don’t restore from a snapshot? So don’t use Veeam at all, essentially, is what you are saying?   I won’t be powering down my DC’s for daily backup jobs.

 

Yes, I realize powering all of them down and doing a snapshot is ideal, but not realistic in most 24/7 environments. The same reason we have multiple DC’s.

 

I found the ideal solution is to give one DC the FSMO roles (unless your environment is HUGE and you want to split them up, but under a few thousand users this is probably fine).  Power off ALL of the DC’s, and restore that specific VM as authoritative.  The Veeam docs and support worked 100% for us in a critical situation with everyone down.

Userlevel 5
Badge

Hi

With VMs, I have restored a full AD a couple of times. No issues, particularly if you are using Windows 2016 and above.

The “new” VM-Generation ID feature controls how AD handles changes when a DC is restored.

When does this ID change?

Scenario – does the VM-Generation ID change?

  • VMware vMotion®/VMware vSphere Storage vMotion / Hyper-V Live Migration – No
  • Virtual machine pause/resume – No
  • Virtual machine reboot – No
  • vSphere host reboot – No
  • Delete VM Snapshot – No
  • Import virtual machine – Yes
  • Cold clone – Yes
  • Hot clone – Yes
    Note: Neither Microsoft nor VMware supports hot cloning of virtual domain controllers. Do not attempt hot cloning under any circumstances.
  • New virtual machine from VMware Virtual Disk Development Kit (VMDK) copy – Yes
  • Cold snapshot revert (while powered off or while running and not taking a memory snapshot) – Yes
  • Hot snapshot revert (while powered on with a memory snapshot) – Yes
  • Restore from virtual machine level backup – Yes
  • Virtual machine replication (using both host-based and array-level replication) – Yes
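The list above can be kept as a small lookup table, for example in a runbook script that warns before an operation that would change the ID. This is only a Python sketch of that idea: the dictionary name, the shortened operation labels and the function `generation_id_changes` are hypothetical, and the Yes/No values are copied from the list in this post, not verified independently.

```python
# True means the operation changes the VM-Generation ID (per the list above).
GENERATION_ID_CHANGES = {
    "vmotion / storage vmotion / live migration": False,
    "pause/resume": False,
    "vm reboot": False,
    "host reboot": False,
    "delete snapshot": False,
    "import vm": True,
    "cold clone": True,
    "hot clone": True,  # unsupported for DCs according to the note above
    "new vm from vmdk copy": True,
    "cold snapshot revert": True,
    "hot snapshot revert": True,
    "restore from vm-level backup": True,
    "vm replication": True,
}


def generation_id_changes(operation):
    """Return True if the named operation changes the VM-Generation ID."""
    return GENERATION_ID_CHANGES[operation.strip().lower()]
```

For example, `generation_id_changes("Restore from VM-level backup")` returns `True`, which is why a DC restored by a backup tool can detect that it was rolled back.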

     

Before we restore the broken Domain Controller, some best practices for Domain Controller restores:

  • If you are restoring a broken DC in a multi-DC environment, you can do a normal restore and then follow the next restore process.
  • If you are restoring all DCs in your domain, you should first restore the one with all FSMO roles (you should have this information before you lose your DCs).
  • If you are restoring the DC that had the FSMO roles and you want to keep it that way (meaning all the existing DCs will sync from your restored DC), then you need to follow Microsoft's procedure for an authoritative restore.

So when you need to restore domain controllers, always try to restore the one that was backed up most recently.

I have written a couple of articles regarding this procedure.

Userlevel 7
Badge +20

How did you get on @MckITGuys? If there was any particular advice that you found the most helpful, please don’t forget to mark it as the answer to help others in your position in the future 🙂

Userlevel 7
Badge +8

Wow. Quite a comprehensive thread this has gotten. Very interesting topic.

To me the most important part of the question in the first place was:

All (!) DCs have to be restored at once.

If you try that with VBR without further measures it will fail.

Not a single one of the DCs will bring up the domain. This is because VBR - even since V2 - is smart enough to always do a non-authoritative-restore. This will bring up the DC in a state which first wants to copy the full domain from another DC still alive. You can take note of the process by the additional reboot a DC always does after being recovered by VBR.

Without this full sync of the domain DB your recovered DC will be a useless brick.

Problem is: If you recovered all of them, they will all look for a surviving DC - but there is none.

You then have to put one of them into an authoritative restore manually afterwards, e.g. by setting the “BurFlags”. Have that process in your emergency guideline just in case...

Veeam btw CAN do an authoritative restore. You can opt to do that in a SureBackup application group. Otherwise every SureBackup test would also come up with useless DCs. Keep in mind that if you have more than one DC in a SureBackup application group, only one should be brought up authoritative while the others must be set to non-authoritative. Then those will copy the domain from the authoritative one.
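The rule described here, exactly one authoritative DC and every other DC non-authoritative, can be written down as a small planning helper. This Python sketch only produces an ordered plan as data; it does not talk to Veeam or to any DC. The function name `plan_restore` and the example host names are hypothetical, and choosing the FSMO role holder as the authoritative DC follows the advice given elsewhere in this thread.

```python
def plan_restore(dcs, fsmo_holder):
    """Order DCs for a full-domain restore.

    The FSMO role holder comes first and is the only authoritative DC;
    all other DCs follow as non-authoritative.
    """
    if fsmo_holder not in dcs:
        raise ValueError(f"{fsmo_holder} is not in the list of DCs")
    plan = [(fsmo_holder, "authoritative")]
    plan += [(dc, "non-authoritative") for dc in dcs if dc != fsmo_holder]
    return plan


if __name__ == "__main__":
    for step, (dc, mode) in enumerate(plan_restore(["dc1", "dc2", "dc3"], "dc2"), 1):
        print(f"{step}. restore {dc} as {mode}")
```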

SureBackup can even be used to automate the authoritative restore of a DC. You might just move the DC from the lab into the “real” world AKA network. Do it carefully. No other DC must be alive. Otherwise your domain will break by becoming inconsistent.

Have a secondary DNS without AD integration for your Veeam components for that. Otherwise without DNS, VBR and SureBackup will not function as expected… We always put a secondary DNS on top of the VBR server to be used only as a backup for the VI and Veeam. It’s just a service...

Userlevel 7
Badge +6

Deleting the DC’s from AD will not remove their DNS entries as I recall.  You’ll likely need to manually delete those entries from DNS.  You should be able to do this from any DC that is replicating properly.  Once those changes have replicated, you shouldn’t see any more entries for those old DC’s.  That said, you’ll need to remove them from Active Directory Sites and Services to actually remove the DC’s from AD and again, those changes would be replicated to other valid DC’s in the domain.  A graceful demotion is always better than forcibly removing DC’s from a domain, but over the course of this conversation, it sounds as if that’s not an option.

I believe if you forcibly remove a DC from a domain, there will be several DNS records remaining, such as the parent record, A record, and if there are any other records that reference the removed DC, those as well.

Userlevel 3

I am writing up a “playbook” now with a couple different scenarios so forgive me for random questions that may or may not be the “best plan” i.e. I just want to know more in case I have to go down different routes:

-if I keep one DC (the one with most of the newest changes), and just shut down and delete the other 2 DCs, I am going to have DNS entries for the other two DCs still in DNS (I assume that because the other 2 DCs are at a different site, even if I demote them and remove them from the domain, it is possible that their records will persist on the good DC)

-question from that: other than the A record in the root of the zone, do I have to find and remove every reference for each of the downed DCs (service records etc.) or is there one spot to delete it such that it cascade deletes to all service records?

Userlevel 7
Badge +8

The restore of all 4 of our DC’s, by using one as authoritative and then restoring the other 3, went super smooth when we had Veeam support assist us.

 

The things that took the longest were about 2 hours fixing DNS problems we had. Granted our DC’s had a crazy issue and all crashed. Now DNS is separate, but those host files would have been nice BEFORE hand.  Get your DNS working before and you are already ahead of the game for DR planning.
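As a sketch of preparing those hosts files ahead of time, the following Python function turns a name-to-address map into hosts-file lines. The function name `hosts_entries` and the example names and addresses are hypothetical; on Windows the file it would feed is `C:\Windows\System32\drivers\etc\hosts`, and which servers to include (vCenter, ESXi hosts, repositories, proxies) depends on your environment.

```python
import ipaddress


def hosts_entries(servers):
    """Build hosts-file lines from a {fqdn: ip} mapping.

    Each line carries the FQDN plus the short name, so both resolve
    while AD-integrated DNS is down.
    """
    lines = []
    for fqdn, ip in sorted(servers.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        short = fqdn.split(".")[0]
        names = fqdn if short == fqdn else f"{fqdn} {short}"
        lines.append(f"{ip}\t{names}")
    return lines


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "vcenter.example.local": "10.0.0.10",
        "esx01.example.local": "10.0.0.21",
    }
    print("\n".join(hosts_entries(example)))
```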

 

The restore took longer than expected for us as well. I can’t remember if it was something with networking, storage, or VMware but it took about an hour to restore the DC and I was expecting minutes. 

 

All in all it went great.  You also have the benefit of powering them all down, doing a snapshot or backup BEFORE you start while everything is in a consistent state so it’s pretty stress free.

 

 

Userlevel 7
Badge +6

Really nice topic, which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure. But I’ll keep all this precious advice in a corner.

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throwaway these days, especially since you don’t have to do metadata cleanups anymore.

I have thought of that too - just restoring one DC, cleaning it up and then building others from that.  The DC with most likely the latest changes on it (password changes, trust relationships) though does not hold the FSMO roles.  So if I started with this, I would have to seize all the FSMO roles.

If I start with the server with the FSMO roles, it is at their head office but because most of their users log into VDI desktops at their data center, most likely there will be more broken trust relationships.

 

 

AD should keep a copy of everything across all DCs.  If there are broken trusts, that should be replicated to all DCs in AD, so it should not matter which DC is brought online, unless we’re talking about changes made in a narrow window before they were replicated to the other DCs.  The exception would be if replication had already failed and different DCs held different copies of AD that were not being replicated amongst themselves.  As previously noted, if this has gone on for too long, the DCs will have passed the tombstone lifetime and really are of little to no use anyway.  Seizing FSMO roles is trivial, as is cleaning up AD from other DCs that are no longer available.  Taking down all of the DCs no longer desired and restoring the most desirable DC authoritatively would likely be the way to go.  Seize FSMO, clean up AD from the unwanted DCs to prevent any further chance of replication, and then proceed with building new DCs.
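To put a number on the tombstone point, here is a minimal Python sketch of the date check. The function name is made up, and the 180-day default is an assumption: it is the usual default for forests created on Windows Server 2003 SP1 or later, while older forests may still use 60 days, so check the actual value in your own forest before relying on it.

```python
from datetime import datetime, timedelta

# Assumption: 180 days is a common default tombstone lifetime; older forests
# may use 60. Look up the real value for your forest rather than trusting this.
DEFAULT_TOMBSTONE_DAYS = 180


def within_tombstone_lifetime(last_good_replication: datetime,
                              now: datetime,
                              tombstone_days: int = DEFAULT_TOMBSTONE_DAYS) -> bool:
    """True if a DC (or backup) last replicated recently enough to rejoin safely."""
    age = now - last_good_replication
    return age < timedelta(days=tombstone_days)
```

A backup or DC that is about three weeks behind, as discussed in this thread, would be well inside either a 60-day or a 180-day window.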

 

Anyhow, hopefully it does not come to this.  Here is my plan (for what it’s worth); I started this thread to get an understanding about recovering via Veeam backups but I am actually just going to try to get AD working again across sites.  As a reminder, the problem is there was a new DC being built that has become corrupted (not accessible) and the computer object in AD looks “messed up”.  No way to get into the DC to demote it and clean up.

 

 

This seems unusual for sure.  But that said, if you have a failed/corrupted DC, power it off and blow it away, clean that DC out of AD and build a new one.  I’d typically suggest using a new name to prevent any possible rogue data from infiltrating back into things. That’s probably not likely, but you’re already in a precarious position, so best not to add any possible complications.

 

  But it is running and if I down it, one of our servers stops working.  I won’t go into more details but suffice to say, I need to get that DC out of the domain, clean up AD and then promote a new one.  So here is my plan that if all goes well, there will be no restore from backup:

  • down the problem DC
  • delete all the references to it in AD manually
  • AD synching seems to be somewhat stopped - repadmin says the DCs can connect but replication stopped due to errors
  • issue commands to “force” replication between the good DCs
  • hopefully all is well

Question: having said the above, if I just do the above, will the bad DC object just come back from one of the other DCs?  Do I need to delete it on all DCs manually before forcing the replication?  Or should I maybe down all DCs except one, make the deletion, set that DC as authoritative (via the registry setting I have read about), reboot that DC (I assume I need a reboot for the setting to take effect) and then bring up the other 2 DCs and force a replication?  Comments?

 

Your plan looks very similar to what I would do.  But I don’t think I’d try to delete AD data from existing DCs and reuse them.  I’d just build new.  I suppose you could forcibly remove AD from a DC and reuse it, but why take the risk?  As previously noted, they’re practically throwaway anyway.

Obviously, I can’t make a good call without knowing details such as the number of DCs you have, how many are local vs. remote, how many remote sites there are, the root cause of the problem, etc.  And clearly we’ve gone well beyond the scope of restoring DCs in Veeam.  If you know the root cause of the problem and have resolved it, you know that the AD data is valid as of a certain date, and you know that the remote-site DCs can be restored non-authoritatively and sync with the DC that is restored authoritatively, then I suppose you could reuse them.  But I’d be hesitant to do so in a more complex environment, where changes from a rogue DC could cause wrinkles and force you to start the process over again.  I don’t know enough about your situation to make a call, but it sounds more complex than I’d want to take the risk on; I’d rather take the safe route and restore the most likely good DC, validate it (offline if I must), and then blow away the others and join them to this one.

Userlevel 3

Really nice topic, which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure. But I’ll keep all this precious advice in a corner.

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throwaway these days, especially since you don’t have to do metadata cleanups anymore.

I have thought of that too - just restoring one DC, cleaning it up and then building others from that.  The DC with most likely the latest changes on it (password changes, trust relationships) though does not hold the FSMO roles.  So if I started with this, I would have to seize all the FSMO roles.

If I start with the server with the FSMO roles, it is at their head office but because most of their users log into VDI desktops at their data center, most likely there will be more broken trust relationships.

Anyhow, hopefully it does not come to this.  Here is my plan (for what it’s worth); I started this thread to get an understanding about recovering via Veeam backups but I am actually just going to try to get AD working again across sites.  As a reminder, the problem is there was a new DC being built that has become corrupted (not accessible) and the computer object in AD looks “messed up”.  No way to get into the DC to demote it and clean up.  But it is running and if I down it, one of our servers stops working.  I won’t go into more details but suffice to say, I need to get that DC out of the domain, clean up AD and then promote a new one.  So here is my plan that if all goes well, there will be no restore from backup:

  • down the problem DC
  • delete all the references to it in AD manually
  • AD synching seems to be somewhat stopped - repadmin says the DCs can connect but replication stopped due to errors
  • issue commands to “force” replication between the good DCs
  • hopefully all is well

Question: having said the above, if I just do the above, will the bad DC object just come back from one of the other DCs?  Do I need to delete it on all DCs manually before forcing the replication?  Or should I maybe down all DCs except one, make the deletion, set that DC as authoritative (via the registry setting I have read about), reboot that DC (I assume I need a reboot for the setting to take effect) and then bring up the other 2 DCs and force a replication?  Comments?

Userlevel 7
Badge +6

Really nice topic, which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure. But I’ll keep all this precious advice in a corner.

Yeah, I shudder at the thought of restoring all DCs.  I’d honestly consider shutting down all of the DCs, restoring one authoritatively, and then cleaning up AD and building new DCs for any remote systems.  Building a DC is practically throwaway these days, especially since you don’t have to do metadata cleanups anymore.

Userlevel 7
Badge +7

Really nice topic, which mixes Veeam and AD knowledge. Luckily, I have never had to restore all the DCs of an infrastructure. But I’ll keep all this precious advice in a corner.

Userlevel 7
Badge +13

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked. 

https://www.dell.com/support/kbdoc/it-it/000136611/resetting-the-directory-services-restore-mode-administrator-password

Userlevel 7
Badge +6

More questions from excellent discussion above:

-netdom resetpwd - so I have a server or user PC that has lost its trust relationship: am I going to be able to log into that server to perform this command?  and if I do, say my server is named DC1 and my domain admin account is SkinnyAdmin, would the command be:

netdom resetpwd /s:DC1 /ud:mydomain.com\SkinnyAdmin /pd:bigFatPassword

-so I assume from what I have read that, since this is a machine password, this gets a newly generated password from the DC and updates it locally and also in that computer’s object in AD? just checking

-and I assume that this resets the trust relationship at the same time

DSRM password: I have taken over admin of a client’s  network: although I have the administrator passwords, I have not found any DSRM passwords recorded anywhere; I can guess by a list of commonly used passwords but cannot be sure; is that going to prevent a Veeam restore or does it just use the domain admin passwords stored in its credentials setup?  I know someone pasted a link to resetting the DSRM pwd but I would assume that only applies going forward, not going backwards when I do not have access to it.

 

You have to log into the machine with a local account.  You MIGHT be able to log in with cached domain credentials if the machine is disconnected from the network; then connect it to the network so that it can contact the DC.  It then authenticates to the DC with your admin credentials and creates and syncs a new machine password with AD.  The netdom command and the Reset-ComputerMachinePassword PowerShell cmdlet perform the same basic function here.  Once the workstation and AD are in sync on the password, the trust relationship is valid again.  I’m not sure where the password is generated...I’ve always assumed that the workstation generates the password and tells AD what it is using your AD creds.

Note that this is for member servers and workstations.  DCs don’t have a local admin password and should always be able to communicate with the copy of AD they host themselves.  If a DC becomes disconnected from the rest of AD due to sync issues (firewall, VPN, etc.) for too long, it can become tombstoned and can no longer safely sync with AD.

As for your command syntax, you can type in /pd:bigFatPassword if you need to script things.  That said, to be prompted for it interactively, replace bigFatPassword with *.  Using /pd:* will prompt you for the password so that you don’t have a domain admin password sitting in a script or note or whatever somewhere.  I don’t believe you have to use /s:DC1 to specify a server name; omitting it should cause it to connect to any DC it can talk to.
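If you do end up scripting it, here is a minimal Python sketch of building that command line so the password defaults to the `*` prompt instead of being stored. The function name is made up; the `/s:`, `/ud:` and `/pd:` switches are the ones from the command quoted above. The sketch only builds the argument list and does not run anything, and whether leaving out `/s:` really works is my belief from above, not something this code verifies.

```python
from typing import List, Optional


def netdom_resetpwd_args(domain_user: str,
                         server: Optional[str] = None,
                         password: Optional[str] = None) -> List[str]:
    """Build the argument list for 'netdom resetpwd' without executing it."""
    args = ["netdom", "resetpwd"]
    if server:
        # Name a specific DC; leaving it out is the unverified case discussed above.
        args.append(f"/s:{server}")
    args.append(f"/ud:{domain_user}")
    # "*" makes netdom prompt for the password instead of keeping it in a script.
    args.append(f"/pd:{password if password is not None else '*'}")
    return args
```

With the names from the question, `netdom_resetpwd_args(r"mydomain.com\SkinnyAdmin", server="DC1")` yields the same command as above but ending in `/pd:*`. The list could then be handed to something like `subprocess.run` on the affected machine.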

 

I’ve never reset a DSRM password….I imagine it can be done but I’ve never checked.  Of course, if that was the case, I assume you’d need to have valid domain access to the DC to do so.

Userlevel 3

More questions from excellent discussion above:

-netdom resetpwd - so I have a server or user PC that has lost its trust relationship: am I going to be able to log into that server to perform this command?  and if I do, say my server is named DC1 and my domain admin account is SkinnyAdmin, would the command be:

netdom resetpwd /s:DC1 /ud:mydomain.com\SkinnyAdmin /pd:bigFatPassword

-so I assume from what I have read that, since this is a machine password, this gets a newly generated password from the DC and updates it locally and also in that computer’s object in AD? just checking

-and I assume that this resets the trust relationship at the same time

DSRM password: I have taken over admin of a client’s  network: although I have the administrator passwords, I have not found any DSRM passwords recorded anywhere; I can guess by a list of commonly used passwords but cannot be sure; is that going to prevent a Veeam restore or does it just use the domain admin passwords stored in its credentials setup?  I know someone pasted a link to resetting the DSRM pwd but I would assume that only applies going forward, not going backwards when I do not have access to it.

Userlevel 3

hopefully Veeam support is faster than the usual response - but maybe if this is severity 1 I will get someone right away - going to try this on a long weekend starting Friday

-so in short, I power down all 3 DCs, restore the one which I think has the best data, issue a command at the command line or a registry setting to make this the authoritative server, then restore the other 2 servers, right?

Scott wrote:

  1. Call Veeam support for help if you need it
  2. The easiest way for me would be to power down ALL of the DC’s and restore the master as authoritative. I’d then restore the other/rest and let them sync.
