Because Active Directory (AD) recovery is not a routine, day-to-day operation, many of us have little hands-on experience with it. The article below walks through one such recovery in my lab.
Recently, my lab environment kept restarting because of a faulty switch. While troubleshooting the problem before resorting to a VM recovery, I found that the Hyper-V host itself was experiencing unplanned restarts. Unplanned restarts can leave virtual hard disks attached to a virtual IDE controller in an inconsistent state if virtual machines are using them at the time.
According to Microsoft, when you virtualise your domain controller (DC) on a Hyper-V host server, a crash or power outage on the Hyper-V host can lead to several issues. The Active Directory database may become corrupted, or the virtual machine may fail to start, resulting in an error message like the one shown below.
Troubleshoot via DSRM
DSRM (Directory Services Repair Mode, called Directory Services Restore Mode in versions prior to Windows Server 2012) is a special boot mode for Windows Server domain controllers. It works much like Safe Mode with Networking, except that Active Directory is not running. Administrators use DSRM to restore Active Directory from a backup. It is also the mode in which offline maintenance of the AD database, such as integrity checks and repairs, is carried out.
In later versions of Windows Server, you reach DSRM through the Advanced Boot Options menu or the Windows Recovery Environment, as shown below. Under Choose an option, select Troubleshoot.
Select “Advanced options”, as shown below.
Under Startup Settings, select Restart.
Select Directory Services Repair Mode (DSRM), and then log in with the DSRM account (the local Administrator account, using the DSRM password set when the server was promoted to a domain controller).
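If the server still boots normally, a common alternative to the boot menu is to flag the boot configuration with `bcdedit /set safeboot dsrepair`. The flag stays in place on every restart until you remove it with `bcdedit /deletevalue safeboot`. The Python sketch below only assembles these commands; the helper name `dsrm_boot_commands` is hypothetical. It executes them only on Windows, from an elevated prompt, and only when `--apply` is passed. Otherwise it just prints them.

```python
import subprocess
import sys


def dsrm_boot_commands(enable: bool) -> list[list[str]]:
    """Hypothetical helper: argv lists that switch the next boots into or out of DSRM."""
    restart = ["shutdown", "/r", "/t", "0"]  # restart immediately
    if enable:
        # "safeboot dsrepair" sends every subsequent boot to DSRM until removed
        return [["bcdedit", "/set", "safeboot", "dsrepair"], restart]
    # Remove the flag so the DC boots normally again
    return [["bcdedit", "/deletevalue", "safeboot"], restart]


if __name__ == "__main__":
    cmds = dsrm_boot_commands(enable="--normal" not in sys.argv)
    if sys.platform == "win32" and "--apply" in sys.argv:
        for argv in cmds:
            subprocess.run(argv, check=True)  # stop at the first failure
    else:
        # Dry run: show what would be executed
        for argv in cmds:
            print(subprocess.list2cmdline(argv))
```

Remember to run the `--normal` variant from within DSRM when you are done. Otherwise the DC will keep booting into repair mode.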
Verify AD Replication
Active Directory replication issues can cause inconsistencies or make the domain controller unavailable. Use the following commands to check the replication status and get a summary:
repadmin /replsummary
repadmin /showrepl
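To spot failing links without reading through the whole /showrepl listing, you can export the output with `repadmin /showrepl * /csv` and filter it. The Python sketch below assumes the CSV column names used in the sample (`Number of Failures`, `Last Failure Status` and so on). These match what repadmin typically emits, but check them against your own output. The DC names and values in the sample are made up.

```python
import csv
import io

# Hypothetical sample of `repadmin /showrepl * /csv` output (names and values made up).
SAMPLE = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Default-First-Site-Name,DC01,"DC=lab,DC=local",Default-First-Site-Name,DC02,RPC,3,2024-01-01 10:00:00,2023-12-31 22:00:00,1355
showrepl_INFO,Default-First-Site-Name,DC01,"CN=Configuration,DC=lab,DC=local",Default-First-Site-Name,DC02,RPC,0,0,2024-01-01 10:00:00,0
"""


def failing_links(csv_text: str) -> list[dict]:
    """Return one entry per replication link that has at least one failure."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumed; adjust them if your repadmin output differs.
        count = int(row["Number of Failures"] or 0)
        if count > 0:
            status = int(row["Last Failure Status"])
            failures.append({
                "destination": row["Destination DSA"],
                "source": row["Source DSA"],
                "naming_context": row["Naming Context"],
                "failures": count,
                "status": status,               # Win32 error code, e.g. 1355
                "status_hex": f"0x{status:X}",  # same code in hex, e.g. 0x54B
            })
    return failures


if __name__ == "__main__":
    for link in failing_links(SAMPLE):
        print(link)
```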
As you can see in the images above, both AD replication commands returned the error “Win32 error 1355 (0x54B): the specified domain does not exist”. See Microsoft's System Error Codes (1300-1699) reference for more on this error. Let us also run the netdom verify command, as shown below.
Perform an AD Database Integrity Check
After reviewing Event Viewer, I decided to run an initial integrity check. Launch PowerShell or Command Prompt and run the following command:
ESENTUTL /g C:\windows\NTDS\ntds.dit /!10240 /8 /o
As you can see, the check reports an inconsistency in the database, with error -1811. In my case, this was caused by the unplanned shutdown, that is, “the Administrator modified logs or lost I/O flush on shutdown”.
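When the check is run from a script, you can pull the result out of esentutl's console output. The sketch below assumes that a failed run ends with a line of the form `Operation terminated with error -NNN (JET_errName, description) after N seconds.`, which is how esentutl usually reports errors. The function name `parse_esentutl_error` and the sample text are illustrative and are not taken from my session.

```python
import re
from typing import Optional, Tuple

# Assumed shape of esentutl's failure line, e.g.
# "Operation terminated with error -501 (JET_errLogFileCorrupt, Log file is corrupt) after 2.1 seconds."
ERROR_LINE = re.compile(r"Operation terminated with error (-\d+) \((\w+), ([^)]*)\)")


def parse_esentutl_error(output: str) -> Optional[Tuple[int, str, str]]:
    """Return (error code, symbolic name, description), or None if no error line is found."""
    match = ERROR_LINE.search(output)
    if match is None:
        return None
    code, name, description = match.groups()
    return int(code), name, description.strip()


if __name__ == "__main__":
    sample = (
        "Checking database integrity.\n"
        "Operation terminated with error -501 (JET_errLogFileCorrupt, "
        "Log file is corrupt) after 2.1 seconds.\n"
    )
    print(parse_esentutl_error(sample))
```

Returning `None` when no error line is present keeps the caller's logic simple. A successful run, such as one that prints “Integrity check successful.”, simply yields no error tuple.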
Repair NTDS database
Now let us attempt to repair the NTDS database. First, though, let us run the integrity check once more, this time with ntdsutil:
ntdsutil.exe
activate instance ntds
files
integrity
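ntdsutil also accepts its commands as command-line arguments, with multi-word commands in quotes. The interactive session above can therefore be run in one line as `ntdsutil "activate instance ntds" files integrity quit quit`. The Python sketch below builds that argument list; the helper name `ntdsutil_integrity_args` is hypothetical. The check must run in DSRM or with AD DS stopped, so the script only executes ntdsutil on Windows when `--apply` is passed. Otherwise it prints the equivalent command line.

```python
import subprocess
import sys


def ntdsutil_integrity_args() -> list[str]:
    """Hypothetical helper: one argument per ntdsutil command, multi-word commands kept whole."""
    return [
        "ntdsutil.exe",
        "activate instance ntds",  # select the AD DS instance
        "files",                   # enter the file maintenance menu
        "integrity",               # check the database for physical corruption
        "quit",                    # leave the file maintenance menu
        "quit",                    # leave ntdsutil
    ]


if __name__ == "__main__":
    args = ntdsutil_integrity_args()
    if sys.platform == "win32" and "--apply" in sys.argv:
        # Needs an elevated prompt in DSRM, or with AD DS stopped (net stop ntds)
        result = subprocess.run(args, capture_output=True, text=True)
        print(result.stdout)
    else:
        # list2cmdline adds the quotes around "activate instance ntds"
        print(subprocess.list2cmdline(args))
```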
As you can see, this failed with a new error, -501 JET_errLogFileCorrupt. This error indicates that the hardware corrupted the I/O during a write, or that a lost hardware flush left the log file unusable. As a result, the database (DB) is left in a corrupted state.
Solution: Perform VM Recovery
Microsoft recommends restoring the database from a known good backup or reinstalling the domain controller (DC). In my case, I have a backup of the entire VM, so I will restore it. You can learn about the different recovery options here: https://helpcenter.veeam.com/docs/backup/vsphere/vm_restores.html?ver=120
This time, I will perform an “Instant Recovery”, which recovers workloads (VMs, EC2 instances, physical servers and so on) directly from compressed and deduplicated backup files, in this case as a Hyper-V VM. When you perform Instant Recovery, Veeam Backup & Replication mounts the recovered VM images to a host directly from backups stored in backup repositories.
Instant Recovery improves recovery time objectives (RTO) and minimises disruption and downtime of production workloads. However, the recovered VMs are only “temporary spares” with limited I/O performance.
To give the recovered VMs full I/O performance, you must finalize Instant Recovery by migrating the recovered VMs to the production environment.
The VM is available again and usable.
There are no longer any replication errors. Note that this is currently the only DC in the domain.