Restoring a Domain Controller

One of the benefits of Active Directory is built-in redundancy. When you lose a single domain controller, the impact can be insignificant. With many services, such as DHCP, the architecture dictates a dependency on a specific server. When that server becomes unavailable, clients are impacted. Over the years, failover or redundancy has been built into most of these services, including DHCP. With Active Directory, the architecture is built around redundancy. Clients are not dependent on a single DC; they can fail over to another DC seamlessly if a failure occurs.

When a failure does occur, you should ask yourself several questions to assess the impact:

Is the domain controller the only one for the domain?

This is the worst-case scenario. The redundancy in Active Directory applies only if you have more than one domain controller in a domain. If there is only one, you have a single point of failure. You could irrevocably lose the domain unless you can get that domain controller back online or restore it from backup.

Does the domain controller have a FSMO role?

The five FSMO roles outlined in Chapter 2 play an important part in Active Directory. FSMO roles are not redundant, so if a FSMO role owner becomes unavailable, you'll need to seize the FSMO role on another domain controller. Check out the FSMO recovery section later in this chapter for more information.

Is the domain controller a Global Catalog server?

The Global Catalog is a function that any domain controller can perform if enabled. But if you have only one Global Catalog server in a site and it becomes unavailable, it can impact users' ability to log in. As long as clients can access a Global Catalog, even if it isn't in the optimal location, they will be able to log in. If a site without a Global Catalog for some reason loses connectivity with the rest of the network, it would impact users' ability to log in. With Windows Server 2003, you can enable universal group membership caching on a per-site basis to limit this potential issue, but only if the user is not using a userPrincipalName for authentication.

Is the domain controller necessary from a capacity perspective?

If your domain controllers are running near capacity and one fails, it could overwhelm the remaining servers. At this point, clients could start to experience login failures or extreme slowness when authenticating.

Are any other services, such as Exchange, relying on that specific domain controller?

Exchange is a heavy consumer of Active Directory services, especially AD Global Catalogs. Failure of a domain controller that Exchange is using can cause considerable issues in the mail environment, depending on the versions of Outlook and Exchange being used. (More recent versions of Exchange and Outlook handle outages better than older versions.) During the outage period, mail delivery could be impacted along with client lookups. Exchange is just one example, but it illustrates that you have to be careful of this when introducing Active Directory-enabled services into your environment.

These questions can help you assess the urgency of restoring the domain controller. If you answered "no" to all of the questions, the domain controller can stay down for a short period without significant impact.
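The assessment questions above can be captured as a small triage helper. This is only a sketch: the `FailedDC` class, its field names, and the urgency labels are hypothetical and chosen for illustration; they are not part of any Active Directory tool.

```python
from dataclasses import dataclass


@dataclass
class FailedDC:
    """Answers to the assessment questions for one failed domain controller."""
    only_dc_in_domain: bool
    holds_fsmo_role: bool
    is_global_catalog: bool
    needed_for_capacity: bool
    has_dependent_services: bool  # e.g., Exchange relying on this DC


def restore_urgency(dc: FailedDC) -> str:
    """Map the answers to a rough urgency label (labels are illustrative)."""
    if dc.only_dc_in_domain:
        # Single point of failure: the domain itself is at risk.
        return "critical"
    other_answers = [
        dc.holds_fsmo_role,
        dc.is_global_catalog,
        dc.needed_for_capacity,
        dc.has_dependent_services,
    ]
    # Any "yes" raises the urgency; all "no" means the DC can stay down briefly.
    return "high" if any(other_answers) else "low"
```

A DC for which every answer is "no" comes back as `"low"`, matching the guidance that it can stay down for a short period.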

When you've identified that you need to restore a domain controller, there are two options to choose from: restoring from replication or restoring from a backup.

Restore from Replication

One option for restoring a domain controller is to bring up a freshly installed or repaired machine and promote it into Active Directory. You would use this option if you had a single domain controller failure due to hardware and either did not have a recent backup of the machine or you didn't want to go through the process of restoring the DC from a backup. This method allows you to replace the server in AD by promoting a newly installed machine and allowing replication to copy all of the data to the DC. Here are the steps to perform this type of restore:

  1. Remove the failed DC from AD. The old remnants of the domain controller must be removed from Active Directory before you promote the freshly installed server. We describe the exact steps to do this shortly.

  2. Rebuild OS. Reinstall the operating system and any other applications you support on your domain controllers.

  3. Promote server. After you've allowed time for the DC removal process to replicate throughout the forest, you can then promote the new server into AD.

  4. Configure any necessary roles. If the failed server had any FSMO roles or was a GC, you can configure the new server to have these roles.

A best practice is to keep a spare server, with the OS and any other software already installed, either onsite at each location or ready to ship. That way, if you have a major failure with one of your domain controllers, you can use the spare server without needing to stress over getting the hardware in the failed machine replaced immediately. Alternatively, maintain additional domain controller capacity in the primary sites where failures would be most painful, especially for Exchange.

The biggest potential drawback with this method is the restore time. Depending on the size of your DIT file and how fast your network connections are between the new DC and the server it will replicate with, the restore time could be several hours or even days. Restore time can be dramatically reduced with a new option in Windows Server 2003, called Install from Media. It allows you to take files from a system state backup of one domain controller and use them to quickly promote another domain controller. It may be faster to copy these backup files over the network to the remote site, or ship the files on some other media to the site, than to replicate the entire DIT over the WAN. If this is problematic or too slow for you, you'll want to look at the restore from backup option that we describe next.
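To get a feel for why restore time can stretch into days, here is a rough back-of-the-envelope estimate in Python. It is a sketch under stated assumptions: the function name and the `usable_fraction` parameter are hypothetical, decimal units are used (1 GB = 8,000 megabits), and the calculation treats the raw DIT size as the amount of data to move, ignoring replication compression and protocol overhead.

```python
def replication_hours(dit_gb: float, link_mbps: float,
                      usable_fraction: float = 0.5) -> float:
    """Estimate the hours needed to move dit_gb gigabytes over a WAN link.

    usable_fraction is the share of the link assumed to be available
    for this transfer (an assumption, not a measured value).
    """
    if dit_gb < 0 or link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes and rates must be positive; fraction in (0, 1]")
    megabits = dit_gb * 8 * 1000          # gigabytes -> megabits (decimal)
    seconds = megabits / (link_mbps * usable_fraction)
    return seconds / 3600


# Example: a 9 GB DIT over a 2 Mbps link with half the bandwidth available.
print(replication_hours(9, 2, 0.5))  # prints 20.0
```

Comparing this figure with the time needed to copy or ship backup files to the site gives a first indication of which approach is likely to be faster.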

Manually removing a domain controller from Active Directory

One of the key steps with the restore from replication method is removing the objects that are associated with the domain controller before it gets added to AD again. This is a three-step process. The first step is to remove the associated metadata. That can be accomplished with the ntdsutil utility. The following example shows the commands necessary to remove the DC3 domain controller, which is in the RTP site, from the domain:

    ntdsutil: metadata cleanup
    metadata cleanup: connections

Next, we need to connect to an existing domain controller in the domain that contains the domain controller you want to remove. In this case, we connect to DC2:

    server connections: connect to server dc2
    Binding to dc2 ...
    Connected to dc2 using credentials of locally logged on user.
    server connections: quit
    metadata cleanup: select operation target

Now we need to select the domain the domain controller is in. In this case, it is DC=emea,DC=mycorp,DC=com:

    select operation target: list domains
    Found 2 domain(s)
    0 - DC=mycorp,DC=com
    1 - DC=emea,DC=mycorp,DC=com
    select operation target: select domain 1
    No current site
    Domain - DC=emea,DC=mycorp,DC=com
    No current server
    No current Naming Context

Next we must select the site the domain controller is in. In this case, it is the RTP site:

    select operation target: list sites
    Found 4 site(s)
    0 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    1 - CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    2 - CN=SJC,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    3 - CN=NYC,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    select operation target: select site 1
    Site - CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    Domain - DC=emea,DC=mycorp,DC=com
    No current server
    No current Naming Context

After listing the servers in the site, we must select the server we want to remove. In this case, it is DC3:

    select operation target: list servers in site
    Found 3 server(s)
    0 - CN=DC1,CN=Servers,CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    1 - CN=DC2,CN=Servers,CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    2 - CN=DC3,CN=Servers,CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    select operation target: select server 2
    Site - CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
    Domain - DC=emea,DC=mycorp,DC=com
    Server - CN=DC3,CN=Servers,CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
       DSA object - CN=NTDS Settings,CN=DC3,CN=Servers,CN=RTP,CN=Sites,
       Computer object - CN=DC3,OU=Domain Controllers,DC=emea,DC=mycorp,DC=com
    No current Naming Context
    select operation target: quit

This process has been considerably simplified in Windows Server 2003 Service Pack 1; however, you need to know the distinguishedName of the Domain Controller's server object in the configuration container. It is recommended that you simply follow the preceding directions for removing dead domain controllers, as there is less possibility of a mistake.
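If you do need the distinguishedName of a server object, its shape can be read off the ntdsutil output above. The following Python sketch builds that DN from a server name, site name, and forest root DNS name. The function name is hypothetical, and the sketch assumes the names contain no characters that require escaping in a DN.

```python
def server_object_dn(server: str, site: str, forest_root_dns: str) -> str:
    """Build the DN of a DC's server object in the configuration container.

    Assumes plain names with no characters that need DN escaping.
    """
    # "mycorp.com" -> "DC=mycorp,DC=com"
    forest_dn = ",".join(f"DC={label}" for label in forest_root_dns.split("."))
    return (f"CN={server},CN=Servers,CN={site},"
            f"CN=Sites,CN=Configuration,{forest_dn}")


print(server_object_dn("DC3", "RTP", "mycorp.com"))
# prints CN=DC3,CN=Servers,CN=RTP,CN=Sites,CN=Configuration,DC=mycorp,DC=com
```

Note that the configuration container lives under the forest root domain, which is why the DN ends in DC=mycorp,DC=com even though DC3 belongs to the emea domain.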

The last step removes the metadata for the selected domain controller:

    metadata cleanup: remove selected server

At this point, you should receive confirmation that the DC was removed successfully. If you receive an error that the object could not be found, it might have already been removed if you tried to demote the server with dcpromo.
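For documentation or runbook purposes, the whole session can be summarized as an ordered list of the commands typed at each ntdsutil prompt. The Python sketch below only builds that list; it does not run ntdsutil. The function name is hypothetical, and the index arguments are assumptions that must be read from the output of the corresponding list commands in your own environment.

```python
from typing import List


def metadata_cleanup_commands(connect_to: str, domain_idx: int,
                              site_idx: int, server_idx: int) -> List[str]:
    """Return the ntdsutil commands for a metadata cleanup, in typing order.

    The indexes are the numbers shown by 'list domains', 'list sites',
    and 'list servers in site'; they differ from forest to forest.
    """
    if min(domain_idx, site_idx, server_idx) < 0:
        raise ValueError("indexes must be non-negative")
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {connect_to}",
        "quit",                          # leave the connections menu
        "select operation target",
        "list domains",
        f"select domain {domain_idx}",
        "list sites",
        f"select site {site_idx}",
        "list servers in site",
        f"select server {server_idx}",
        "quit",                          # leave the operation target menu
        "remove selected server",
    ]


# The DC3 example from this section: connect to dc2, domain 1, site 1, server 2.
for command in metadata_cleanup_commands("dc2", 1, 1, 2):
    print(command)
```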

You will then need to manually remove a few more objects from Active Directory, including the computer account and FRS object in the domain-naming context and the server object in the configuration container. See MS Knowledge Base article 216498 for details.

Restore from Backup

Another option to reestablish a failed domain controller is to restore the machine using a backup. This approach does not require you to remove any objects from Active Directory. When you restore a DC from a backup, the latest changes will replicate to make it current. If time is of the essence and the backup file is immediately available, this will be the quicker approach, because only the latest changes since the last backup, instead of the whole directory tree, will be replicated over the network.

Here are the steps to restore from backup:

  1. Rebuild OS. Reinstall the operating system and any other applications you support on your domain controllers. Leave the server as a standalone or member server.

  2. Restore from backup. Use your backup package (e.g., NT Backup) to restore at least the System State onto the machine. In the next section, we will walk through the NT Backup utility to show how this is done.

  3. Reboot server and allow replication to complete. If the failed server had any FSMO roles or was a GC, you can configure the new server to have these roles.

It is also possible to restore the backup of a machine onto a machine that has different hardware. Here are some issues to be aware of when doing so:

  • The number of drives and drive letters should be the same.

  • The disk drive controller and configuration should be the same.

  • The attached hardware, such as network cards, video adapters, and processors, should be the same. After the restore, you can install the new cards, which should be recognized by Plug and Play.

  • The boot.ini from the failed machine will be restored, which may not be compatible with the new hardware, so you'll need to make any necessary changes.

  • If the HAL is different between machines, you can run into problems. For example, if the failed machine was single processor and the new machine is multiprocessor, you will have a compatibility problem. The only workaround is to copy the Hal.dll, which is not included as part of System State, from the old machine and put it on the new machine. The obvious drawback to this is it will make the new multiprocessor machine act like a single processor machine.

Because there are numerous things that can go wrong with restoring to different hardware, we highly suggest you test and document the process thoroughly; refer to MS Knowledge Base article 263532. The last thing you want to do is troubleshoot hardware compatibility issues when you are trying to restore a crucial domain controller.
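As part of documenting the process, the compatibility points above can be turned into a simple pre-restore checklist. The Python sketch below compares two hardware descriptions and reports the differences. The `HardwareProfile` class, its fields, and the sample values are hypothetical; how you collect this inventory from the real machines is up to you.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HardwareProfile:
    """Minimal description of a machine for restore planning."""
    drive_letters: Tuple[str, ...]
    disk_controller: str
    hal: str  # e.g., "uniprocessor" or "multiprocessor"


def restore_risks(old: HardwareProfile, new: HardwareProfile) -> List[str]:
    """List the differences that may cause trouble when restoring old onto new."""
    risks = []
    if sorted(old.drive_letters) != sorted(new.drive_letters):
        risks.append("drive letters differ")
    if old.disk_controller != new.disk_controller:
        risks.append("disk controller differs")
    if old.hal != new.hal:
        risks.append("HAL differs")
    return risks


failed = HardwareProfile(("C", "D"), "controller-a", "uniprocessor")
spare = HardwareProfile(("C", "D"), "controller-a", "multiprocessor")
print(restore_risks(failed, spare))  # prints ['HAL differs']
```

An empty list does not guarantee a clean restore; it only means none of the listed differences were found.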
