April 2, 2011, 12:28 p.m.
posted by vdv
It helps to manage Active Directory replication if you have a road map of how the domain controllers connect to each other and what information they exchange. In this section, we'll take a look at what Active Directory components get replicated, where the replication traffic goes, how that traffic is managed, and what happens when conflicting updates collide with each other.
Replication and Naming Contexts
In addition, Global Catalog servers host partial naming contexts for domains other than their own. You can also create Application naming contexts for holding DNS zone objects and place those naming contexts on domain controllers running DNS. Figure shows a three-domain forest and the naming contexts that would be found on a Global Catalog server in one of those domains.
As you build a mental image of replication, keep in mind that each naming context constitutes a separate replication unit. Domain controllers must propagate changes made to their replica of a naming context out to other domain controllers hosting a replica of the same naming context.
The service responsible for handling replication between two domain controllers is the Directory Replication Agent, or DRA. The DRA depends on the Connection objects in the topology map to determine which partners to contact when replicating updates to a naming context.
Connection objects define inbound replication paths. Domain controllers pull updates from their partners. When a domain controller needs to update its copy of a naming context, the DRA sends a replication request to its partners. The DRAs on the partners respond by assembling a replication packet containing updates to the naming context then delivering the packet to the requesting partner.
The size of this replication packet depends on the memory in the domain controller: the packet is 1/100 of the amount of physical RAM. For this reason, it is advantageous to add memory to a DC. A heavily loaded DC would also benefit from a second processor.
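To make the sizing rule concrete, here is a quick arithmetic sketch. The 1/100-of-RAM figure comes from the text above; the function name is my own invention.

```python
def replication_packet_size(ram_bytes: int) -> int:
    """Estimate the DRA replication packet size described above:
    roughly 1/100 of the domain controller's physical RAM."""
    return ram_bytes // 100

# A DC with 512 MB of RAM assembles packets of roughly 5 MB.
ram = 512 * 1024 * 1024
print(replication_packet_size(ram))  # 5368709 bytes, about 5 MB
```

This is why adding memory helps: bigger packets mean fewer round trips to move the same set of updates.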
Watching domain controllers select their partners is like watching teenagers pick seats in the cafeteria at lunchtime. The DRA prefers to use a single Connection object to define the end points for all the naming contexts hosted by a domain controller. For this reason, domain controllers prefer to replicate with other domain controllers in their own domain. If necessary, a domain controller will replicate its Configuration and Schema naming contexts with one partner and its Domain naming context with another partner, but only if no other options are available.
Global Catalog servers have a special challenge when selecting replication partners. GC servers need a partial replica of every Domain naming context. They can replicate the partial naming context replicas from another GC or directly from domain controllers in the source domain. Keep this behavior in mind as you lay out your architecture. Make sure that GC servers can link to other GC servers to prevent a server from snaking out links to multiple domain controllers in other domains.
The Exchange directory service from which Active Directory was derived takes the approach of replicating an entire object when any property of that object changes. This makes for a simple replication mechanism, because the DRA simply copies an entire row out of the table holding the object's information. Replicating entire objects abuses the network with unnecessary traffic, though, and complicates the collision handling mechanism if conflicting changes are made in the same replication interval.
The Active Directory engine replicates individual properties rather than entire objects. This conserves bandwidth at the expense of a little added complexity. It's more difficult to ensure database consistency with lots of individual properties flying around the network.
To help control property replication, each property contains a set of information that defines when the property was last modified, where the modification originated, and how many total revisions have been applied to the property. This is called the property metadata. The metadata is stored right along with the property's primary value, such as Name or CN or Department. See the "Property Metadata" section later in this chapter.
A site is usually a LAN or MAN. It can also be a campus network if you have sufficient bandwidth between buildings. You should have at least 500Kbps of bandwidth to support full speed replication within a site. See the "Measuring Link Performance" sidebar. Even if the links are fast, though, if they regularly become oversubscribed or demonstrate long periods of high latency, you may experience replication problems if you do not define separate sites.
Active Directory uses a loosely coupled replication mechanism. This means an interval of some duration exists between the time a modification is made to a property in one replica and the time the modified property appears in all replicas. During this interval, an LDAP query to one domain controller could produce a different result than the same query submitted to another domain controller. Keep this behavior in mind when troubleshooting problems.
The time it takes for a modified property to replicate to all domain controllers is called convergence time. Ideally, changes would propagate nearly instantaneously so that convergence time would be zero. That ideal cannot be obtained in a practical network. Convergence time is always a compromise between low network traffic and fast update propagation. Active Directory uses two methods for controlling convergence time: notification and polling.
When a domain controller modifies a property in one of its naming contexts, it notifies its replication partners within a site that a change has been made. The partners then pull a copy of the changed property and apply it to their naming context replica. Those domain controllers, in turn, notify their own replication partners and the change propagates in stages around the site.
Short notification intervals will propagate changes more quickly than long intervals, but generate more traffic to carry the same amount of information. (Each replication packet is smaller.) The default notification interval is 15 seconds.
Notification is only used between domain controllers in the same site. Replication between bridgehead servers in different sites uses polling only, not notification. This permits the system to accumulate sufficient changes (more than 50KB) to warrant compression.
The polling interval between domain controllers in the same site is set to 1 hour. This intra-site polling is not intended to propagate changes. It simply acts as a status check to ensure that the replication partner is available in the event that no Active Directory changes are made during that hour.
The default polling interval between domain controllers in different sites is set to 180 minutes, or 3 hours. This is a long time to wait for updates to propagate. You can set it to a shorter interval.
Keep these replication intervals in mind. You'll use the numbers over and over as you set up your sites and configure replication parameters. They also affect daily operation. For example, a Help Desk technician responsible for changing group members needs to remember that a change made to a user's group membership could take three hours (or longer) to replicate to the site containing the user who was just added to the group.
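The Help Desk example can be turned into a rough back-of-the-envelope model. The function below is hypothetical; it uses the default intervals from the text and ignores replication schedules and transfer time, so treat it as an upper-bound sketch only.

```python
def worst_case_convergence_minutes(intersite_hops: int,
                                   polling_interval_min: int = 180) -> int:
    """Upper-bound estimate of convergence time for a change that must
    cross several site links, each polled at the 3-hour default.
    Intra-site notification (15 seconds) is negligible by comparison."""
    return intersite_hops * polling_interval_min

print(worst_case_convergence_minutes(1))  # 180 minutes: the 3 hours above
print(worst_case_convergence_minutes(2))  # 360 minutes across two site hops
```

A change that must traverse two site links back-to-back can take twice the polling interval, which is why shortening the interval (or flattening the site topology) matters for time-sensitive updates such as group membership.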
Urgent replication items are propagated between sites using the normal polling frequency. You can enable notification between sites but this is not recommended. See "Controlling Replication Parameters" for details.
Most communication between network entities uses an application-layer protocol. For instance, when a Windows network client copies a file from a Windows server, it uses the Server Message Block (SMB) protocol. When an Internet email client wants to send a message to a post office, it uses Simple Mail Transport Protocol (SMTP). Active Directory replication can use one of two high-level protocols.
Remote Procedure Calls
The primary protocol used by Active Directory replication is the Remote Procedure Call, or RPC. RPC transactions are simple to code and have a robust set of tools for creating and managing a connection. RPCs are especially attractive for Active Directory replication because they have a straightforward encryption methodology. Encryption is an essential component of replication. You do not want someone with a packet sniffer to view sensitive directory information as it transits the network.
In an RPC transaction, an RPC client issues a function call to the complementary RPC server without much regard for the state of the intervening network. This greatly simplifies the way applications are coded. On the other hand, the application can get impatient if it waits too long for a response. This can cause a loss of connection if the client gives up.
Here's the bottom line: RPCs make for a great data communication tool but they are finicky over wide area connections. For this reason, Active Directory uses two forms of RPC: a high-speed form for use in a local network and a low-speed form for use across a WAN. The low-speed form has higher latency (longer timeouts) and will suffer through multiple connection losses before giving up.
Active Directory can also use Simple Mail Transport Protocol (SMTP) for transferring replication packets. SMTP is a robust protocol, well suited for use across uncertain network connections. SMTP also permits asynchronous communication, making it possible to transfer replication packets in bulk.
Unfortunately, SMTP has a couple of serious drawbacks when it comes to Active Directory replication. The first is structural. SMTP transfers messages in clear text. For this reason, the system automatically encrypts SMTP messages using a proprietary form of secure messaging. This form of encryption uses certificates, so you must have a Certification Authority. Encryption puts a significant load on a server, so ensure that the bridgeheads are especially fast with multiple processors to share the workload.
The second drawback of using SMTP is a limitation of the File Replication Service (FRS). Recall that FRS is used to sync the contents of Sysvol between domain controllers. FRS can only use RPCs to carry replication traffic. In addition, FRS uses the same replication topology (including the same connection options) as those specified for Active Directory replication, so you cannot specify one transport for Active Directory replication and another for FRS.
Because of this limitation, SMTP cannot be used to replicate the contents of a Domain naming context because the contents of Sysvol cannot be kept in sync. SMTP can be used for all other naming contexts, including the Configuration, Schema, and Application naming contexts and the partial naming contexts that make up the Global Catalog.
Domain controllers know each other's location and the connections between them. The LDAP term for this topology information is knowledge. The service responsible for tailoring the replication topology is the Knowledge Consistency Checker, or KCC.
The KCC treats the domain controller topology like a game of K*Nex. Every 15 minutes, it surveys the domain controllers in the domain and decides where to place Connection objects so that each domain controller gets its updates in a reasonable amount of time. The KCC on a bridgehead server includes the bridgehead servers in other sites in its calculations.
The KCC makes its decisions based on a spanning tree algorithm. One of the improvements made in Windows Server 2003 is a streamlining of this algorithm that enables the KCC to handle more sites and larger topologies. In Windows 2000, there was a limit of approximately 100 sites and domain controllers before an administrator would be forced to intervene and create manual connections. Windows Server 2003 supports a much larger number of sites and domain controllers; Microsoft has not specified a limit.
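As a conceptual sketch of what a spanning tree algorithm does with site links, consider the toy example below. The site names and costs are invented for illustration; the real KCC works from Site Link objects and their administrator-assigned costs.

```python
def spanning_tree(sites, links):
    """Build a least-cost spanning tree over site links (Kruskal's
    algorithm). links: iterable of (cost, site_a, site_b) tuples.
    Returns the links chosen for the tree."""
    parent = {s: s for s in sites}

    def find(s):
        # Walk up to the root of this site's fragment, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    chosen = []
    for cost, a, b in sorted(links):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # joins two disconnected fragments
            parent[root_a] = root_b
            chosen.append((cost, a, b))
    return chosen

sites = ["HQ", "Branch1", "Branch2"]
links = [(100, "HQ", "Branch1"), (100, "HQ", "Branch2"),
         (400, "Branch1", "Branch2")]
print(spanning_tree(sites, links))
# The expensive Branch1-Branch2 link is left out of the tree.
```

The point of the algorithm is that every site stays reachable while redundant, expensive links are pruned from the replication paths.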
When it comes to selecting replication partners, the Exchange directory service behaves like a sailor on a 24-hour pass. It creates point-to-point replication connections between every pair of domain controllers in a site. The Active Directory KCC is much more discriminating. It selects a limited number of partners to structure a tightly controlled topology. For intra-site replication, the KCC builds a replica ring. See Figure for an example.
When constructing a replica ring, the KCC follows a 3-hop rule: no domain controller is more than 3 hops from any other domain controller. Recall that a domain controller can wait up to 15 seconds to notify its replication partners following a change to one of its naming contexts. By limiting the hop count, the KCC ensures that changes converge quickly.
Replica Ring Formation
When a new domain controller is promoted, the KCC on that domain controller gets a copy of Active Directory in much the same way that aliens invading Earth get the name and location of the White House. They land furtively and use a slimy tendril to suck the brains out of an innocent human being who wasn't doing them any harm at all. (Excuse the emotion. I was born and raised in Roswell, where we're a little sensitive about this sort of treatment.)
During a domain controller promotion, the Active Directory Promotion Wizard creates a connection to an existing domain controller then uses that connection to pull a full copy of Active Directory. The next time the KCC on the existing domain controller runs (sometime in the next 15 minutes), it sees the new connection and builds a complementary connection to the new domain controller. They are now full-fledged replication partners.
The KCCs on the other domain controllers take note of these changes and proceed to break and make their own connections to insert the new domain controller into the replica ring. This happens without any administrative intervention.
If the ring grows to more than six domain controllers, such as the one shown in Figure, the KCC running on each domain controller realizes that there are more than three hops in the ring. It sets to work building optimizing connections between domain controllers to reduce the hop count. Remember that the domain controllers share common knowledge about connections, so they eventually work out a mutually agreeable topology.
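One way to see why large rings need shortcuts is to count worst-case hops. This is a simplified model of my own: it assumes updates can travel either direction around the ring.

```python
def ring_max_hops(n_dcs: int) -> int:
    """Worst-case hop count between two DCs in a bidirectional
    replica ring: the farthest DC is halfway around the ring."""
    return n_dcs // 2

# As the ring grows, the worst-case distance grows with it; once it
# exceeds three hops, the KCC adds shortcut connections across the ring.
print([(n, ring_max_hops(n)) for n in (4, 8, 12)])  # [(4, 2), (8, 4), (12, 6)]
```

With the 15-second notification interval, each extra hop adds up to another 15-second delay before a change reaches the far side of the ring, which is what the 3-hop rule keeps in check.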
Replica Ring Repair
If a domain controller does not respond to a replication request, the DRA wakes up the KCC. The KCC takes over and builds new connections to bypass the failed domain controller, like a heart muscle healing itself after a heart attack.
The DRA keeps trying to contact the lost domain controller. When the domain controller comes back online again, the KCC sets to work restructuring the connections to reintroduce it back into the ring.
Under normal circumstances, all this repair work happens automatically. The only time an administrator should need to do any manual configuration is in the event that the KCC is unable to find a suitable replication partner due to a Domain Name System (DNS) failure. This generally occurs when a failed domain controller is also the DNS server for a site. If you always specify multiple DNS servers in your TCP/IP configuration, you should avoid this problem.
The replication picture changes considerably when the domain controllers are in different sites. Let's consider for a moment what would happen if there were no such thing as inter-site replication. Figure shows what this would look like.
In this configuration, the Directory Replication Agents running on the domain controllers have no way of knowing that the intervening network connections are slow and prone to oversubscription and potential failure. They blithely replicate as fast and as often as they would over a local network connection.
That's when trouble begins. The high-speed RPC connections begin to fail when the WAN links become oversubscribed and latency increases. The symptoms of RPC failures include persistent differences between replicas, DRA and KCC errors in the Event log, and eventually fatal RPC end-point errors when the connections fail repeatedly.
Active Directory avoids this carnage by building connections between sites that use special, low-speed RPCs. For this reason, inter-site replication uses an entirely different topology. See Figure for an example.
Inter-Site Replication Compared to Intra-Site Replication
Several features differentiate inter-site replication topology from its intra-site cousin:
Bridgehead Server Selection
Inter-Site Topology Generator
The bridgehead selection is something of a secret in that domain controllers in other sites don't know the results until they are told, something like waiting for the College of Cardinals to select a pope.
Rather than watching for the color of the smoke from the chimney of the Sistine Chapel, the sites wait for a Connection object between the bridgeheads to appear in the Configuration naming context. This Connection object is created by a domain controller designated as the Inter-Site Topology Generator, or ISTG.
There is only one ISTG in a site. It is selected using the same criteria as the bridgehead server—that is, the domain controller with the highest GUID. For this reason, the ISTG is often a bridgehead server, but it doesn't have to be. For instance, the ISTG might not be on the list of preferred bridgehead servers.
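The highest-GUID rule is attractive because it needs no negotiation: every domain controller can evaluate it independently and arrive at the same answer. Here is a minimal sketch; the GUID values are hypothetical.

```python
import uuid

def pick_by_highest_guid(dc_guids):
    """Every DC independently picks the highest GUID, so all DCs
    agree on the winner without exchanging any messages."""
    return max(dc_guids)

# Hypothetical DC GUIDs for illustration.
dcs = [uuid.UUID(int=3), uuid.UUID(int=7), uuid.UUID(int=5)]
print(pick_by_highest_guid(dcs))  # 00000000-0000-0000-0000-000000000007
```

Because GUIDs are unique, there are no ties, and because the inputs replicate to every DC, every KCC computes the same result. This is the same deterministic-election trick used for bridgehead selection.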
The ISTG runs as a separate function from the KCC because one site can have more than one bridgehead if there are multiple domains. Each of these bridgeheads has a copy of the Schema and Configuration naming contexts and may have a copy of the Global Catalog partial naming contexts, as well. Inter-site replication would turn into anarchy if all those bridgeheads made independent decisions about where to create Connection objects.
Inter-Site Topology Highlights
If you're experiencing a little anarchy of your own right now in trying to construct a mental picture of all this, here are some highlights (refer to Figure):
Failure of a Bridgehead or ISTG
If a bridgehead fails, its partners in other sites will be unable to complete replication transactions. The Directory Replication Agents on the bridgehead's local replication partners will notice that the bridgehead has stopped responding. They snitch to the KCC, which sets to work selecting a replacement. The KCC waits a period of time (two hours by default) before transferring responsibility to the new bridgehead.
If an administrator has selected a set of preferred bridgehead servers and none of these servers is available, the KCC will not select a replacement bridgehead and inter-site replication will fail. For this reason, it is very important that you select multiple preferred bridgehead servers for each Domain naming context.
If a failed bridgehead comes back on line, it does not reassume its old responsibilities. It gets in line as a candidate for replacing the new bridgehead should the new bridgehead ever fail.
Detecting an ISTG failure is a little trickier. The ISTG is like an emeritus professor; it only shows up at ceremonial occasions and funerals. To make sure everyone knows it's still alive, the ISTG periodically updates an attribute called interSiteTopologyGenerator in its NTDS Settings object. By default, it does this update every 30 minutes. The update replicates to the rest of the domain controllers so they know the ISTG is still online. If an hour passes without this attribute being updated, the KCCs on the other domain controllers select a new ISTG using the highest-GUID rule.
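The timing rules above amount to a simple liveness check, which can be modeled as follows. The function is a sketch of my own using the 30-minute and one-hour figures from the text.

```python
def istg_presumed_alive(minutes_since_update: float,
                        timeout_min: int = 60) -> bool:
    """The ISTG refreshes its NTDS Settings attribute every 30 minutes;
    if a full hour passes with no update, the other KCCs elect a new
    ISTG using the highest-GUID rule."""
    return minutes_since_update < timeout_min

print(istg_presumed_alive(25))  # True: updated within the last hour
print(istg_presumed_alive(75))  # False: stale, trigger a new election
```

Note the slack built into the design: the one-hour timeout is twice the 30-minute refresh interval, so a single missed or delayed update does not trigger a spurious election.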
Site Objects in Active Directory
Active Directory stores the objects that control replication under the Sites container in the Configuration naming context. Because every domain controller hosts a copy of the Configuration naming context, every domain controller has the same information about site names, locations, and connections. This is how the KCC services on separate domain controllers all come to the same conclusion about replication topology. They all work from the same crib sheet.
Figure shows how Active Directory objects represent the various components of a replication topology. Here is a list of the objects and their functions:
The "Configuring Inter-site Replication" section later in this chapter describes how these objects are used when configuring sites and inter-site replication.
Replication Topology Summary
Here are the important points to remember when you begin detailing your Active Directory site architecture: