Knowing how AD replication works in Windows 2000 can help you tune it for optimal system performance.

Active Directory Answers: Active Directory Updating

Consider this scenario: You’re running Windows 2000. You sit down at a domain controller within an Active Directory site and create a new user account. In another building, the new user logs on to a computer and is immediately authenticated by a different domain controller within the AD site. How did it happen? The answer is AD replication.

In order for all domain controllers in an Active Directory site to continually have the same database information, there must be replication among them. Each domain controller in the AD network maintains its own AD database. Without replication, each domain controller’s database copy would quickly become a hopeless collection of inaccurate data.

AD has two types of replication: intra-site and inter-site. Intra-site replication occurs within a site, while inter-site replication occurs between sites. The two types are different animals, and this article explores intra-site replication: how it’s created and how it works.

The Concepts To Understand
Before getting into AD replication, let’s make sure you’re up to speed with basic AD concepts. Active Directory is Microsoft’s answer to distributed networking. AD provides cohesiveness to a distributed network by storing information about network resources and making those resources easy for users to find. All resources stored in AD are called objects. User accounts, group accounts, computer accounts, shared folders, printers, and all other resources are AD objects. For each object, there’s a set of AD attributes. An attribute helps define an object.

For example, a user account may have attributes like username, password, email address, telephone number, and so on. Resources are organized on a domain basis, often using the Organizational Units (OUs) that are new to Win2K. Domains and OUs give you a logical view of the network. Sites, on the other hand, are used by AD to manage replication and user traffic over WAN links, which are often more expensive and less reliable (see Figure 1).

Figure 1. Domains are used to group resources logically, often using Organizational Units. By contrast, sites are physical groupings. Domains within a site typically share fast, inexpensive network connections.

By definition, a domain is a logical grouping of resources, which serves as a security and administrative boundary. A site, on the other hand, is a physical grouping. Sites can contain multiple domains and are built on inexpensive and fast network connections. A Win2K site can be built on one or more IP subnets. The network connections can be either inexpensive LAN technologies or a high-speed backbone.

AD uses site information to configure AD replication, so the importance of planning your sites can’t be overstated. As you’ll see in the remainder of this article, AD builds its own intra-site topology and that topology assumes your sites have fast, inexpensive bandwidth. When you plan AD sites, closely examine the sites’ available bandwidth. A collection of subnets without fast, inexpensive bandwidth shouldn’t be configured as one site. Within a site, AD assumes you have adequate bandwidth, which it will use freely—it assumes there’s plenty of it, and that it isn’t costly.

The Basics of Intra-site Replication
The process of updating AD information in Win2K is actually quite interesting. As I mentioned, when a change is made to the AD database on one domain controller, it must be made to other domain controllers in order to keep the information current. This is what we mean by intra-site replication. As you know, there are no longer PDCs and BDCs in Win2K networks. All domain controllers function as peers, and AD replication works in the same way.

There’s no single, master replicator. Multi-master replication is used, so all Win2K domain controllers are responsible for the replication of AD database information using IP remote procedure calls (RPCs). In terms of replication, the domain controllers function as peers, and each domain controller has a write-able copy of the AD database. This design alone provides replication fault tolerance. Because there isn’t a single master replicator, the failure of one domain controller within the AD environment doesn’t affect replication with other domain controllers. When a user or administrator makes a change to an AD object, the change is made on one of the domain controllers in the AD environment. After the change is made, all other domain controllers have outdated information, so that change must be replicated to all domain controllers. The domain controllers automatically handle this job, which is transparent to users and administrators.

The Challenge of Latency
There are two important points you should remember about intra-site replication. Replication within a site is typically frequent in order to reduce latency—the time delay that occurs when data between domain controllers isn’t accurate. For example, let’s say you create a new user account in a particular site. You create the account on a single domain controller, and now that account data must be replicated to all other domain controllers in the site. If the user tries to log on before the data is replicated across the site, the logon may fail because a domain controller that hasn’t received the replication data would refuse the user access to the network—even though the user actually has a valid account. This latency period is the time during which data is inaccurate across the site.

AD replication must work quickly to keep latency as low as possible so that database information across the site is accurate. Because AD assumes that connections within a site are fast and inexpensive, intra-site replication occurs frequently, automatically, without any compression, and without a schedule. AD, in other words, favors up-to-date information over bandwidth savings, since it assumes there’s plenty of bandwidth to use.

Inter-site replication is different. Since data in those cases must travel from site to site, frequently over expensive or unreliable WAN connections, scheduling replication and managing latency become much larger issues. Replication is always a tradeoff between latency and the expense of connections required for inter-site communication. For inter-site replication, you can use the Sites and Services tool to configure the frequency of replication, depending on your available bandwidth, and adjust the replication schedule to find a balance between bandwidth and latency for your network.

What’s the Effect?
You might logically wonder what happens to intra-site network bandwidth if replication occurs frequently and without a schedule. The full answer remains to be seen as Win2K is rolled out in large, distributed networks. Theoretically, however, replication traffic shouldn’t cause a bandwidth problem because replication occurs at the attribute level. For example, say you change a user account phone number. When that change is replicated, only the phone number attribute is replicated — not all the data for the entire object. With this approach, replication traffic should be minimal, although I’m not willing to bet my career on that just yet.
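To make the attribute-level idea concrete, here is a minimal sketch in Python. It is not how AD is implemented; the function name `changed_attributes` and the sample account data are made up for illustration. It only shows that when one attribute changes, the data that needs to travel is that one attribute.

```python
def changed_attributes(before: dict, after: dict) -> dict:
    """Return only the attributes whose values differ between two object states."""
    return {name: value for name, value in after.items() if before.get(name) != value}


# Hypothetical user object before and after an administrator edits the phone number.
user_before = {
    "username": "jsmith",
    "telephone": "555-0100",
    "email": "jsmith@example.com",
}
user_after = dict(user_before, telephone="555-0199")

# Only the telephone attribute ends up in the replication payload.
payload = changed_attributes(user_before, user_after)
print(payload)
```

The payload holds one attribute out of three, which is why the article expects replication traffic within a site to stay small.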

Replication Topology
Now that you understand multi-master replication, you might be wondering how to set it up for your site. Actually, you don’t need to — AD automatically creates its own replication topology within a site. This is done with the Knowledge Consistency Checker (KCC) service in AD. The KCC creates a topology, or a series of pathways, between domain controllers within the site, using replication partners. When AD is installed on the first domain controller, it creates a default site first. As domain controllers are installed and added to the site, the KCC determines how to include them in the replication map. Domain controllers receive replication data either directly from replication partners or transitively through indirect replication partners. Regardless of the relationship, AD always tries to create two pathways to every domain controller. That way, if one domain controller fails, the "loop" isn’t broken, and an alternative route can be used (see Figure 2).

Figure 2. The Knowledge Consistency Checker (KCC) service in Active Directory creates at least two pathways to every domain controller. If one DC fails, an alternative route can be used.

As your environment changes, for example, with the addition or removal of domain controllers, the KCC adjusts its topology to accommodate the change. The KCC can make adjustments dynamically as needed to ensure that replication can reach each domain controller in the site.
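The two-pathway idea can be sketched with a simple ring. This is a simplification under stated assumptions: the real KCC is more involved, and `build_ring` and `reachable` are hypothetical helper names, not AD functions. The sketch links each domain controller to two neighbors and checks that the remaining ones can still reach each other when one fails.

```python
def build_ring(dcs):
    """Link each DC to its two neighbours in a ring (a simplification of the KCC)."""
    n = len(dcs)
    return {dc: {dcs[(i - 1) % n], dcs[(i + 1) % n]} for i, dc in enumerate(dcs)}


def reachable(topology, start, failed=frozenset()):
    """Return the DCs reachable from `start` when the DCs in `failed` are down."""
    seen, stack = {start}, [start]
    while stack:
        for partner in topology[stack.pop()]:
            if partner not in seen and partner not in failed:
                seen.add(partner)
                stack.append(partner)
    return seen


ring = build_ring(["DC1", "DC2", "DC3", "DC4"])

# DC3 fails; DC1 can still reach DC2 one way around the ring and DC4 the other way.
survivors = reachable(ring, "DC1", failed={"DC3"})
print(sorted(survivors))
```

Because every domain controller has two partners, losing one of them leaves the other pathway open.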

Optimal Performance
As an AD administrator, what do you need to configure? Actually, nothing. Since the KCC automatically generates and makes changes to the topology, AD takes care of intra-site topology for you. You can force replication to occur, although it really isn’t necessary. You can also tell AD to check its replication topology using the Sites and Services tool shown in Figure 3. Aside from such tasks, AD takes care of the replication topology by itself.

The key to optimal performance is to plan your network infrastructure carefully before deploying AD. A major portion of that planning process should be the examination of available bandwidth at each site. Remember that AD uses site information that you configure in the Sites and Services tool to determine how replication should occur. Within a site, AD assumes that fast and inexpensive bandwidth is available. If that assumption is incorrect, you need to back up and look at your site configuration.

Figure 3. You can check replication topology with the Sites and Services tool.

How AD Replication Works
So how do the domain controllers replicate information and keep up with each other’s database changes? All AD domain controllers in a domain are aware of each other’s presence due to the replication topology generated and managed by the KCC. Since they’re aware of each other, they simply have to make certain that replication data gets to each domain controller. This process begins with an "originating update."

For example, let’s say you change a user account’s password on a particular domain controller. All other domain controllers now have outdated information regarding the password, so the change must be replicated. The domain controller on which you made the password change issues an originating update to the other domain controllers. Depending on which change you make, a certain kind of originating update is issued:

  • Add — When you add a new object to AD, such as a user, group, printer, and so on, an Add originating update is issued.
  • Modify — When you modify an object, for example, when you change a user’s password, the Modify originating update is issued.
  • ModifyDN — When you change the name of an object or an object’s parent, or when you move an object into a new parent’s domain, the ModifyDN originating update is issued.
  • Delete — When you delete an object, the Delete originating update is issued.

Thus, when you make a change to the database, the domain controller on which the change was made issues a particular type of originating update to the other domain controllers. This originating update lets the other domain controllers know that changed data needs to be replicated. The originating update becomes a replicated update on those domain controllers once the replication process is completed. (See Figure 4.)

Figure 4. When you make a change to a database, the domain controller on which the change was made issues an originating update so that replication can occur.
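The four update types listed above can be modeled as a small lookup. This is only an illustrative sketch: the `classify` function and its rule of comparing an object’s old and new distinguished names are assumptions made for the example, not an AD API.

```python
from enum import Enum
from typing import Optional


class OriginatingUpdate(Enum):
    ADD = "Add"
    MODIFY = "Modify"
    MODIFY_DN = "ModifyDN"
    DELETE = "Delete"


def classify(old_dn: Optional[str], new_dn: Optional[str]) -> OriginatingUpdate:
    """Pick an update type from an object's distinguished name before and after a change."""
    if old_dn is None:           # object did not exist before
        return OriginatingUpdate.ADD
    if new_dn is None:           # object no longer exists
        return OriginatingUpdate.DELETE
    if old_dn != new_dn:         # renamed or moved to a new parent
        return OriginatingUpdate.MODIFY_DN
    return OriginatingUpdate.MODIFY  # same name, so an attribute changed


# Hypothetical distinguished names.
sales = "CN=J Smith,OU=Sales,DC=example,DC=com"
marketing = "CN=J Smith,OU=Marketing,DC=example,DC=com"

print(classify(None, sales).value)
print(classify(sales, sales).value)
print(classify(sales, marketing).value)
print(classify(sales, None).value)
```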

How do domain controllers know if the replicated changes are new? This is determined through the use of Update Sequence Numbers (USNs). Each domain controller has a USN table that contains a USN for each attribute. When an attribute is changed on a domain controller, that attribute’s USN is updated. Now, all other domain controllers have an outdated USN. When replication occurs, the change to the object is replicated with the new USN. All other domain controllers make this change to their databases and update their USN tables, so they’re accurate. USNs work well because they do away with the need for specific timestamps, although timestamps are still maintained in order to break replication ties. For example, let’s say an administrator changes a user’s password on one domain controller and a different administrator changes the same user’s password on a different one. AD will use the timestamp to break the tie between the two, with the latest timestamp "winning."
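Here is a rough sketch of how USN bookkeeping lets a domain controller ask a partner only for what is new. It is a toy model under assumptions: the `DomainController` class, its `last_seen_usn` table, and the pull-style `pull_from` method are invented for the example and do not mirror AD’s internal structures.

```python
class DomainController:
    def __init__(self, name):
        self.name = name
        self.usn = 0              # local update sequence number; it only increases
        self.attributes = {}      # (object, attribute) -> (value, local USN when stored)
        self.last_seen_usn = {}   # partner name -> highest partner USN already pulled

    def write(self, obj, attr, value):
        """Store a value and stamp it with the next local USN."""
        self.usn += 1
        self.attributes[(obj, attr)] = (value, self.usn)

    def changes_since(self, usn):
        """Return every attribute stored with a USN higher than `usn`."""
        return {key: item for key, item in self.attributes.items() if item[1] > usn}

    def pull_from(self, partner):
        """Fetch only the partner's changes that are newer than the last pull."""
        seen = self.last_seen_usn.get(partner.name, 0)
        changes = partner.changes_since(seen)
        for (obj, attr), (value, _) in changes.items():
            self.write(obj, attr, value)
        self.last_seen_usn[partner.name] = partner.usn
        return len(changes)


dc1 = DomainController("DC1")
dc2 = DomainController("DC2")

dc1.write("jsmith", "telephone", "555-0199")
first_pull = dc2.pull_from(dc1)    # one new change arrives
second_pull = dc2.pull_from(dc1)   # nothing new, so nothing is transferred
print(first_pull, second_pull)
```

No clock is consulted anywhere in the sketch; comparing sequence numbers is enough to tell old data from new.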

How AD Solves Replication Problems
AD replication is more precise because it relies primarily on USNs rather than timestamps, so you don’t have to worry about precise time synchronization between domain controllers, a frustrating and frequently difficult configuration problem. However, as with any process, the potential for problems exists. AD contains built-in mechanisms designed to solve certain kinds of replication problems when they occur. Let’s consider the two major ones: unnecessary replication and replication collisions.

As mentioned earlier, AD automatically creates a replication topology loop. This loop ensures that replication reaches all domain controllers in the site in a timely manner and that replication can continue if a domain controller fails. However, a potential problem with the loop could occur with unnecessary replication traffic, such as when a domain controller receives replication updates more than once. To make sure this doesn’t happen, AD uses a process called propagation dampening, which allows a domain controller to detect when replication has already reached another domain controller in the loop. When the domain controller detects this, the change won’t be replicated to that other domain controller (see Figure 5).

Figure 5. "Propagation dampening" allows a domain controller to detect when replication has already reached another DC in the loop, so that the change is not repeated.

Propagation dampening works by using two vectors: an Up-to-Date Vector and a High Watermark Vector. Vectors are pairs of data that contain a globally unique identifier (GUID) and the USN. The Up-to-Date Vector is made up of server-USN pairs; each server holds one pair per domain controller, recording the highest originating update it has received from that domain controller. In a like manner, the High Watermark Vector contains the highest attribute USN for any given object. By using both of these vectors, domain controllers can detect when replication has already reached another domain controller and then kill the replication. Without propagation dampening, replication could continue to flow around the loop over and over.
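The dampening idea can be sketched with the Up-to-Date Vector alone. This is a simplified model under assumptions: the `DC` and `Update` classes are invented for the example, DC names stand in for GUIDs, and the High Watermark Vector is left out. Each domain controller remembers the highest originating USN it has applied from every originator, and a sender skips anything the receiver’s vector shows it already has.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    origin: str       # stands in for the originating DC's GUID
    origin_usn: int   # USN assigned by the originating DC
    attr: str
    value: str


@dataclass
class DC:
    name: str
    up_to_date: dict = field(default_factory=dict)  # origin -> highest originating USN applied
    log: list = field(default_factory=list)         # every update this DC holds

    def originate(self, attr, value):
        """Make a local change, which becomes an originating update."""
        usn = self.up_to_date.get(self.name, 0) + 1
        self.apply(Update(self.name, usn, attr, value))

    def apply(self, update):
        self.log.append(update)
        current = self.up_to_date.get(update.origin, 0)
        self.up_to_date[update.origin] = max(current, update.origin_usn)

    def send_to(self, partner):
        """Send only the updates the partner's vector shows it has not received."""
        needed = [u for u in self.log
                  if u.origin_usn > partner.up_to_date.get(u.origin, 0)]
        for update in needed:
            partner.apply(update)
        return len(needed)


a, b, c = DC("A"), DC("B"), DC("C")
a.originate("telephone", "555-0199")

sent_a_to_b = a.send_to(b)   # B receives the change from A
sent_a_to_c = a.send_to(c)   # C receives the change from A
sent_b_to_c = b.send_to(c)   # C already has it, so B sends nothing
print(sent_a_to_b, sent_a_to_c, sent_b_to_c)
```

The last call is the dampening step: the change has already reached C by another path, so it is not sent around the loop again.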

Another potential problem AD automatically resolves is replication collisions. Replication collisions occur when two different administrators make a change to the same attribute on the same object at different domain controllers. For example, two administrators change a user’s password at the same time on two different domain controllers. When those changes are replicated, there’s a replication collision. AD tries to reduce the number of collisions by replicating data on an attribute level rather than on an object level. For example, for a particular user account, one administrator might change the password while another admin changes the user’s phone number. The same object is being changed, but those changes affect different attributes. This doesn’t result in a collision.

In the event of a collision, AD detects it and takes the necessary steps to resolve it. AD uses timestamps and version numbers to settle the collision. Although timestamps have been replaced by USNs for tracking changes, they’re still maintained to resolve collisions. AD examines an attribute’s timestamp to see which update has the highest timestamp and also examines the attribute’s version number. Because each originating update of an attribute increases the version number, AD examines both. The replicated change with the highest numbers wins, and that attribute change is replicated. It’s extremely unlikely that the timestamp and version numbers would be the same, but in such a case, AD can also use the directory system agent’s (DSA) GUID to determine the final winner in the collision. As with other replication features, this process is invisible.
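The tie-breaking described above can be sketched as a single comparison. The sketch assumes the values are compared in order, version number first, then timestamp, then DSA GUID; the article names the three values but not their order, so treat that ordering as an assumption. The `AttributeWrite` type, the integer timestamps, and the GUID strings are made up for the example.

```python
from typing import NamedTuple


class AttributeWrite(NamedTuple):
    version: int     # incremented by each originating update of the attribute
    timestamp: int   # time of the originating update (illustrative integer)
    dsa_guid: str    # identifies the DC that made the change
    value: str


def resolve(a: AttributeWrite, b: AttributeWrite) -> AttributeWrite:
    """Return the winning write: highest version, then timestamp, then DSA GUID."""
    return max(a, b, key=lambda w: (w.version, w.timestamp, w.dsa_guid))


# Two administrators change the same password on different DCs at nearly the same time.
write_1 = AttributeWrite(version=3, timestamp=1000, dsa_guid="guid-aaaa", value="pw-from-dc1")
write_2 = AttributeWrite(version=3, timestamp=1005, dsa_guid="guid-bbbb", value="pw-from-dc2")

winner = resolve(write_1, write_2)
print(winner.value)
```

Every domain controller that runs the same comparison on the same two writes picks the same winner, which is what lets the collision be settled without any coordination.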

The Jury is Still Out
Although the final judgment on the effectiveness of AD’s intra-site replication process, both in terms of latency and network bandwidth, remains to be made, the process is effective from a design perspective. Because of AD’s ability to dynamically configure its own intra-site replication needs without human intervention, administrators can spend their time focusing on other network tasks — a design that benefits us all.
