Beneath the hype surrounding Active Directory is the fact that any directory service is based on database technology. Understand this concept, and you’re closer to understanding Windows 2000.

What Active Directory Brings to NT


Even the most ardent Windows NT fan will admit that the product has always been weak in the area of directory services. Microsoft has countered this perception by calling its current registry-based domain administrative function a directory service (DS). With Novell and its Novell (renamed from NetWare) Directory Service (NDS) nipping at Microsoft, especially lately, the DS issue has remained in the forefront.

Novell has had a directory service since NetWare 4.x shipped in 1993; but the company has gradually squandered its first-to-market opportunity. Novell’s DS has been a useful administrative tool in terms of organizing users, applications, and other resources. However, developers haven’t taken the next step and pushed the DS down into the network infrastructure. Thus, Novell hasn’t coaxed the developer community to rally around its new flag.

The fact that Cisco chose to work first with Microsoft’s “future” DS, under the Directory Enabled Networks (DEN) initiative, rather than with NDS, illustrates Novell’s problem. The company’s next chance to build momentum is with its new NetWare 5.0, which shipped this summer.

With the eventual release of Windows NT 5.0, however, the directory services race-to-market could well be over. Although directory services will become increasingly important, who was first to market will be a moot point.

What is a Directory Service?

Most enterprises already have many different directories in place, including those for network operating systems, e-mail systems, and groupware products. Directory services attempt to offer an organized approach to managing all of the bits of data maintained in these existing directories.

In one whitepaper on the topic, Microsoft likens directory services to the white pages of a phone book. You use input, such as a person’s name, to get output—that person’s address and phone number. Likewise, DS offers the flexibility of the yellow pages. If the user enters general input (“where are the printers?”), a browsable list of available printer resources is shown.

Whether you call it Active Directory, as Microsoft does, or NDS, as Novell does, a directory service is a distributed database, pure and simple. The supposedly high-level issues surrounding directory services boil down to what information is stored and how, how you can get to that information, and what you can do with it. The advantages are threefold: 1) a directory service offers users a single entry point to the network, with no more remembering multiple passwords; 2) users will be able to find and use network resources across the enterprise; and 3) this design approach simplifies enterprise network administration. (Obviously, these advantages will be enjoyed mainly in larger NT network environments; if you maintain a small network, the single domain will continue to be sufficient for your needs.)

Acronyms
ACL—Access Control List
BDC—Backup Domain Controller
CN—Common Name
DC—Domain Controller
DN—Distinguished Name
DNS—Domain Name System
GUID—Globally Unique Identifier
NDS—Novell Directory Service
PDC—Primary Domain Controller
PVN—Property Version Number
RDN—Relative Distinguished Name
RFC—Request for Comment
RPC—Remote Procedure Call
SMB—Server Message Block (protocol)
SRV RR—Service Resource Record
UDV—Up-to-date Vector
USN—Update Sequence Number

AD provides the ability to organize users, applications, and other resources in a hierarchical manner in the same fashion as NetWare. The hierarchy allows the administrator to scale the logical views of the network organization and application of permissions to enterprise-sized designs. It’s been sorely needed for some time now, and Active Directory will remove the negative DS comparisons with NetWare. While the high-level functionality of Active Directory (AD) is interesting and consequential, it’s important first to understand the fundamental architecture. You’ll then have a firm foundation when you’re ready to implement Active Directory with NT 5.0.

A Brief Description

AD extends the current NT 4.0 domain concept by allowing us to join resources together into a “domain tree.” For administrative purposes domains can be subdivided into Organizational Units (OUs). You can use OUs to get greater granularity for management of objects than domains. When you need to build a more complex organizational structure—or a very large number of objects (beyond a million) must be stored—you can join domains together to form a tree. A domain tree can contain many millions of objects.

AD objects are divided into two groups: leaf objects, and container objects. Whereas a container object can contain other objects, a leaf object can’t. Standard Container Objects consist of Namespaces (more on this shortly), Country, Locality, Organization, and the like. Standard Leaf Objects consist of User, Group, Alias, etc.

AD will also offer extensibility by allowing users to “publish” objects. For instance, a financial services vendor could “publish” a bank object specific to its business, thereby adding it to its AD.

Like Novell’s NDS, AD is based on the X.500 architecture. If you’ve worked with Exchange, you’ll recognize many common features since AD builds on Exchange’s directory.

A Design Note
Microsoft recommends as few domains as possible in the AD and is currently suggesting that each domain can hold a maximum of 1,000,000 objects. You can consider the use of OUs (Organizational Units) as a replacement for existing resource domains. (Microsoft recommends not exceeding 10 levels, however.) Since the domain is the physical partition of the AD database, you could structure domains either by business function (HR, manufacturing, or accounting) or by location. For enterprises that span many locations, you need to think of the WAN implications of having to replicate changes on every object in the domain to each DC. (Remember, a “change” could be as minor as your last network logon date and time.) Although Microsoft doesn’t recommend anything specific here, it’s worth noting that Novell tells NDS designers to structure by location in order to reduce directory traffic on the WAN.

Building on Exchange’s Database Services

AD is built on ESE97, the Extensible Storage Engine database format that Exchange 5.5 introduced. It promises three features that are key in any database: speed, reliability, and availability.

Microsoft’s internal development testing reports promising results in the area of speed, including lookups within seconds in a database of more than a million objects. Furthermore, the response rate for some common searches has been accelerated, thanks to the addition of a new, separate index file called the Global Catalog. This catalog is replicated to all DS servers in the enterprise. While the Global Catalog is created automatically with Microsoft-defined properties, it can be extended with administrator-defined property index fields so that it can also be used to locate custom directory objects.

The Global Catalog will prove very useful, but it can cause problems if you choose either too many index fields or inappropriate ones. If a query isn’t satisfied through the Global Catalog, the entire Active Directory partition must be searched to fulfill the request, which dampens performance. As a good example, let’s take a relatively fixed unique property like an employee number. Such a property usually makes a useful catalog entry if that’s how users will be likely to search the directory service.

Properties that aren’t fixed and unique, however, should be left out of the Global Catalog. Although the Global Catalog can include all objects in a forest, if a property is searched that’s not in the Global Catalog, only the user’s domain tree is searched.

Regardless, be prepared to consume significant disk space with a large DS and its associated Global Catalog.
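The lookup order described above (try the Global Catalog first, fall back to searching the whole partition) can be illustrated with a small Python sketch. This is only a toy model of the idea, not how Active Directory is implemented; the sample objects, the `employee_number` property, and the function names are all made up for illustration.

```python
# Hypothetical directory partition: a list of objects with arbitrary properties.
PARTITION = [
    {"cn": "Ann Lee", "employee_number": "1001", "office": "Irvine"},
    {"cn": "Bob Ray", "employee_number": "1002", "office": "New York"},
]

def build_catalog(partition, indexed_props):
    """Index only the chosen properties (a stand-in for the Global Catalog)."""
    catalog = {prop: {} for prop in indexed_props}
    for obj in partition:
        for prop in indexed_props:
            if prop in obj:
                catalog[prop].setdefault(obj[prop], []).append(obj)
    return catalog

def find(catalog, partition, prop, value):
    """Return (matches, how), where how is 'catalog' or 'full-scan'."""
    if prop in catalog:
        # Indexed property: one dictionary lookup answers the query.
        return catalog[prop].get(value, []), "catalog"
    # Property not indexed: every object in the partition must be examined.
    return [o for o in partition if o.get(prop) == value], "full-scan"

# Index a fixed, unique property, as the text recommends.
CATALOG = build_catalog(PARTITION, ["employee_number"])
```

A search on `employee_number` is answered from the index, while a search on `office` has to walk every object. The trade-off is that each extra indexed property makes the catalog larger, and the catalog is copied to every DS server.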

The second important feature of ESE97—reliability—is supplied through basic database technology. That includes transaction logs, which Microsoft suggests you store on separate disks from the actual database. With directory services in place, every entry to the database will be considered a separate checkpointed transaction. In the event of a hardware or other type of failure, the entry can be reproduced from the log. Placing the logs on different disk drives helps performance and provides a backup of the database.

The third key database feature is availability. ESE97 achieves high availability through its distributed nature, in which replication is controlled by intervals or specific times. With the Exchange kinship fully implemented in NT 5.0, replication among administrative sites can be accomplished using standard RPC or through a messaging transport such as SMTP or X.400 pipelines. This lets administrators create backup replication paths that can be controlled by assigning a higher “cost” to the backup connection. This cost assignment allows certain paths to kick in only when needed.
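The cost idea can be pictured as nothing more than "use the cheapest path that is currently up." The Python sketch below makes that assumption explicit; the path names, the cost values, and the `up` flag are hypothetical and do not come from any Microsoft tool.

```python
def pick_path(paths):
    """Return the available path with the lowest cost, or None if all are down."""
    available = [p for p in paths if p["up"]]
    return min(available, key=lambda p: p["cost"], default=None)

# Hypothetical replication connections between two sites.
PATHS = [
    {"name": "rpc-primary", "cost": 1, "up": True},
    {"name": "smtp-backup", "cost": 10, "up": True},
]
```

With both connections up, the RPC path wins because its cost is lower. The SMTP path is chosen only when the RPC path is marked down, which is the "kick in only when needed" behavior described above.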

Figure 1. The tree structure in Windows NT 5.0. Each domain in a domain tree has a copy of the directory service holding all objects for that domain, as well as metadata about the domain tree such as the schema, list of all domains in the tree, and location of global catalog servers.

The Physical Layout

Windows NT 4.0 and earlier versions use a single-master Security Accounts Manager (SAM) database, also referred to as a master-slave model. Regardless of what it’s called, a single-master database is one where only one copy, the master, can be written to. All updates are then sent to the backup database copies. The backup databases can be read to obtain information, but can be updated only by the master database. For example, if a user in New York changes his password, and the PDC, or the master database, is in Los Angeles, the update will happen in Los Angeles over the WAN connection. By contrast, the Active Directory database in NT 5.0 uses a multiple-master model. In that model, any copy of the Active Directory can be updated by applications, devices, and users.

Multiple-master database functionality raises several issues that the single-master model avoids, particularly involving database replication. Whereas NT 4.0 followed a primary domain controller (PDC) and backup domain controller (BDC) master-slave database model, all servers running Windows NT 5.0 Active Directory are simply considered domain controllers (DC).

With AD, all DC database replication within a site is automatic; it’s controlled by time intervals using Remote Procedure Calls (RPCs). Because of this automatic replication, you’ll want to make sure that your DS partitions follow your physical network. Essentially, sites follow IP subnetwork designs, the assumption being that each Active Directory site, as well as a subnet, will exist completely on a high-speed network. A site can span multiple subnetworks as long as you ensure that the interface connections are high-speed. However, keep in mind that a single subnet can support only one Active Directory site. Just as with Exchange, if you allow a site to span a slow link, you’ll experience problems with bandwidth consumption and interface congestion.
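The two subnet rules above (a site may span several subnets, but a subnet belongs to only one site) are easy to check on paper before you deploy. Here is a rough Python sketch using the standard `ipaddress` module; the site names and address ranges are invented for the example.

```python
import ipaddress

# Hypothetical site plan: a site may list several subnets.
SITE_SUBNETS = {
    "NewYork": ["10.1.0.0/16", "10.2.0.0/16"],
    "LosAngeles": ["10.3.0.0/16"],
}

def build_subnet_map(site_subnets):
    """Map each subnet to its site, rejecting a subnet claimed by two sites."""
    subnet_to_site = {}
    for site, subnets in site_subnets.items():
        for cidr in subnets:
            net = ipaddress.ip_network(cidr)
            if net in subnet_to_site:
                raise ValueError(f"{cidr} is already assigned to {subnet_to_site[net]}")
            subnet_to_site[net] = site
    return subnet_to_site

def site_for(address, subnet_to_site):
    """Return the site whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for net, site in subnet_to_site.items():
        if ip in net:
            return site
    return None

SUBNET_MAP = build_subnet_map(SITE_SUBNETS)
```

Calling `site_for("10.2.5.9", SUBNET_MAP)` places that client in the NewYork site. Note that the check only catches identical subnets claimed twice; overlapping ranges of different sizes would need an extra test.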

Consider the Namespace

Another aspect of Active Directory that you’ll need to consider before implementation is the namespace. A namespace is any administratively contained territory in which a logical name can be resolved. The primary use of a namespace is to organize the descriptions of resources in a manner that lets users locate those resources by their various characteristics or properties. The directory allows users to find the location of an object without knowing its name; if they know the name, it lets them find unknown but useful information about a known object. One overriding point in any directory is that the design of the namespace ultimately determines how useful the directory is to users as it grows. The best sorting and search algorithms in the world won’t help if you have a poor logical directory design.

Since every object on the network must be uniquely identified, what is the DS going to internally call each of your objects? In AD, this is accomplished by associating a Globally Unique Identifier (GUID) with each object. This 128-bit number is guaranteed to be unique and is never changed by the AD, even if the object’s logical name is changed.

The GUID is generated when a user or an application first creates the Distinguished Name (DN) in the directory. The DN is modeled on the most successful namespace yet implemented, the Domain Name System (DNS). For anyone who doesn’t know, DNS is the namespace that uniquely identifies all the registered networks and their respective objects on the Internet.

Building on this already established namespace, Active Directory uses DNS as the location service for finding the physical location of network objects. Therefore, to implement Active Directory you must understand and have the proper hooks into a DNS. The Active Directory holds all of the properties necessary to identify an object. The DNS will then use the DN to return the IP address of the device where the object actually resides.

The integration between Active Directory and DNS is achieved by each Active Directory server publishing its own address in Service Resource Records (SRV RR) in a DNS (this is described in RFC 2052). An SRV RR is a record that maps a service name to an address where that service can be physically found. SRV is similar to the MX record that’s currently used to find mail servers. Although Microsoft is distributing a new DNS with NT 5.0, most DNS servers are on Unix platforms and will remain so in many enterprise information systems. This means that the Windows NT administrator will have to work with the Unix DNS administrator to make sure that the SRV RR entries are made, are accurate, and support RFC 2052.
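To make the SRV RR less abstract when you sit down with the DNS administrator, here is a small Python sketch that reads records written in the usual zone-file field order (name, TTL, class, type, priority, weight, port, target) and picks the server a client would try first. The record names, hosts, and port are illustrative only, and the exact service label your servers register may differ.

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str

def parse_srv(line):
    """Parse one 'name ttl IN SRV priority weight port target' line."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError("not an IN SRV record")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)

def preferred(records):
    """The record with the lowest priority value is tried first."""
    return min(records, key=lambda r: r.priority)

# Made-up records for a hypothetical company.com zone.
RECORDS = [parse_srv(line) for line in (
    "ldap.tcp.company.com. 600 IN SRV 10 0 389 dc2.company.com.",
    "ldap.tcp.company.com. 600 IN SRV 0 0 389 dc1.company.com.",
)]
```

Here `preferred(RECORDS)` selects `dc1.company.com.` because its priority value is lower. The sketch ignores the weight field, which the RFC uses to spread load among records of equal priority.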

More about Database Replication
Multiple-master database replication also affects when to synchronize changes, which information is most current, and when to stop replicating data to avoid loops. To determine what information needs to be updated, Active Directory uses 64-bit Update Sequence Numbers (USN). These are created and associated with all properties. Every time an object property is modified, its USN is incremented and stored with the property. In addition, each Active Directory server maintains a table of the latest USNs from all replication partners within the site. This table is composed of the highest USN for each property. When the replication interval is reached, each server requests only the changes with a USN greater than what’s listed in its own table.
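The "request only what is newer than my table" step can be sketched in a few lines of Python. This is a deliberately simplified model, assuming one USN counter per server and a table that remembers the highest USN already pulled from each partner; the class and method names are hypothetical.

```python
class Server:
    """Toy replica that tracks USNs; not the real Active Directory logic."""

    def __init__(self, name):
        self.name = name
        self.usn = 0           # this server's own update sequence number
        self.props = {}        # property -> (value, USN assigned at write time)
        self.high_water = {}   # partner name -> highest partner USN already pulled

    def write(self, prop, value):
        # Every modification bumps the USN and stores it with the property.
        self.usn += 1
        self.props[prop] = (value, self.usn)

    def changes_since(self, usn):
        # Only properties stamped with a USN greater than the caller's mark.
        return {p: v for p, (v, u) in self.props.items() if u > usn}

    def pull_from(self, partner):
        seen = self.high_water.get(partner.name, 0)
        changes = partner.changes_since(seen)
        for prop, value in changes.items():
            self.write(prop, value)
        self.high_water[partner.name] = partner.usn
        return changes
```

If server A writes two properties and server B pulls, B receives both. A second pull returns nothing, and a later change on A sends only that one property across the wire. The sketch leaves out collisions and loops, which the next two paragraphs deal with.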

Occasionally, changes may be made to two different Active Directory servers for the same property before all changes are replicated. This causes a replication collision. One of the changes must be declared more accurate and used as the source for all of the other replication partners. To reconcile this potential problem, Active Directory uses a sitewide Property Version Number (PVN) value. A PVN is incremented when an originating write takes place. An originating write is one that occurs directly at a particular Active Directory server. That’s different from a write that’s updated during replication. When two or more property values with the same PVN have been changed in different locations, the Active Directory server receiving the change will check the timestamps on each change and use the most recent one for the update. This means that you should maintain a central clock for your network (which Windows NT provides) and keep it accurate. Be aware that, while timestamps certainly can be used to make a decision, all automatic decisions are, by nature, still arbitrary.
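The tie-break can be written down as a short Python function. The timestamp rule for equal PVNs comes from the description above; letting the higher PVN win when the numbers differ is an assumption made for this sketch, and the `Version` type is hypothetical.

```python
from typing import NamedTuple

class Version(NamedTuple):
    value: str
    pvn: int          # property version number, bumped on each originating write
    timestamp: float  # time of the originating write

def resolve(local, incoming):
    """Pick the surviving version of a property after a collision."""
    if incoming.pvn != local.pvn:
        # Assumption: a higher version number reflects more originating writes.
        return incoming if incoming.pvn > local.pvn else local
    # Same PVN on both sides: the most recent timestamp wins.
    return incoming if incoming.timestamp > local.timestamp else local
```

Because the final decision rests on timestamps, a server whose clock runs fast would win collisions it should lose. That is the practical reason for keeping one accurate time source on the network.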

Another replication issue is looping. Active Directory lets administrators configure multiple replication paths for redundancy purposes. To prevent changes from endlessly updating, Active Directory creates lists of USN pairs on each server. These lists, called up-to-date vectors (UDVs), maintain the highest USN of each originating write. Each UDV maintains a list of all the other servers within its own membership site. When replication occurs, the requesting server sends its own UDV to the sending server. The highest USN for each originating write is used to determine if the change still needs to be replicated. If the USN is the same or higher, then no change is needed because the requesting server is already shown to be up to date.
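The filtering a UDV makes possible can be sketched as follows. In this simplified Python model, a UDV is a dictionary from originating server name to the highest originating USN already applied, and each change carries its origin and USN; the field and function names are invented for illustration.

```python
def changes_to_send(sender_log, requester_udv):
    """Keep only changes the requester has not already seen from their origin."""
    return [
        c for c in sender_log
        if c["usn"] > requester_udv.get(c["origin"], 0)
    ]

def record_applied(requester_udv, changes):
    """Raise the requester's UDV entries to cover the changes just applied."""
    for c in changes:
        requester_udv[c["origin"]] = max(requester_udv.get(c["origin"], 0), c["usn"])

# Server B's log holds one change that originated on A and one of its own.
B_LOG = [
    {"origin": "A", "usn": 5, "prop": "phone", "value": "555-0100"},
    {"origin": "B", "usn": 2, "prop": "office", "value": "Irvine"},
]
```

Suppose server C already received A's change directly, so its UDV is `{"A": 5}`. When C pulls from B, only B's own change is sent; the copy of A's change that B relays is filtered out, which is what stops an update from circling through redundant paths.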

Michael Chacon

Time for an IP Network

DNS is an IP-based service, which means you can’t use protocols such as IPX or NetBEUI with Active Directory. If you use those protocols, you’ll have to continue using the standard NetBIOS services (such as the Browser service) for publishing, locating, and establishing sessions within a Windows NT network—not a good idea. If you don’t have an IP network today, now’s a good time to design one in preparation for NT 5.0 and AD. Another important point to make here is that if you use the Dynamic Host Configuration Protocol (DHCP) for IP address allocation, you’ll want to implement Dynamic DNS as a replacement for WINS. Dynamic DNS will allow changes in IP address-to-host mappings to be updated in the DNS. Thus, you’ll avoid the need to make the changes manually. In most DNS systems, the network resources have static IP addresses. This means that every time a device obtains a new address, someone must open a text editor and update the DNS records manually. Depending on the size of your network, this manual updating can range from a mild pain in the derrière to completely impossible.

Most sites will use static IP addresses for network resources only, such as servers. Client machines won’t participate in the DNS. This is another reason to ban peer-to-peer networking from your network. If you allow users to share resources and publish them in the AD, the DNS must have entries for those clients; otherwise, it won’t be able to let other clients locate and establish sessions with them.

Dynamic DNS servers will use the protocol described in RFC 2136 to allow IP addresses to be updated on a periodic basis in the DNS. However, this RFC has yet to be widely implemented. If you want this functionality immediately, you’ll need to use the Microsoft DNS. A more likely scenario is a compromise in which you use Microsoft Dynamic DNS for local network resolution and have it report up to the static Unix-based enterprise DNS using accepted zone transfers. This area of network administration will, for a while anyway, be more dynamic than any new version of DNS.

Integrating DNS and AD

To illustrate DNS and Active Directory integration, let’s look at a typical user request. The client looks for an object, such as a file or other resource, in the AD. For example, the user might want to find a server containing sales reports. The request contains the address of the nearest Active Directory server, which was gleaned from the logon sequence. Active Directory first looks in the Global Catalog for the property selected. If the property wasn’t indexed in the Global Catalog, Active Directory searches through the actual directory partition to locate the object. One of the properties of the object would be the server’s fully qualified domain name, such as server1.sales.ny.abccomany.com. Following standard DNS resolution (see this month’s “Windows Insider”), the client then requests the server’s IP address. Once a session is established with the server, higher-level network operating system protocols such as SMBs are used to communicate with the server through a share name (another property associated with the DS object). As you can see, without an up-to-date DNS, users can’t connect to the objects listed in the AD.

Logical Considerations

Now that we’ve looked at some of the physical aspects of the AD, let’s examine some of the logical considerations. At the logical level, as with DNS, the Active Directory is simply another namespace. There are two main types of information stored in a directory. One is the logical location of a desired object—usually something such as an application, file, or printer. The other type is a list of attributes about the object. These attributes provide search characteristics such as phone number, address, eye color, or shoe size. The attributes can be used simply to inform or to provide search criteria when looking for general items of interest. Using attributes for searching becomes even more important when the Active Directory schema is extended. When you add objects, classes of objects, and attributes for those objects, their structure will determine how useful they’ll be to the directory users.

As in most useful directories, Active Directory objects are structured like trees or containers. A container is an object unto itself that exists only in the Active Directory. While it has attributes that can be used for description and location, it doesn’t represent an actual object on the network. Rather, it’s the holding place for objects and other containers. A series of containers holding objects connected in a hierarchy for organizational purposes is called a tree. The branches of the tree are containers that point to the endpoints, or leaves, which are the objects the users need. The branch containers can also hold objects.

Each container and object in the tree is uniquely named. The namespace then becomes the complete path of all the containers and objects, or branches and leaves, in the tree. Where you place an object in the tree determines the distinguished name that I mentioned earlier. The distinguished name identifies the complete path down through the tree hierarchy, such as:

/O=Internet/DC=com/DC=Ascolta/DC=Irvine/CN=Users/CN=Managers/CN=Michael Chacon

Because the distinguished name is useful for organizing, but not so useful for remembering, Active Directory also borrows the idea of Relative Distinguished Names (RDN). An RDN is an attribute of the object that can still be uniquely identified within the tree, such as:

CN=Michael Chacon

In this example, the RDN could also be used as the logon ID for the network. You could also add other attributes, such as a logon alias, that are easier to remember.

As I mentioned earlier, the foundation for the namespace used for most networks will be based upon the current DNS namespace. This DNS relationship will determine the shape of the Active Directory tree and the relationship of the objects to each other. In my previous example, the Domain Component (DC) entries are the domains listed in the DNS, while the Common Name (CN) entries are the specific paths for the user objects in the directory. (Note that in a distinguished name, DC stands for Domain Component, not Domain Controller.)
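To see how the pieces of a distinguished name relate to the RDN and to DNS, here is a short Python sketch that takes apart the slash-separated example shown earlier. It is a convenience parser for this article's notation only, and it assumes no component value contains a slash or an equals sign in its key; the function names are hypothetical.

```python
def parse_dn(dn):
    """Split '/K=V/K=V/...' into a list of (key, value) pairs, root first."""
    parts = [p.strip() for p in dn.split("/") if p.strip()]
    return [tuple(p.split("=", 1)) for p in parts]

def rdn(dn):
    """The relative distinguished name is the last component of the path."""
    key, value = parse_dn(dn)[-1]
    return f"{key}={value}"

def dns_domain(dn):
    """Join the DC components, most specific first, into a DNS-style name."""
    labels = [value for key, value in parse_dn(dn) if key == "DC"]
    return ".".join(reversed(labels)).lower()

DN = "/O=Internet/DC=com/DC=Ascolta/DC=Irvine/CN=Users/CN=Managers/CN=Michael Chacon"
```

For this DN, `rdn(DN)` gives `CN=Michael Chacon`, and `dns_domain(DN)` gives `irvine.ascolta.com`, which shows why the DNS design ends up dictating the shape of the directory tree.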

Physical IP subnets are created to manage local ARP broadcast traffic and otherwise control and maintain network efficiency. The DNS servers create administrative domains and organize the physical IP addresses into a hierarchy users can more easily understand. Windows NT domains are created to build a security boundary around network resources. This is accomplished by controlling user authentication and the assignment of permissions and privileges. Creating a contiguous DNS namespace that dovetails with a contiguous Active Directory namespace will take a great deal of planning and compromise on the part of the network administrator.

Multiple domains with a contiguous namespace under Active Directory are referred to as domain trees. For example, contiguous means that the top or root domain could be called company.com, while the child domains underneath it could be called something like ca.company.com and ny.company.com. Unlike the SAM-based NT 4.0 directory model, each domain will have automatically generated trust relationships. In addition, the trust relationships will be transitive rather than point-to-point, as they were previously. This means that, if domain “A” trusts domain “B” and domain “B” trusts domain “C,” then domain “A” and “C” will also trust each other for authentication. This is accomplished using the Kerberos authentication protocol, which is the default in Windows NT 5.0.
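Transitive trust is really a reachability question on a graph of domains, which a few lines of Python can illustrate. The sketch assumes every trust link works in both directions, as the "trust each other" wording suggests; it says nothing about how Kerberos carries out the authentication, and the function names are hypothetical.

```python
from collections import deque

def two_way(pairs):
    """Build a trust graph in which each listed link works in both directions."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def trusted_domains(start, trusts):
    """All domains reachable from `start` by following trust links."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in trusts.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}

# The example tree from the text: one root and two child domains.
TRUSTS = two_way([
    ("company.com", "ca.company.com"),
    ("company.com", "ny.company.com"),
])
```

Under the NT 4.0 point-to-point model, ca.company.com would trust only its direct partner, `TRUSTS["ca.company.com"]`. Under the transitive model, `trusted_domains("ca.company.com", TRUSTS)` also includes ny.company.com, with no extra trust defined by hand.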

You can use an Active Directory without a contiguous namespace, but it’s not preferred. Usually, this need will arise either when two companies merge operations or simply because of poor planning. For example, a non-contiguous namespace could be two domains called company1.com and company2.com. When you’re dealing with a non-contiguous namespace and you want the various trees to share the same schema, configuration, and Global Catalog, you can create a “forest.” In a forest, Active Directory cross-references the root of each tree to build the hierarchies in the directory. The Kerberos trust relationships then use the directory for authentication.

Because these design choices are made during installation, you’ll need to carefully pre-plan how your new Active Directory server will relate to the rest of the tree. During the installation process, you’ll be given several choices: create a new tree in a new forest (which you’ll do with the very first Active Directory machine), create a new tree in an existing forest, create a replica of a current domain, or create a child domain. One nice thing about these choices is that you can uninstall Active Directory without having to reinstall the entire NT 5.0 server. To make a member Windows NT server a DC, all you need to do is add the Active Directory service. Conversely, to turn a DC back into a member server, all you need to do is remove the Active Directory service from it. You don’t have to reinstall the operating system to make this change, as was the case with domain controllers in Windows NT 4.0.

Figure 2. Trusts in Windows NT 4.0 are non-transitive; trusts go from point to point. Trusts in NT 5.0 are transitive; if domain “A” trusts domain “B,” which trusts domain “C,” then A will also trust C.

Changes with Groups

Another aspect of the logical planning process for Active Directory is the concept of groups. The functionality and terminology of groups change dramatically from Windows NT 4.0. This is a good thing. Global groups are still with us, but now they can contain other global groups. Yes, we finally have nested groups! Global groups are still used to collect users, making it easier to drop them into other groups anywhere else in the forest. However, a global group can contain only users and other global groups from its own domain. Domain-local groups can contain users and global groups from any domain in the Active Directory forest. However, they can only be applied to Access Control Lists, or ACLs, on objects within the same domain. As with NT 4.0, these domain-local groups will be used to tie permissions to local objects. New is the universal group, which can contain all other groups and users from any tree in the forest and can be used with any ACL within the forest.

We can have “Security Groups,” comparable to those in NT 4.0, or “Distribution Groups,” which will also be Exchange Distribution Lists when the next version of Exchange surfaces—but a group can only be one or the other, not both. The distribution list functionality is granted because you can have non-security objects in the Active Directory that are used only for identifying the recipients of electronic messages, even those outside the administrative authority of the tree.

The three types of groups—global, domain-local, and universal—can be combined to control access to network resources. The basic use of global groups will be for organizing users into administrative containers that represent their respective domains. Universal groups will be used to contain the global groups from the various domains to further manage the domain hierarchy when granting permissions. Microsoft advises adding global groups to universal groups, then assigning permissions to domain-local groups where the resource physically exists. This architecture lets administrators add and remove users from each domain’s global groups to control access to resources throughout the enterprise without having to make changes in multiple locations.
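The nesting pattern can be traced with a small Python sketch: users sit in global groups, global groups sit in a universal group, and the ACL names only a domain-local group. Placing the universal group inside the domain-local group is an assumption made here to complete the chain, and all group and user names are invented.

```python
# Hypothetical memberships: group name -> direct members (users or groups).
MEMBERS = {
    "NY-Sales (global)": {"alice"},
    "CA-Sales (global)": {"bob"},
    "All-Sales (universal)": {"NY-Sales (global)", "CA-Sales (global)"},
    "Reports-Readers (domain-local)": {"All-Sales (universal)"},
}

def expand(group, members, _seen=None):
    """Return every user reachable through nested group membership."""
    seen = set() if _seen is None else _seen
    users = set()
    for member in members.get(group, ()):
        if member in members:
            # Nested group: expand it once, guarding against cycles.
            if member not in seen:
                seen.add(member)
                users |= expand(member, members, seen)
        else:
            users.add(member)
    return users

def can_access(user, acl_groups, members):
    """True if the user belongs, directly or by nesting, to any ACL group."""
    return any(user in expand(g, members) for g in acl_groups)
```

The resource's ACL lists only `Reports-Readers (domain-local)`. Adding a new user to `CA-Sales (global)` grants access without touching the ACL or the universal group, which is the single-point-of-change benefit described above.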

These group relationships can also minimize Global Catalog replication traffic. Domain-local groups don’t appear in the Global Catalog; they are, by definition, local. Global groups appear in the catalog, but their members don’t, and group membership isn’t replicated outside the group’s own domain. Universal group members are published in the catalog, so universal groups should contain primarily global groups. Obviously, when planning your directory tree, you’ll need to consider the relationships among the various types of groups.

AD Leads NT 5.0

It’s clear that Active Directory will help move Windows NT 5.0 into a whole new ballgame. Despite the continuity of the low-level architecture and the security model from previous versions of NT, there’s a wide array of new services in NT 5.0 that will need to interoperate with the Active Directory. The best advice I can give is plan, learn DNS, plan some more, really learn DNS, and then plan again. AD, like the enterprise it serves, is almost organic, something that will always be changing—and so will the directory design over time. I’ll continue to cover NT 5.0 issues throughout the year in “NT Insider.”

Make sure you keep the distinctions of your logical design and physical design appropriately clear. Examine the physical layout of your network in terms of switches, hubs, and routers. Design a test installation with the current beta versions of NT 5.0 (as they are released) that will produce replication problems. This is a new product. There will be problems. The key is to understand them before you deploy instead of discovering them on your production network.

Your goal is to roll out the real thing without uncovering many surprises. You can, after all, reproduce a complex network with one router. If you plan and test with the fundamental principle in mind that a directory service is a database, you’ll go a long way toward building a solid Active Directory implementation.

Thanks to Greg Neilson for providing insights about Active Directory to the editorial crew.
