GPO Mouse Trap

Propagating the changes made to a Group Policy Object out to all the affected domain clients involves a complex array of interconnecting pieces.

I'm sure you've edited Group Policy settings in Active Directory. It's a simple maneuver: Open the Group Policy Editor, drill down to a setting, click-click-click, and the job is done. What ensues, though, is anything but simple. Propagating the changes made to a Group Policy Object out to all the affected domain clients involves a complex array of interconnecting pieces that reminds me of the children's game Mousetrap. In case you don't remember, the game involves building a contraption where a single, tiny event spurs a flurry of activity that eventually sends a plastic cage rattling down over a none-too-surprised plastic mouse.

Group Policy settings are stored in a set of text files in the Sysvol folder. A service called the File Replication Service (FRS) is responsible for replicating those files between domain controllers (DCs), and it's this FRS portion of the GPO contraption I want to discuss this month. I meet quite a few administrators in my travels, and just about all of them have at least one war story about an FRS failure. That makes it worth your time to know as many details of FRS operation as possible to prepare for potential problems.

In the March issue, Gary Olsen did a great job of covering the tools available to troubleshoot FRS problems. I want to spend some time looking at the underpinnings of the FRS itself.

Replica Sets
As Microsoft will tell you repeatedly in many Knowledge Base (KB) articles, FRS is a multi-threaded, multi-master replication engine that replaces the LMREPL (LanMan Replication) service in the 3.x/4.0 versions of Windows NT.

To do its job, FRS needs to know which files to replicate and where to replicate them. Files under the jurisdiction of FRS form a Replica Set. The Sysvol files (\Sysvol\Domain) belong to an FRS replica set called the Domain Replica Set. The Distributed File System (DFS) also makes use of FRS, and keeps separate replica sets.

When replicating the Domain Replica Set (Sysvol), FRS follows the same topology used by AD. Unlike AD replication, though, FRS notifies its intra-site replication partners immediately of a pending update. Between sites, FRS uses the replication intervals defined by Site Link objects. The default replication interval is three hours, but most organizations lower this to its minimum setting of 15 minutes.

Change Journal
Here's a quick test to demonstrate FRS operation. Create a small text file in the Sysvol folder under \Sysvol\Domain. Call the file "Test1000.txt." You'll notice that the file immediately appears in the \Sysvol\Domain folder at the other DCs in the same site. The changes won't appear at DCs in other sites until the bridgehead servers in those sites poll for updates.

But how does FRS know when a new file is created in Sysvol? The answer lies with a little-known NTFS feature called the Change Journal. Whenever a file is created, modified, or deleted on a volume, NTFS puts an entry in the Change Journal for that volume. (The Change Journal forms a part of the Master File Table metadata records and isn't visible from the user interface.)

FRS watches for entries in the Change Journal. When it sees an entry from a file in an FRS replica set, it makes an entry in the FRS database, located in %Windir%\Ntfrs\Ntfrs.jdb. It also places a compressed copy of the new or modified file into a special folder called the Staging Area, located in \Sysvol\Staging Area\Domain. Using a staging area minimizes the possibility that a file lock will cause a replication error. You can view the content of the staging area from the shell, but the files are flagged as Hidden.

Microsoft provides a utility called Ntfrsutl in the Support Tools that you can use to view the content of the FRS database. When a new file is created, FRS adds an entry to a table called Idtable. To view the Idtable entry for the "Test1000.txt" file just created, dump the Idtable to a file using "Ntfrsutl idtable > idtable.txt", then view the last 25 lines of the file using the Tail utility from the Resource Kit as follows:

C:\>tail -25 idtable.txt
Table Type: ID Table for DOMAIN SYSTEM VOLUME (SYSVOL SHARE) (1)
FileGuid : 9f26b755-26c4-44dc-8a5e-9b81f2af4a4a
FileID : 00070000 00002b0a
ParentGuid : f424f6d7-a2c8-4690-8eb8-d59d94a82ad1
ParentFileID : 00050000 00002f27
VersionNumber : 00000000
EventTime : Thu May 13, 2004 09:21:35
OriginatorGuid : 17d28743-8685-4f2c-8cb9-77f026eca4b9
OriginatorVSN : 01c438f8 59443f8f

CurrentFileUsn : 00000000 01541fa8
FileCreateTime : Thu May 13, 2004 09:21:34
FileWriteTime : Thu May 13, 2004 09:21:35
FileSize : 00000000 00000010
FileObjID : 9f26b755-26c4-44dc-8a5e-9b81f2af4a4a

FileName : test1000.txt
FileIsDir : 00000000
FileAttributes : 00000020 Flags [ARCHIVE ]
Flags : 00000000 Flags []

ReplEnabled : 00000001
TombStoneGC : Thu May 13, 2004 09:21:38
OutLogSeqNum : 00000000 00000000
Spare1Ull : 00000000 00000000

MD5CheckSum : MD5: a11ab99d 29a3a4c4 ba038d25 0e563fc6
RetryCount : 0
FirstTryTime :

The CurrentFileUSN value in the FRS database matches the Update Sequence Number (USN) assigned to the entry in the Change Journal. This creates a link between the two databases. Later, we'll see how FRS uses this link to keep itself up-to-date.

The MD5CheckSum entry is a hash of the file content. A hash algorithm changes the hashing output dramatically when even one byte of the input changes, so FRS can use the MD5 hash to compare file versions and avoid unnecessary replication.
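The comparison idea is easy to try for yourself. Here is a minimal Python sketch, assuming the checksum covers only the raw file bytes (FRS itself may fold more than the content into its MD5CheckSum value). The names md5_of_bytes and needs_replication are made up for illustration and are not part of any FRS tool.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def needs_replication(local_content: bytes, incoming_md5: str) -> bool:
    """Hypothetical check: replicate only when the digests differ."""
    return md5_of_bytes(local_content) != incoming_md5


# Two versions of a small file that differ by a single byte.
original = b"Test file 1000\r\n"
changed = b"Test file 1001\r\n"

# The two digests bear no visible resemblance to each other.
print(md5_of_bytes(original))
print(md5_of_bytes(changed))

# Identical content: the update can be skipped.
print(needs_replication(original, md5_of_bytes(original)))
# Changed content: the update has to be applied.
print(needs_replication(original, md5_of_bytes(changed)))
```

Comparing two short digests is much cheaper than shipping the file across a WAN link just to discover that nothing changed.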

The FRS database has a variety of tables, and sifting through them takes some time. To simplify this work, I recommend downloading a copy of a Microsoft tool called FRSDiag from http://snipurl.com/31ij. This little gem (shown in Figure 1) combines all the Ntfrsutl options with a dump of the Event Logs and FRS debug logs and analyzes them for failures or invalid content. You can aim the tool at any DC.

Figure 1. FRSDiag main window showing options and sample output.

Microsoft doesn't provide a tool for viewing the Change Journal, but a few years back, Win32 wizard Jeffrey Richter created a utility that does the job nicely. Point your browser at the September 1999 issue of MSDN Magazine, and click a link to download ChangeJournal.exe. This self-extracting executable contains the source code for a utility called CJDump. Use Visual Studio to compile an executable (or convince one of your colleagues on the development side to do so). Launch CJDump from the same volume that holds Sysvol. Here's an example entry for the test file I just created:

Usn(0x0000000001541fa8) Reason(0x80008103) test1000.txt

Notice that the USN matches the CurrentFileUSN entry in the FRS database. Using the Change Journal, FRS can verify at any time that it has the most current copies of any file in any given replica set.
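If you'd rather not compare hex strings by eye, a short script can do the matching. The following Python sketch assumes the two output formats look exactly like the Ntfrsutl and CJDump samples shown in this article. The function names are my own invention, and the parsing is based on nothing more than those samples.

```python
import re


def parse_idtable_record(text: str) -> dict:
    """Split 'Key : Value' lines from an Idtable dump into a dict."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


def usn_from_idtable(value: str) -> int:
    """Turn '00000000 01541fa8' (two 32-bit halves) into one integer."""
    return int("".join(value.split()), 16)


def usn_from_cjdump(line: str) -> int:
    """Pull the USN out of a CJDump line such as 'Usn(0x...) Reason(...)'."""
    match = re.search(r"Usn\(0x([0-9a-fA-F]+)\)", line)
    if match is None:
        raise ValueError("no Usn(...) field found in line")
    return int(match.group(1), 16)


# Sample values copied from the listings in this article.
idtable_text = """CurrentFileUsn : 00000000 01541fa8
FileName : test1000.txt"""
cjdump_line = "Usn(0x0000000001541fa8) Reason(0x80008103) test1000.txt"

record = parse_idtable_record(idtable_text)
frs_usn = usn_from_idtable(record["CurrentFileUsn"])
journal_usn = usn_from_cjdump(cjdump_line)

print(record["FileName"], "USNs match:", frs_usn == journal_usn)
```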

Change Orders
When you make a change to a file in an FRS Replica Set, FRS makes a database entry in a table called Outlog. This entry is called a change order. In addition to the database entry, FRS compresses a copy of the file and places the copy in the staging area. This both minimizes disk space and reduces the bandwidth necessary to transmit the change to any replication partners.

Once a change order has been prepared locally, FRS notifies its replication partners. The replication partners pull copies of the database entry, along with copies of the compressed file using a Remote Procedure Call (RPC)-based file transfer mechanism. It doesn't use the standard Server Message Block (SMB) file copy mechanism.

For a file deletion, only the database entry is copied. Each partner then removes the file from its local copy of Sysvol. FRS replication traffic is encrypted.

Like most RPC transactions, the port number used by FRS is registered dynamically. The number can and does vary from server to server. FRS might listen at port 1051 on one DC and at port 2883 on another. You can find the port used by FRS on a particular server using the Rpcdump utility in the Resource Kit. Look for the port used by the NTFRS Service. Here's an example:

192.168.0.161[2653] [f5cc59b4-4264-101a-8c59-08002b2f8426] NtFrs Service :NOT_PINGED
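When you need the port from several DCs, scanning Rpcdump output by eye gets tedious. Here is a rough Python sketch that pulls the port out of lines shaped like the example above. The line layout is assumed from that single sample, and find_frs_port is a hypothetical helper name, so adjust the pattern if your output differs.

```python
import re

# Layout assumed from the one sample line in this article:
#   address[port] [interface-uuid] annotation :status
ENDPOINT = re.compile(
    r"^(?P<address>\S+?)\[(?P<port>\d+)\]\s+"
    r"\[(?P<uuid>[0-9a-fA-F-]+)\]\s+"
    r"(?P<annotation>.*?)\s*:(?P<status>\w+)\s*$"
)


def find_frs_port(lines):
    """Return the port of the first endpoint annotated as NtFrs, else None."""
    for line in lines:
        match = ENDPOINT.match(line.strip())
        if match and "ntfrs" in match.group("annotation").lower():
            return int(match.group("port"))
    return None


sample = [
    "192.168.0.161[2653] [f5cc59b4-4264-101a-8c59-08002b2f8426] "
    "NtFrs Service :NOT_PINGED",
]

print(find_frs_port(sample))
```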

If you want to lock down FRS to a particular port for use through a firewall, see KB 319553.

Unlike AD replication, FRS can't use Simple Mail Transfer Protocol (SMTP) as a file transfer protocol. Because Sysvol contains all GPO files for a domain, the inability of FRS to replicate using SMTP means that a site that uses SMTP for AD replication must be in an entirely separate domain.

Change Order Receipt
When FRS obtains a change order from a replication partner, it updates its local copy of the FRS database and places the compressed file in the local staging area. It then decompresses the file, renames it back to its original name, and places the file in its correct location within Sysvol. Using the staging area to queue inbound change orders reduces the chance of replication failure caused by a lock on the target file in Sysvol.

(Although I've mentioned file locking several times because it's an important consideration in managing FRS, lock contention is uncommon when working with Sysvol. Updates applied by the Group Policy Editor happen quickly and all changes take place at the PDC Emulator, then propagate outward from there. File locks and collisions are a big issue when using FRS to support multiple DFS targets. For more information, see KB article 816493.)

Once a change order's been sent to all direct replication partners, and those partners have committed the change to their copy of Sysvol, FRS clears the compressed file from the staging area. The file in the staging area does not go away if FRS cannot replicate to one or more of its partners. This permits updates to queue up so that a replication partner can quickly refresh its copy of Sysvol when it comes back online. This behavior, called Change Order Retention, is documented in KB 322141. FRS periodically purges old change orders from the database and staging area. The default tombstone interval is seven days.

If a file gets modified several times within the same replication interval, FRS creates separate change orders for each modification. The replication partners receive these change orders and write the files, one after another, into their copy of Sysvol.

FRS only deviates from this behavior when it receives an incoming change order from one partner that conflicts with a change order for the same file from another replication partner. In this instance, FRS changes (morphs) the name of one of the change orders to include the GUID of the originating server, and puts a warning in the Event Log.

File morphing allows FRS to retain the information in the conflicting file without actually using it, avoiding possible corruption of the replica set. Again, this is unusual for Sysvol because all changes should be done at the PDC Emulator.

Staging Area Overfill
If you pull a DC from service for an extended period, the file replication updates queue up in the staging areas of its replication partners. The staging area has a default maximum size of 660MB, a value controlled by this Registry entry:

HKLM | SYSTEM | CurrentControlSet | Services | Ntfrs | Parameters | Staging Space Limit in KB.

660MB seems like quite a bit of storage for Sysvol changes, especially when you consider that staged files are compressed, but the staging area will eventually fill up if a DC fails and you don't get it back online in a reasonable period of time. You can also fill the staging area if you use an antivirus or defrag utility that hasn't been configured correctly for use on a DC, or applications that make aggressive changes to Group Policies (documented in KB 315045). KB 815263 lists FRS-friendly utilities, but the article is fairly old.

If a DC crashes and you don't return it to service, it's important to run metadata cleanup and remove the FRS entries from AD, as documented in KB 216498.

If the staging area does get full, the impact depends on the service pack you have installed. In Windows 2000 SP2 and earlier, FRS behaved like Fred Flintstone at quitting time. It gave a yabba-dabba-do and stopped working. You had to increase the size of the staging area as described in KB 264822.

Win2K SP3 and higher and Windows Server 2003 simply remove the oldest entries from the staging area until the utilization goes from 90 percent full to 60 percent. This keeps the staging area from hitting the limit and stopping FRS, but it does force FRS to regenerate any deleted entries by referring to the Change Log when its replication partner comes back online. This causes performance degradation. KB 329491 has recommendations for staging area sizes to reduce the potential for filling the staging area.
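The 90-to-60 cleanup is easier to picture with numbers. This Python sketch models it under some loud assumptions: staged files are plain (name, size, timestamp) tuples, "oldest" means the smallest timestamp, and the function name trim_staging is invented. It illustrates the thresholds described above, not the actual FRS code.

```python
def trim_staging(files, limit_kb, high=0.90, low=0.60):
    """Model of the cleanup: once usage reaches `high`, drop oldest files
    until usage is at or below `low`.

    files: list of (name, size_kb, timestamp) tuples (hypothetical layout).
    Returns (kept, removed).
    """
    used = sum(size for _, size, _ in files)
    if used < high * limit_kb:
        return list(files), []          # below the trigger, nothing to do

    kept = sorted(files, key=lambda f: f[2])   # oldest first
    removed = []
    while kept and used > low * limit_kb:
        victim = kept.pop(0)
        removed.append(victim)
        used -= victim[1]
    return kept, removed


# A 1,000KB staging area holding 950KB of staged files (95 percent full).
staged = [
    ("a", 300, 1),
    ("b", 300, 2),
    ("c", 200, 3),
    ("d", 150, 4),
]

kept, removed = trim_staging(staged, limit_kb=1000)
print("removed:", [name for name, _, _ in removed])
print("kept:", [name for name, _, _ in kept])
```

In this run the two oldest files go, which drops usage to 350KB. Those are exactly the files FRS would have to regenerate if a partner still needed them.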

Windows Server 2003 Hotfix
When FRS starts, it compares the latest USN in the Change Journal with the last USN recorded in its own database. It then creates a change order for each new or modified file and begins notifying its replication partners. The Change Journal, then, makes FRS resilient against a temporary loss of service.

There's a limit to this resiliency, though. The Change Journal has a default size of 128MB. Once it exhausts this storage space, it begins to overwrite the oldest entries. This is called journal wrap and it happens as a matter of design.

But if the FRS service is turned off for a long period of time, and a journal wrap occurs, then FRS has no means of determining whether the files and folders in its replica set are up-to-date. This requires a non-authoritative restoration of Sysvol as documented in KB 292438.
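The startup decision boils down to a comparison between three numbers. Here is a minimal Python sketch of that logic, assuming you know the last USN the service processed plus the oldest and next USNs still held in the journal. The function and parameter names are hypothetical, and the real service tracks considerably more state than this.

```python
def classify_restart(last_processed_usn, journal_first_usn, journal_next_usn):
    """Decide what a restarting service can do, given the journal's range.

    last_processed_usn: last USN recorded in the service's own database.
    journal_first_usn:  oldest USN still present in the Change Journal.
    journal_next_usn:   USN the journal will assign to its next entry.
    """
    if last_processed_usn < journal_first_usn:
        # The entries needed to catch up have been overwritten.
        return "journal_wrap"
    if last_processed_usn < journal_next_usn:
        # Everything missed is still in the journal and can be replayed.
        return "catch_up"
    return "up_to_date"


# Short outage: the journal still reaches back past the last processed USN.
print(classify_restart(0x01541FA8, 0x01000000, 0x01600000))
# Long outage: the journal has wrapped beyond the last processed USN.
print(classify_restart(0x01541FA8, 0x02000000, 0x02800000))
```

The check is deliberately conservative. Once the last processed USN falls behind the journal's oldest entry, there's no way to prove nothing was missed.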

To help avoid the necessity of restoring Sysvol, Microsoft increased the size of the Change Journal from 128MB to 512MB in Win2K SP4 and in a pre-SP1 hotfix for Windows Server 2003 (KB 823230). KB 819268 discusses what to do if you don't have sufficient hard drive space to accommodate this growth. This should not be an issue with Sysvol, only with very large DFS-based replica sets.

Sysvol Bloat
When you use the Group Policy Editor to view or modify the contents of a GPO, the editor copies a set of ADM files from the %Windir%\Inf folder on the machine where the editor was launched to an ADM folder under the GPO policy folder in Sysvol.

In Windows 2003, the cached ADM template files add up to about 1.8MB. The same template files are stored in the policy folder for each and every GPO. So if you have 100 GPOs (not an outrageous number), you'll have a Sysvol of at least 180MB, not including the size of the policy text files themselves, which are relatively minuscule at about 1KB apiece. Think of this as "Sysvol bloat."
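To estimate the bloat in your own domain, multiply it out. This small Python sketch uses the 1.8MB figure quoted above as its default. The function name is made up, and your actual template size depends on which ADM files are loaded.

```python
ADM_CACHE_MB = 1.8  # approximate size of the cached ADM templates per GPO


def sysvol_adm_overhead_mb(gpo_count, adm_cache_mb=ADM_CACHE_MB):
    """Estimate the Sysvol space taken by per-GPO copies of the ADM files."""
    return gpo_count * adm_cache_mb


for count in (10, 100, 500):
    overhead = sysvol_adm_overhead_mb(count)
    print(f"{count} GPOs: about {overhead:.0f}MB of ADM templates")
```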

To limit the network traffic caused by replicating hundreds of megabytes of ADM template files in Sysvol each time you promote a DC, you can remove the ADM template files from the GPOs. Do this by manually unloading the template files using the Group Policy Editor, or simply deleting the ADM files from each policy folder in Sysvol. You will need to reload the templates if you want to make changes to the Administrative Template settings, then unload the templates once you've finished doing the modifications.

In Windows 2003, you can implement two Group Policy settings, documented in KB 816662, to eliminate Sysvol bloat. The first policy setting, Turn Off Automatic Updates of ADM Files, is available in Win2K as well. It stops the GP Editor from overwriting any existing ADM template files, but doesn't stop the editor from copying ADM files to Sysvol if the files are absent. The second policy setting, Always Use Local ADM Files For Group Policy Object Editor, is only for Windows 2003. It prevents populating the ADM cache in Sysvol entirely. Instead, the GP Editor loads the template settings into memory using the ADM files on the Windows 2003 server where you run the editor.

File Replication Service Monitoring
If FRS stops working, you want to know about it before your users start experiencing problems with Group Policies and before the staging area fills up to the limit. As Gary covered in the March issue, Microsoft has a variety of tools for tracking down FRS problems and monitoring FRS operation. They're available here.

Also, check out the Troubleshooting FRS chapter from the AD Operations Guide.
