In-Depth
        
        From the Trenches: The Confusing Case of the Two-PDC Domain
        Getting a new ERP application up and running was only the first challenge for this MCP.
        
        
By Chris Gerrib
September 01, 2000
I work as a networking consultant, providing services 
        to a number of small and mid-sized clients in the Chicago 
        area. In my work I’ve seen various disasters, “weird” 
        quandaries, and just plain old problems. But the worst 
        thing I’ve seen was a single NT domain that decided to have 
        two Primary Domain Controllers (PDCs) at once! 
      I was on-site at a mid-sized steel manufacturer, helping 
        it implement J.D. Edwards’ OneWorld enterprise resource 
        planning (ERP) product. This Windows-based application 
        requires an IBM AS/400, an NT server running a SQL database, 
        and fixed TCP/IP addressing. The client software runs 
        on Windows 95 or Windows NT 4.0 Workstation, but any development 
        work (and with any ERP project, there is a ton of development 
        work) has to be done on NT Workstation. 
      The Scene of the Crime
      The client had completed the project’s first phase, and 
        so a number of users were “live” on OneWorld, running 
        on the PDC. The developers now needed a test machine for 
        the next phase. I built a standard NT server on a Dell 
        server-class chassis, and added it to the domain as a 
        Backup Domain Controller (BDC) during the install process. 
        So, I now had a fairly simple network, consisting of a 
        PDC (let’s call it “OLD”), a BDC (“NEW”), and an AS/400, all 
        in a single NT domain. All devices had the then-latest 
        service packs installed. 
      About two weeks later OLD crashed due to a hardware failure. 
        It was a fairly quick fix; but to allow users some network 
        services, I promoted NEW to PDC while I fixed OLD. Within 
        a few hours, I got OLD back up. It came up as a PDC in 
        Server Manager. I demoted NEW to BDC in Server Manager 
        without errors—or so I thought. After a brief refresh 
        both Server Manager displays agreed; OLD was the PDC, 
        NEW was the BDC. I had taken all client PCs down during 
        this swap and now asked them to log back in. 
      The majority of people, including all production users 
        (with Windows 95 PCs), were able to proceed with their 
        work. However, none of the developers with NT workstations 
        could get past the domain login screen. Thanks to another 
        consultant’s oversight, we didn’t have the local user 
        account name or password. The developers’ PCs were now 
        useless. 
      Red Herrings 
      While attempting to troubleshoot this problem, I realized 
        that the domain was having more serious trouble. Changes 
        made to user account information on the PDC weren’t being 
        communicated to the BDC, despite repeated use of the “Force 
        Synchronization” command on both machines’ Server Manager. 
        Also, NT workstations were unable to join the domain even 
        on a fresh install. 
      I checked name resolution, and could PING by name. I 
        installed NetBIOS on both servers, rebooted after hours, 
        and again found that the two servers weren’t talking. I tried 
        to promote NEW to PDC, and was able to do so without errors. 
        (Of course, that should have failed.) Frustrated and with 
        an unhappy client, I called Microsoft Technical Support. 
      
      After retracing my steps with IP name resolution, tech 
        support had me try a command line utility called Nltest 
        (available in the Windows NT 4.0 Resource Kit). Nltest has 
        several options, including “force synchronization” and 
        “query” options. (For more information, see TechNet article 
        Q158148, “Domain Secure Channel Utility: Nltest.exe,” on 
        Microsoft’s Web site.) The end result was failure. The 
        two servers, OLD and NEW, weren’t talking. 
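
      For readers who haven’t used Nltest, here is a minimal Python 
      sketch that only assembles the kind of command lines involved; 
      it doesn’t run anything. The switch names (/query, /bdc_query, 
      /sync, /sc_query, /server) are given as I understand them from 
      the Resource Kit documentation, so check them against Q158148 
      before relying on them. The domain name STEEL and the helper 
      nltest_command are made up for illustration, and this is not 
      necessarily the exact sequence tech support walked through. 

```python
def nltest_command(option, target=None, server=None):
    """Build the argument list for one Nltest invocation (illustrative helper).

    option: switch name without the leading slash, e.g. "query" or "sc_query"
    target: optional value placed after a colon, e.g. a domain name
    server: optional machine to run the query against (/server:<name>)
    """
    args = ["nltest"]
    if server:
        args.append(f"/server:{server}")
    switch = f"/{option}"
    if target:
        switch += f":{target}"
    args.append(switch)
    return args


DOMAIN = "STEEL"  # hypothetical domain name; the article never gives the real one

checks = [
    nltest_command("query", server="NEW"),                    # Netlogon status on the BDC
    nltest_command("bdc_query", target=DOMAIN),               # replication status of the BDCs
    nltest_command("sync", server="NEW"),                     # ask the BDC for an immediate sync
    nltest_command("sc_query", target=DOMAIN, server="NEW"),  # secure channel status
]

for command in checks:
    print(" ".join(command))
```

      Building the commands as lists rather than strings makes them 
      easy to hand to a process launcher on an actual NT machine 
      without worrying about quoting. 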
      The diagnosis was that the “secure channel” between the 
        two servers had failed. NT servers use this “secure channel” 
        to pass RPC calls between controllers in a domain or between 
        domains in trust relationships. Specifically, the failure 
        was on OLD—the production server! This was why I couldn’t 
        get my NT workstations to connect, even if I did a clean 
        install. The only reason my Windows 95 production machines 
        were working was because that OS doesn’t integrate into 
        the domain as tightly as NT 4.0 Workstation. 
      The Solution Revealed 
      At this point it looked as though I had only one choice: 
        Format OLD’s hard drive, 
        reinstall everything, and restore from backup. This was 
        on Tuesday night. Not wanting to lose the weekend as well 
        as three days of the developers’ work, I pressed for another 
        option. Tech support offered a potential way out: Rename 
        the PDC! I’d been taught that doing this would be equivalent 
        to putting a gun to my head, but I had nothing to lose. 
      
      After hours that night I stopped all the services (SQL, 
        backup, and the like), ran a backup, and set all but the 
        minimum services to “manual.” Then I renamed OLD (to GIHTW 
        for “God, I Hope This Works”) and rebooted. Much to my 
        surprise, GIHTW came up clean and declared itself PDC 
        of the domain. More important, changes made on OLD/GIHTW’s 
        User Manager immediately appeared on NEW. Time for step 
        two: Change GIHTW back to OLD and reboot. Again, everything 
        worked fine. The two machines, OLD and NEW, were talking 
        to each other and propagating changes. Plus, I was able 
        to restart all the services—including the critical SQL 
        databases—without incident. Even better, the developers 
        could log into their NT workstations without a hitch. 
      
      Epilogue 
      The results left me happy (and my weekend plans intact). 
        And since I’d prepared the client for the worst (re-install 
        and recover), I looked like a hero to him. Also, I learned 
        two valuable lessons from this situation. 
      First, you should be very careful about promoting and 
        demoting domain controllers. Although it should work fine, 
        it may not. Likewise, you need to verify that your domain 
        is working—looking at Server Manager isn’t enough. 
      Second, don’t give up. In my odyssey several people suggested 
        I “just format and re-install.” By being persistent, I 
        got the client up without risk of data loss or excessive 
        overtime charges.
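
      The first lesson can be turned into a small checklist. The 
      Python sketch below is only a model of that checklist, not a 
      tool that talks to NT: you would fill in each controller’s 
      self-reported role and visible accounts by hand or from 
      whatever queries you trust. The ControllerView class, the 
      check_domain function, and the “canary” test account are all 
      hypothetical names. It checks the two things that went wrong 
      here: more than one machine claiming to be PDC, and a change 
      made on the PDC not showing up everywhere. 

```python
from dataclasses import dataclass, field


@dataclass
class ControllerView:
    """What one domain controller reports about itself (hypothetical model)."""
    name: str
    role: str                                   # "PDC" or "BDC", as that machine reports it
    accounts: set = field(default_factory=set)  # account names visible on that machine


def check_domain(views, canary_account):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Check 1: exactly one machine should believe it is the PDC.
    pdcs = sorted(v.name for v in views if v.role == "PDC")
    if len(pdcs) != 1:
        problems.append(f"expected exactly one PDC, found {len(pdcs)}: {pdcs}")

    # Check 2: a test account created on the PDC should be visible on every controller.
    for v in views:
        if canary_account not in v.accounts:
            problems.append(f"{v.name} does not see account {canary_account!r}")

    return problems


# The broken state: both machines claim PDC and the change never reached NEW.
broken = [
    ControllerView("OLD", "PDC", {"canary"}),
    ControllerView("NEW", "PDC", set()),
]
for problem in check_domain(broken, "canary"):
    print(problem)
```

      In the healthy state (OLD as PDC, NEW as BDC, both seeing the 
      test account) the function returns an empty list. 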
        
                    About the Author
                    Chris Gerrib, MCP, CNE, has been in high-tech for five years, the last four with Hinsdale, Illinois-based consulting firm Information Technologies International. He started out as a “screwdriver holder” for the senior technicians and worked his way up to his current position as VP of Operations. He holds degrees from Southern Illinois University and the University of Illinois.