In-Depth
        
        White-Coat Computer Science
        Those who test products and changes before rolling them into production stand a higher chance of continued employment. Use that technical version of Darwin’s natural selection to your advantage.
        
        
By Thomas Eck
February 01, 2000
Since the release of Windows NT Service Pack 4, the engineering team had been hard at work making sure the new product would be compatible with all the current hardware standards and mission-critical applications in use throughout the enterprise. After months of compatibility testing in the lab, the project was finally passed to the software distribution team for global deployment. Microsoft Systems Management Server pushed the job out to the workstations with few problems; it was now time to focus on the servers. The site in Bangor, Maine, was the first to deploy, and thus the first to witness the “blue screen” boot failures on some of the older Compaq servers.
      
         
      Another Methodical Approach: For another perspective on a methodical approach to your work, read Lee Christopher Grant’s exclusively online article “Survive Chaos.”
      
      All too often, work done in the controlled safety of the lab doesn’t yield the same results when we apply it to a production environment. Despite our best efforts to look at the task at hand from every angle, we run into problems that keep us up all night racking our brains about where we diverged from the beaten path. When the problem is finally solved, we often find that it was caused either by an incompatibility that was well known (except to us) or by a lab configuration that didn’t accurately reflect our production environment.
      Many of the horror stories we hear about production failures during deployments stem not from a lack of knowledge or skill but from some divergence from what was expected. Microsoft’s claim that Service Pack 4 (SP4) was a simple upgrade shouldn’t have excused you from testing it in your own environment. You may have applied SP4 successfully to your desktop, but when you applied it to the file server hosting all of the executives’ home directories, you witnessed the blue screen of death. After standing in the data center for the rest of the night, scratching what remains of your hair and trying everything under the sun short of voodoo, you receive the dreaded call. It’s the director of IT, asking, “Why can’t I access my home directory?” You don’t really want to tell him that you never tested SP4 on this hardware platform, do you?
      For those who don white lab coats for a living, survival depends not on work done in the lab alone but on the ability to repeat experiments successfully on demand. Successful scientists maintain pristine laboratories and document every step of every process they perform so that any successful result can be repeated. If a scientist believes she has found the cure for cancer, wouldn’t it be a shame if the results were unrepeatable? Did she really find the cure if she can’t repeat the findings of the experiment?
      When we explain to our peers, customers, and bosses that a procedure worked in the lab but doesn’t work as planned in production, our credibility is at stake. As technologists, we’re typically a financial liability to an organization, unless we work for a contracting firm whose business is selling our services. We rarely make money for the organization directly; instead, we must justify our existence within the enterprise by the value our work adds to existing business processes. We build solutions that enable business users to work more efficiently, letting them spend more time on profit-generating business processes and less on the tools needed for the job.
      Avoiding TechnoDarwinism
      If you prefer to fly by the seat of your pants rather than apply some basic scientific principles to your work, Darwin’s theory of natural selection will work against you within your organization. Quite simply, those who test products and changes before rolling them into production stand a higher chance of continued employment. Conversely, those who take their chances by deploying a product in production without testing it first quickly fall victim to natural selection. These are the individuals often “selected” to leave the organization after failing to grasp the importance of applying scientific principles to their work.
      Any well-devised deployment plan should reserve time for research and testing. But when things run late, lab time is usually the first item to be cut. Most project managers seem to think that the week of testing you entered on your deployment plan is merely a code word for the padding added to every project plan to cover our inability to predict the unknown. They promptly target this seemingly bogus entry for deletion or reduction.
      Inevitably, once you move your project from the development domain into production, a host of unforeseen circumstances keeps you from seeing daylight for the next few days. The project then finishes nowhere near the milestone set by the project manager, raising the question of whether cutting that week of pre-production testing was truly worth it.
      All too often, the work we do is so new or unique that we can’t accurately estimate the time we’ll need or the obstacles we’ll encounter along the way. Did the NASA scientists accurately estimate the time or money required to put the first man on the moon? The moon landing proved to be an event that NASA would repeat, and the knowledge gained from the first mission inevitably improved the time and resource estimates for subsequent missions. Armed with knowledge learned from our own lab experiments, we too can begin to benefit from previous experience.
      For systems administrators, there’s often little reason why we can’t practice in a non-critical environment to prepare ourselves for the pitfalls that may lie ahead in an upgrade. That isn’t to say that every upgrade, migration, and deployment will go smoothly if we practice it once or twice in the lab; there will always be unforeseeable problems. But generally speaking, substantial practice beforehand yields a higher success rate than simply giving it a try and seeing what happens.
      The time to research incompatibility issues, test changes to the environment, and devise disaster plans isn’t after the event occurs, but long before. If you work in an environment where you feel you should be donning a fire helmet most days, you’re already familiar with the dangers of neglecting a proactive approach to problem solving. Those who are constantly reacting have no time to prepare technologies that increase the enterprise’s competitive advantage. Considering technology’s growing role in today’s super-competitive market, even entire organizations can easily fall victim to the selective nature of TechnoDarwinism.
      A Few Guidelines
      To help ensure that efforts in the lab are indeed useful, 
        consider the following guidelines.
      Standardize the User Environment
      Too many enterprises lack strict standards for the user environment. Instead, they let machines exist with varying directory structures, office automation suites, hardware platforms, and even operating systems. Because we’re generally financial liabilities to most organizations, we must find ways to reduce the cost of supporting machines in the environment to justify our continued existence. If each machine is different, we can’t benefit from the economies of scale that large enterprise environments should offer. While a discussion of the importance of enterprise standards is well outside the scope of this article, organizations that lack a strict policy on hardware and software standards are destined to drive IT support costs significantly higher than necessary. Without a normalized environment, we have no reliable way to predict whether the results we obtain in the lab can be re-created in production.
      Research Known Incompatibilities Before Trying to Change Production Environments
      The inability of certain Compaq servers to boot Windows NT after installation of Service Pack 4 is well documented on Compaq’s Web site, but most likely we didn’t find that out until after the blue screen appeared. All too often, bonus-protecting managers insist that a deployment be done by some arbitrary date, leaving us little time to perform the required testing or research. A simple visit to Compaq’s Web site could have saved us hours of downtime (downtime that killed the manager’s bonus anyway) and spared us senior management’s dreaded question of how this could have happened.
      By visiting the Compaq site before the upgrade, we would have learned that there’s a known incompatibility between Microsoft Windows NT Service Pack 4 and SMART/2P and SMART/2E array controllers running firmware v1.36 or earlier. Armed with that knowledge, we could have applied SSD 2.08, as the Customer Advisory directs, during the scheduled downtime. Had we taken that single proactive step to gather more information about the task at hand, the SP4 installation on the server might well have succeeded.
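      Checking for a problem like this can be partly automated once you know what to look for. The following Python sketch is only an illustration: it assumes you can already export each server’s array-controller model and firmware revision into a simple inventory list, and the Controller record, the at_risk function, and the server names are all hypothetical. Only the affected models and the v1.36 ceiling come from the advisory described above.

```python
from dataclasses import dataclass

# From the advisory discussed above: SMART/2P and SMART/2E controllers
# running firmware v1.36 or earlier conflict with NT Service Pack 4.
AFFECTED_MODELS = {"SMART/2P", "SMART/2E"}
MAX_BAD_FIRMWARE = (1, 36)


@dataclass
class Controller:
    server: str    # hypothetical host name
    model: str     # controller model as recorded in the inventory
    firmware: str  # firmware revision, e.g. "1.36" or "v.1.36"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v.1.36' or '1.36' into (1, 36) so revisions compare numerically.

    Assumes dotted numeric revisions; string comparison would get this wrong.
    """
    return tuple(int(part) for part in text.strip().lstrip("vV.").split("."))


def at_risk(inventory: list[Controller]) -> list[str]:
    """Return servers whose controller firmware should be upgraded before SP4."""
    return [
        c.server
        for c in inventory
        if c.model in AFFECTED_MODELS and parse_version(c.firmware) <= MAX_BAD_FIRMWARE
    ]


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from asset records.
    inventory = [
        Controller("BANGOR-FS1", "SMART/2P", "1.36"),
        Controller("BANGOR-FS2", "SMART/2E", "2.00"),
        Controller("BANGOR-DB1", "SMART/2E", "v.1.20"),
    ]
    print(at_risk(inventory))  # ['BANGOR-FS1', 'BANGOR-DB1']
```

      The value of such a check lies less in the code than in running it before the maintenance window, while there is still time to apply the vendor’s fix.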
      Document All Procedures Performed in the Lab Environment
      The most important way to increase the repeatability 
        of your work in the lab is to make sure you document every 
        step of the process, no matter how trivial it may seem. 
        Our notes must be so detailed that a third party can easily 
        re-create our work without our involvement.
      It’s also essential that you have a peer (or a QA group, if your organization has one) review your documentation. As authors, we tend to make assumptions that we never state clearly in the text.
      Create Identical Lab and Production Environments
      If we hope to gain any useful data from our lab experiments, the lab must closely resemble the production environment for the task at hand. For example, if we want to simulate the interaction of an application across domain trusts, we must first establish an environment similar to the one we have in production. While it’d be ideal to match every aspect of the production environment in the lab, this is often cost-prohibitive. Instead, we may be able to use decommissioned desktops and servers to stand in for the 10 servers making up the domain architecture and simulate how our product behaves in a multi-domain environment. The same is true for testing driver updates, hot fixes, and other system-level software changes: the firmware revisions, drivers, card locations, memory, processor count, and so on in the lab equipment must match what’s being used in production.
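      Matching lab hardware to production is easier to verify when both configurations are recorded in the same form. As a minimal sketch, assuming each machine’s attributes have been written down as a Python dictionary (the field names and values here are hypothetical), the comparison below reports every attribute on which a lab machine drifts from its production counterpart.

```python
# Attributes the lab machine must share with production. These field names
# are hypothetical; use whatever your inventory records actually capture.
CHECKED_FIELDS = ("firmware", "drivers", "card_slots", "memory_mb", "cpu_count")


def config_drift(lab: dict, prod: dict, fields=CHECKED_FIELDS) -> dict:
    """Return {field: (lab_value, prod_value)} for every checked field that differs.

    A field missing from either record shows up as None, so gaps in the
    documentation are reported as drift rather than silently ignored.
    """
    return {
        field: (lab.get(field), prod.get(field))
        for field in fields
        if lab.get(field) != prod.get(field)
    }


if __name__ == "__main__":
    prod_server = {
        "firmware": "1.36",
        "drivers": "array 4.1",
        "card_slots": "array:3,nic:5",
        "memory_mb": 512,
        "cpu_count": 2,
    }
    # Lab box built from a decommissioned server with less memory and one CPU.
    lab_server = dict(prod_server, memory_mb=256, cpu_count=1)
    print(config_drift(lab_server, prod_server))
    # {'memory_mb': (256, 512), 'cpu_count': (1, 2)}
```

      An empty result means the lab machine matches production on every attribute checked; anything else identifies exactly which lab findings may not carry over.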
      Many applications install their own DLLs in the system directory, and the newer version of MDAC that Office 2000 installs may just break the critical database application the primary user runs each day. Without significant testing in a lab that mirrors your standardized production environment, you can’t give those who count on you any assurance (beyond mere guesswork) that your efforts will be truly successful.
      Use Scripting Methods to Improve Repeatability of Results
      One of the best ways to make sure you can repeat complex operations is to write a script that performs the upgrade. Once the script behaves the way you want in the lab, you can run it in the production environment to duplicate your efforts exactly. This is especially useful when applying complex NTFS permissions, creating users or groups, or modifying the Registry. Scripts also help ensure that the environment has been initialized to a known state for each test we perform, which is essential for gathering valid data from our experiments.
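      The idea of resetting to a known state before every run can be shown with ordinary files. The sketch below is a hypothetical stand-in: plain text files under a temporary directory play the role of the NTFS permissions or Registry settings a real script would touch, and BASELINE, apply_change, and rehearse are illustrative names, not part of any tool. Because each rehearsal begins by rebuilding the baseline, running it twice should produce identical results.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical known starting state. Plain files stand in for the NTFS
# permissions or Registry values a real upgrade script would manage.
BASELINE = {
    "home/alice/profile.txt": "default profile",
    "home/bob/profile.txt": "default profile",
    "shared/readme.txt": "team share",
}


def reset_to_baseline(root: Path) -> None:
    """Wipe everything under root and rebuild the known starting state."""
    if root.exists():
        shutil.rmtree(root)
    for rel_path, contents in BASELINE.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)


def apply_change(root: Path) -> None:
    """Hypothetical stand-in for the upgrade step being rehearsed."""
    (root / "shared" / "policy.txt").write_text("new logon policy")
    for profile in root.glob("home/*/profile.txt"):
        profile.write_text("migrated profile")


def snapshot(root: Path) -> dict[str, str]:
    """Record every file and its contents so two runs can be compared."""
    return {
        p.relative_to(root).as_posix(): p.read_text()
        for p in root.rglob("*")
        if p.is_file()
    }


def rehearse(root: Path) -> dict[str, str]:
    """Reset to the baseline, apply the change, and capture the result."""
    reset_to_baseline(root)
    apply_change(root)
    return snapshot(root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lab_root = Path(tmp) / "lab"
        first = rehearse(lab_root)
        second = rehearse(lab_root)
        print(first == second)  # True: every run starts from the same baseline
```

      Separating the reset step from the change step is the point of the design: the same apply_change logic can later be run against production, while reset_to_baseline exists only in the lab.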
      Using the Active Directory Service Interfaces (ADSI) with our favorite programming language, we can perform almost any Windows NT, Windows 2000, Exchange, IIS, or Novell administrative function programmatically. This is useful not only for developing scripts that re-create our lab actions in production but also for the reverse: with Visual Basic and ADSI, we can write powerful scripts that re-create the production user domain SAM in our lab environment.
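      The export-and-rebuild pattern behind re-creating the production accounts in the lab can be sketched without ADSI itself. In the Python sketch below, the account list stands in for what an enumeration of the production domain would return, and LabDirectory is a hypothetical in-memory substitute for the lab domain; neither reflects ADSI’s actual interfaces. Passwords are deliberately left out of the export, so lab passwords would need to be assigned separately.

```python
import csv
import io

FIELDS = ["name", "full_name", "groups"]  # groups are stored ";"-separated


def export_accounts(accounts: list[dict[str, str]]) -> str:
    """Serialize account name, full name, and group list to CSV text.

    In a real environment the rows would come from enumerating the
    production domain; here they are passed in directly (assumption).
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for acct in accounts:
        writer.writerow(acct)
    return buf.getvalue()


class LabDirectory:
    """Hypothetical in-memory stand-in for the lab domain's account database."""

    def __init__(self) -> None:
        self.users: dict[str, str] = {}
        self.groups: dict[str, set[str]] = {}

    def ensure_user(self, name: str, full_name: str) -> None:
        self.users[name] = full_name  # create or overwrite, never duplicate

    def ensure_membership(self, group: str, user: str) -> None:
        self.groups.setdefault(group, set()).add(user)


def import_accounts(csv_text: str, lab: LabDirectory) -> None:
    """Re-create each exported account and its group memberships in the lab."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        lab.ensure_user(row["name"], row["full_name"])
        for group in filter(None, row["groups"].split(";")):
            lab.ensure_membership(group, row["name"])


if __name__ == "__main__":
    production = [
        {"name": "jsmith", "full_name": "Jane Smith", "groups": "Domain Users;Finance"},
        {"name": "bjones", "full_name": "Bob Jones", "groups": "Domain Users"},
    ]
    lab = LabDirectory()
    import_accounts(export_accounts(production), lab)
    print(sorted(lab.users))  # ['bjones', 'jsmith']
```

      Because ensure_user and ensure_membership create or overwrite rather than append blindly, importing the same file twice leaves the lab directory unchanged, which keeps repeated lab rebuilds consistent.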
      
         
      
      Avoiding Extinction
      To help increase your chances of success when implementing changes in your production environment, here are some steps to follow:
      
        -  If you’re operating in a non-standardized environment, 
          seize the opportunity to implement standards when performing 
          a major upgrade to the enterprise (such as Windows 2000).
-  Research potential known incompatibilities for the 
          software or hardware you’re about to install.
-  Re-create the elements of the production environment 
          that will be affected by your changes in a non-critical 
          environment or isolated network.
-  Document your experiences and lab procedures with 
          meticulous detail.
-  Script procedures in the lab environment where possible to guarantee that the same procedure is followed when it’s moved to production. Whether it’s used to initialize the environment during testing or to perform the actual task at hand, scripting can help ensure consistent results.
-  Test the impact of a new application or system update 
          with all critical applications. Simply logging into 
          the client isn’t an adequate test for most deployments.
-  Have a third party validate your documentation to 
          make sure it can be reproduced without your intervention.
The next time you avert a major system outage because 
        you found the problem and resolution before the change 
        was implemented in a production environment, raise a glass 
        to the parents of scientific thought for their contribution 
        to your success.