In-Depth
        
        Be Prepared
        Scout out potential trouble by listening to your servers’ sounds. The best plan, as always, is a current backup.
        
        
By Rodney Landrum
March 01, 2002
Many network administrators who have been in the game long enough tend to develop a motherly instinct when it comes to overseeing their family of servers and workstations. They know when their machines are too hot, hungry (for more RAM), or sick. The sound a machine makes when it’s down and
        out is distinct. Sounds emanating from the server normally lead to a straightforward 
        diagnosis, as there are so few moving parts: hard drive failure. The noise 
        sometimes reminds me of a playing card in the spokes of a bicycle or an 
        index finger in a moving CPU fan (though I shouldn’t admit to having done 
the latter). The symptoms aren’t always severe at first. Advanced operating systems like Windows NT/2000 can detect bad sectors on a drive and remap the affected clusters so the data is relocated to good ones. But when the worst-case scenario becomes reality,
        you’d better count on a long night. 
      
      
      
Overconfidence
        I wasn’t prepared for such an evening when I drove six hours to 
        a client site to upgrade its SQL Server from 6.5 to 7.0. In fact, because 
        I’d done at least 10 upgrades of the same kind, using the upgrade wizard 
        provided in SQL Server 7.0, I was expecting to roll out early and spend 
        a relaxing evening studying in the hotel room for an upcoming exam.
      After arriving, I made my introductions to the IT staff members, with 
        whom I’d only had phone contact previously. We shared a few moments of 
        “Hey, that’s what you look like!” before I started getting ready for the 
        upgrade. I think they could perceive my confidence, as we’d been planning 
        this for several weeks. We chose a Thursday evening so I could be on site 
        Friday morning in the event of a disaster.
      The Dreaded Clicking Sound
        I remember seeing the entrance to the server room from about 15 feet away. 
        As I approached the entrance, I felt something odd, like a premonition 
        of a long night without the opportunity even to stop for a slice of delivery 
        pizza—but I shrugged it off.
      The moment I placed my right foot in the doorway, however, I heard it: 
The repetitive click of a small, incapacitated actuator arm banging against
        the metal surface of a disk drive platter. I looked at the now inquisitive 
        IT staff members, who’d placed their entire trust in me to make their 
        jobs and lives easier. Their questioning looks said, “I wonder what he’s 
        going to do about this?” I smiled and said something that apparently only 
        I found humorous: “So, you guys did make good backups like I asked, right?”
      
         
The sickly machine, naturally, turned out to be the SQL Server system
        I was there to upgrade. I pulled up to the machine in a painfully uncomfortable 
        rolling office chair, a place I would occupy for the next several hours. 
        I was miraculously able to log in and navigate the directory structure, 
        though the machine was crawling. The first thing I discovered was that 
        there was no configured RAID (Redundant Array of Inexpensive Disks), hardware, 
        software or otherwise. They’d gone with the antithesis of RAID—SLED (Single 
        Large Expensive Disk). The drive was partitioned with C: and D: drives. 
        The master database was set up on C: and all the user databases on D:. 
SQL Server seemed to be running fine, but I knew it was only a matter of time, potentially only minutes, before the machine might never boot again. The more
        I worked on the machine, the more noise it made and the slower it got. 
        I had to act fast.
      
      
If at First You Don’t Succeed…
        They’d purchased a new server that was going to be the recipient 
        of the upgraded databases, once the conversion was finished on the now-crashing 
        server. Both were running NT 4.0 with Service Pack 6. They’d already installed 
        SQL 7.0 with SP2 on the new server.
      I faced several alternatives. One option would be to attempt a machine-to-machine 
        upgrade by connecting, via the network with the upgrade wizard, to the 
        old SQL Server. Another course of action: Remove SQL 7.0 on the new server, 
        install SQL 6.5, copy over the database files to the new server and perform 
        a single-machine upgrade. Either option would require pulling very large 
        amounts of data—a gig and change—from the ailing system. I decided to 
        walk down the machine-to-machine upgrade path first. About 10 minutes 
        into the process, when I was just starting to believe it would work, the 
        old server hung. I waited for any sign of life, then decided to take the 
        only option left and bounce the old server. Another small miracle occurred 
        when it actually rebooted successfully and the SQL services started! So, 
        scratch plan A and move to plan B, the single-machine upgrade.
I’d learned a trick for moving a full SQL Server directly to a new machine without having to restore databases individually, one I’d employed several times in the past. The procedure’s simple: Install the same version of SQL Server on another machine. Stop the SQL services on both machines. Copy all the SQL database and log files, such as master.dat, from the source server to the same location on the destination server. If master.dat resided in C:\MSSQL\Data, then that’s where it has to go on the new server.
        The master database contains information about all the databases, users 
        and logins on the server. With all the files in place, take the old server 
        offline, give the new server the same name as the old box and change the 
        IP address. Restart the SQL services on the new machine. If everything 
        was done correctly, the new SQL Server would be identical to the old SQL 
        Server. This was my new plan of attack.
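For what it’s worth, the whole sequence lends itself to scripting. Here’s a minimal sketch of the copy step, in Python purely for illustration (the actual work that night was done by hand): the server name OLDSQL, the C$/D$ administrative-share layout and the default C:\MSSQL\DATA and D:\MSSQL\DATA paths are all assumptions, and the source box’s services are presumed to be stopped the same way before the copy.

import shutil
import subprocess
from pathlib import Path

OLD_SERVER = "OLDSQL"   # hypothetical name of the dying source box
SHARES = [r"C$\MSSQL\DATA", r"D$\MSSQL\DATA"]   # assumed admin-share paths on the old server
LOCAL_DIRS = [Path(r"C:\MSSQL\DATA"), Path(r"D:\MSSQL\DATA")]   # identical layout on the new box

def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Stop SQL on this (destination) machine; the source's services are
#    assumed to be stopped already so the .dat files aren't in use.
run(["net", "stop", "MSSQLServer"])

# 2. Pull every device file from the old server's administrative shares
#    into the identical path on the new server.
for share, local_dir in zip(SHARES, LOCAL_DIRS):
    src = Path(rf"\\{OLD_SERVER}\{share}")
    local_dir.mkdir(parents=True, exist_ok=True)
    for dat in src.glob("*.dat"):
        print(f"copying {dat} -> {local_dir / dat.name}")
        shutil.copy2(dat, local_dir / dat.name)

# 3. Only after the old server is powered down and this machine has been
#    renamed and re-addressed to match it, bring SQL back up against the
#    copied master database.
run(["net", "start", "MSSQLServer"])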
      
      
Crash No. 2
        I uninstalled SQL 7.0 and installed SQL 6.5 on the new server. 
        It had been partitioned identically, so all that remained for me to do 
        was move the data and log files. I connected to the dying server by mapping 
        drives to the administrative shares on the C: and D: drives and began 
copying the files. In hindsight, I could have zipped the files first, but that would have required even more hard-drive activity. After another 45 minutes
        of copying, the old server hung again. Arggghhh!
      
      
Please Tell Me You Made the Backup
        This time I opted to restore from backup tape. I’d made the network 
        manager at the client site promise me he’d make backups of everything, 
        including the raw data files, before I began the upgrade. He’d stopped 
        the SQL services so the files wouldn’t be open and subsequently skipped 
        during the backup process. I was able to pull the backed-up files from 
        the tape and restore them on the new SQL server, which was now a SQL 6.5 
        machine. I started the services after powering down the old machine for 
        the final time, and everything started successfully. All that remained 
        was to reinstall SQL 7.0 and complete the upgrade process for all the 
        databases. Thankfully, it went off without a hitch.
      As it was nearly 3 a.m. before I finished, I headed back to the hotel. 
        I was so geared up, I did actually study for a few minutes. The users 
        showed up the next morning, rested and eager to experience the promised 
        performance gains. I made sure they were all content and then headed home 
        for a relaxing weekend, remembering to thank the real heroes who’d saved 
        the day with one backup tape.
      MTBF Means Just That
        Every hard drive comes with an MTBF value. MTBF stands for Mean 
        Time Between Failures and is measured in hours. Though today’s hard drives 
        have values in the hundreds of thousands of hours, just knowing the number 
        exists is food for thought.
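To make that figure concrete, here’s a quick back-of-the-envelope calculation (the 300,000-hour rating and the 50-drive fleet are assumed numbers for illustration, not specs from any vendor, and the math treats failures as independent and constant-rate): a rating that sounds like decades per drive still works out to a failure every several months once you own a room full of them.

HOURS_PER_YEAR = 24 * 365

mtbf_hours = 300_000   # assumed example rating, not a quoted spec
drives = 50            # assumed number of drives across a small server room

per_drive_years = mtbf_hours / HOURS_PER_YEAR        # about 34 years per drive
fleet_hours = mtbf_hours / drives                    # about 6,000 hours fleet-wide
fleet_months = fleet_hours / HOURS_PER_YEAR * 12     # about 8 months

print(f"One drive: roughly {per_drive_years:.0f} years between failures on average")
print(f"{drives} drives: expect a failure about every {fleet_hours:,.0f} hours "
      f"(~{fleet_months:.0f} months)")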
Remember, though, that there are many resources and technologies out there to prevent hard-drive catastrophes, or at least to keep downtime to a minimum. Thus, there’s really no excuse for not protecting your data. Technologies like IntelliMirror, clustering, Remote Installation Services, disk imaging and the single-disk recovery procedures that come with many backup applications offer varying levels of protection.
      There are also companies that provide services to yank data off drives 
        seemingly beyond repair. Many of these resources are expensive, however, 
        and not all companies see the value in investing in them. Having seen 
        my share of crashes, and with the plunging prices of hard drives, I’d 
        recommend at the very least using the software mirroring available with 
        Win2K in conjunction with a solid tape backup plan. In the end, the time 
        and cost involved with rebuilding and restoring a server, especially if 
there’s significant data loss, would likely pay for the ultimate addition to my server family: a pair of quad-Xeon, load-balanced, RAID 10, hot-swappable cluster servers with a solid tape backup plan.