The Excedrin Exchange Upgrade
You know what they say about the best laid plans...
- By George Boswell
- October 01, 2003
Upgrading Exchange 5.5 SP4 from NT 4.0 to Windows 2000 should have been a no-brainer. Like all good network professionals, I thoroughly researched my plan. I had pages of sequential steps covering uninstalling the Network Associates Management Agent, modifying outdated firewall configurations, backing up Exchange databases and so on, through the final creation of the Win2K ERD.
Things proceeded smoothly until step seven, the actual OS upgrade to Win2K Server. The first glitch was minor; Win2K didn’t have an upgrade driver for the add-on NIC. After the upgrade, I used the NIC CD to install the driver, configured TCP/IP and rebooted again, just for good measure.
At this point, I got my first hint of worse to come—in the form of a “One service or driver failed to start” message: “esserver.exe could not start because estier2.dll could not be found.” The Microsoft Knowledge Base was no help. My other servers, both NT and Win2K, didn’t have this .dll. At this point, I didn’t know what the consequences of this error were, but I didn’t want to proceed until I fixed it. Eventually I found the .dll available for download from a Web site, but esserver.exe still didn’t start—this time because of a missing .dll named esshared.dll. Again, there was no help from the KB and nothing on the Internet. Since there were no overt problems with the server, I decided to defer solving this issue.
I finished the upgrade, opened the necessary ports on the firewall and made the ERD. Done. I tested the services, sending e-mail to and from the Internet. Everything seemed to be working fine. Almost as an afterthought, I tested Outlook Web Access (OWA). Oops! “Cannot find server or DNS Error Internet Explorer.” Oh, well, I figured it was a minor configuration error, something that hadn’t transferred during the upgrade process. But I wasn’t worried because I’d previously documented the OWA configuration. Since the primary Exchange services were fine, I called it a day, intending to finish up the next day.
The next day, though, things got worse. I couldn’t get OWA to work. After four hours of struggling, the decision was made to call Microsoft Support Services and initiate a paid support call. Microsoft would really earn its money on this one.
Wesley initially took the call. He first thought it might be a simple DNS error due to the servers’ location on the DMZ. We tried a HOSTS file entry to force name resolution, but there was no change. Next, we turned off SSL requirements. This resulted in a 500 error when trying to access OWA.
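For readers who haven't used one, a HOSTS file is plain text: each line holds an address followed by one or more names, and anything after a # is a comment. The Python below is only a minimal sketch of how such an entry maps a name to an address. The function name parse_hosts, the address 192.0.2.10 and the name owa.example.com are placeholders for illustration, not the values used on this server.

```python
def parse_hosts(text):
    """Map each hostname in hosts-format text to its address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name; names compare case-insensitively.
            table.setdefault(name.lower(), address)
    return table


# Placeholder entry for illustration only.
SAMPLE = """
# address      name              alias
192.0.2.10     owa.example.com   owa
"""

print(parse_hosts(SAMPLE).get("owa.example.com"))
```

An entry like this takes the DNS server out of the picture for that one name, which is why it is a quick way to rule DNS in or out.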
We verified the existence of logon.asp and default.htm. We checked the ISAPI filter in the default site. It was running at low priority (a good thing). We ran an .asp page, which also worked.
Next, we installed the fixes from KB 289606 and KB 313576. They installed with no problem. But still no OWA. At this point, out of desperation, I related the previous problem with esserver.exe. Wesley thought this was significant but couldn’t find much information on the subject.
Pounding through the event log, I found Event ID 2 for Source IISLOG (Unable to create log directory) and Event ID 2506 for Source Server (IRPStackSize problem). We fixed the IISLOG problem by re-registering iislog.dll; but still no OWA.
At this point, I noticed the Microsoft Distributed Transaction Coordinator (MS DTC) service wasn’t started and couldn’t be started. Since this service is required for OWA, it made sense to pursue this path.
Wesley conferenced in another tech—Paul—from the Win2K Server performance team. Paul initially thought we might need the DVS team (they handle MTS, COM and ASP). Eventually, he suggested we perform the procedure in KB 279786, “How to reinstall MS DTC for a Nonclustered Server.” This is a lengthy and exacting process, including at least 10 services that need to be stopped and set to manual before even beginning the reinstall of MS DTC. The reinstall succeeded. After reboot, the MS DTC service was gone and replaced by the Distributed Transaction Coordination service (DTC), which started with no errors.
Next, Paul was able to provide a fix for the Event ID 2506 errors. This involved a simple registry hack. Another reboot, and all Exchange services were running, DTC was running, and there were no 2506 errors. However, OWA still wasn’t working.
I was growing desperate at this point, having invested another nine hours on the problem with no end in sight. Just to have something to do, I was reading out loud the description of various errors and warnings in the log. Most of these events were obscure, seeming to be generic and pointing to nothing in particular.
Suddenly Paul asked me to repeat the last event description. Something jogged his memory of a case he’d handled several years earlier. He was able to access a non-public KB article on his previous case, which suggested reinstalling IIS and OWA was necessary. In hindsight, I should have uninstalled IIS prior to attempting the OS upgrade on the OWA server. Since this wasn’t Paul’s area of expertise, Wesley now conferenced in the third tech, with a name pronounced, “Young.”
Young provided a 12-step process:
Back up SSL certificates. (Oops! It was gone. Another thing I should have done prior to upgrading the OS.)
Delete IWAM/IUSR accounts
Install IIS update
Install Exchange 5.5 SP4
Eureka! OWA was finally working again. But we weren’t done yet. Next, I installed Exchange 5.5 post-SP4 patches.
I wanted to see if OWA would work using SSL; I turned it on, and we started getting DNS errors again. To troubleshoot, we began by checking the firewall logs to confirm an attempted connection on port 443. However, the w3svc log showed no access on port 443. We tested from a second server (this was my test server; I’d tested my upgrade procedure before trying it on the production server) on the DMZ and still got the DNS error. I ran netstat on the OWA server and confirmed it was listening on port 443. Next, I checked to make sure the SSLfilt, compression, and md5filt ISAPI filters were there and running with appropriate authority. I downloaded TCPview and confirmed that inetinfo was the only process using port 443.
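Part of what made this stage confusing is that the browser reported the same "DNS error" whether the name failed to resolve or the connection itself failed. As a rough sketch of how to separate the two from a second machine, the Python below resolves the name first and then attempts a plain TCP connection. The function name diagnose and the host name in the usage comment are hypothetical, and the check stops at the TCP layer: it performs no SSL handshake, so it would not have exposed a certificate or private-key problem.

```python
import socket


def diagnose(host, port, timeout=3.0):
    """Return 'dns_failure', 'connect_failure', or 'ok' for host:port."""
    try:
        # Step 1: name resolution only. A failure here is a genuine DNS problem.
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as conn:
                conn.settimeout(timeout)
                conn.connect(sockaddr)
                return "ok"
        except OSError:  # refused, unreachable, or timed out
            continue
    return "connect_failure"


# Example call (placeholder name): diagnose("owa.example.com", 443)
```

A result of "ok" on port 443 alongside a browser error points away from DNS and the firewall and toward whatever happens after the connection opens, which in this story meant SSL.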
Last, Paul issued a new certificate to me. I installed it on the OWA server, we tested again, and still it returned the DNS error. Now it was time for the fourth and last tech, so Paul conferenced in Farida.
While Paul brought her up to date, I examined the event logs again. I found two events, both from Source Schannel. Both errors indicated problems with a certificate. Farida issued yet another new certificate for the OWA server, but still it wouldn’t work with SSL. So she issued a new certificate for the test server. The test server worked with SSL, so now we knew it wasn’t the certificate, but something wrong with the OWA server.
At this point, we remembered the deferred problem with esserver.exe. Farida knew this executable was somehow involved in the certificate and security process. She suggested upgrading the OWA server to Internet Explorer 6, which I did.
Using the certificate from Farida, OWA now worked using SSL. Farida provided a cert utility to find and recover the missing private key for the OWA server. Unfortunately, it was gone. Backing up the private key and certificate is yet another thing I should have done prior to upgrading the OWA server.
Finally, the upgrade was complete. We still needed to purchase and install a new certificate. In the meantime (with management approval), I turned off SSL and went home.