Taking Stock of NT

When this company moved its financial Web server platform from Unix to NT, everything worked faster. Could it really be that easy?

Reality Online is a Reuters company that provides financial data to Internet sites all over the world. Its showcase site, www.moneynet.com, delivers stock quotes, company news, a portfolio tracker, and more. Every Web page must integrate data from different sources and format it with the look and feel of each of the partner sites. Creating MoneyNet was a long project, and it taught us many lessons.

When I first joined the Internet team at Reality Online, MoneyNet delivered 100,000 dynamic pages a day using two Sun/Netscape proxy servers and two Web servers. The application software consisted of CGI programs written in C++ and a scripting language called Meta-HTML from Universal Access, Inc. (www.meta-html.com). Because each page had to connect to our Oracle database and to legacy custom data servers, every page was custom-built. Projects usually took several months, and even with our best efforts, page generation was slow.

Our Web site was functional, but we knew it wouldn't support the traffic our growth projections predicted. We considered adding more hardware, but by the time we had expanded to eight Unix servers, we realized that maintaining and operating them carried too much overhead. We couldn't find tools to keep the code, software, and configuration of the systems synchronized, so each added server cost us hours of work every time we updated code, fixed a problem, or changed our configuration.

Ambitious Goals

In addition, our engineering team was tired of building everything by hand in the Unix world. It was as if we were using handsaws while knowing that a power tool called Active Server Pages existed. A little research convinced us that the Windows NT Server platform would make Internet development easier, and after a pilot project, we persuaded management to let us migrate the entire site to NT. We set some very ambitious goals:

  • Boost page generation speed by five to 10 times.
  • Cut hardware expenditures by 50 percent by improving throughput and using less-expensive hardware.
  • Improve quality while cutting development time by two-thirds.
  • Expand capacity from 100,000 to 1 million pages a day.
  • Increase Web server availability to 99.8 percent. This translated to less than 90 minutes of total downtime per month, compared to the five hours of scheduled upgrade time plus several hours of unplanned outages we had been averaging each month.

We designed all of our goals to make our business group profitable. If we could make our “faster, better, cheaper” motto a reality, then we knew we’d be the industry leader.

Of course, building a fast, reliable, and scalable site with only three Internet engineers would take some planning. We spent the first month strictly researching products, tools, and processes—before writing a single line of code. Then we started with one small object.

Figure 1. The MoneyNet network architecture. For scalability and redundancy, the MoneyNet network has multiple servers for each set of functionality. End users (browsers and data partners) connect to an array of six proxy servers that handle data caching and load balancing for the Web farm. An ISAPI filter (available exclusively online) directs each request to one of 10 Web servers running IIS 3.0 and Active Server Pages. To build a Web page or deliver data, each IIS server has a set of ActiveX objects that can communicate with the back-end relational database via ODBC or to legacy data servers via TCP. The firewall between the proxy servers and the Web servers prevents malicious traffic from reaching the Web farm or data center.

Kick-off in High Gear

The first step in migrating the site to NT was to provide access to our stock quote data. On the Unix system, we had to write a separate program for each page that used quotes. To avoid repeating that work, we encapsulated the processing within a single ActiveX object. (See "Using the QuoteServer Object.") The QuoteServer object retrieved stock information from our back-end servers and exposed the data through an automation interface.

We wrote all of our ActiveX objects as in-process DLLs with Visual C++ 5.0, which gave us a significant performance boost over the CGI scripts running on our Unix systems. The first version of the object dropped the page generation time for a quote page from more than two seconds to 0.4 seconds.

When we looked more closely at the performance profile of our object, we realized that we could squeeze a little more speed out of it. The back-end quote servers were Unix machines that spawned a process every time a connection was made. So most of our time was spent building the TCP connection and creating a Unix process. Under heavy loads and stress tests, this delay became significant.

We borrowed the concept of connection pooling from ODBC 3.0. Connection pooling allows server connections to remain open when not in use. If another request is made to the same server, the drivers reuse the connection rather than building another connection. If a particular connection is idle for too long, the driver automatically closes it. By adding our own connection pool to the QuoteServer object and supporting multiple requests on the server, we were able to maximize our throughput during heavy loads. In our simulations, throughput increased from eight to 40 requests per second and page generation dropped even further to just over 0.2 seconds. After seeing these performance numbers, we knew that NT was the way to go.
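
To illustrate the idea, here's a minimal sketch of such a pool in C++. This is not our production code: the names are hypothetical, WSAStartup is assumed to have been called during initialization, and the idle-timeout logic described above is omitted for brevity.

// Minimal connection pool sketch for one back-end host.
// Assumes WSAStartup has already been called; idle-timeout
// handling and error recovery are omitted for brevity.
#include <winsock2.h>
#include <vector>

class CConnectionPool
{
public:
     CConnectionPool (LPCSTR pszHost, unsigned short usPort)
          : m_pszHost (pszHost), m_usPort (usPort)
     {
          ::InitializeCriticalSection (&m_cs);
     }

     ~CConnectionPool ()
     {
          for (int i = 0; i < (int) m_idle.size (); i++)
               ::closesocket (m_idle[i]);
          ::DeleteCriticalSection (&m_cs);
     }

     // hand out an idle connection if one exists;
     // otherwise build a new one
     SOCKET Acquire ()
     {
          SOCKET s = INVALID_SOCKET;
          ::EnterCriticalSection (&m_cs);
          if (!m_idle.empty ())
          {
               s = m_idle.back ();
               m_idle.pop_back ();
          }
          ::LeaveCriticalSection (&m_cs);
          return (s != INVALID_SOCKET) ? s : Connect ();
     }

     // return a healthy connection to the pool for reuse
     void Release (SOCKET s)
     {
          ::EnterCriticalSection (&m_cs);
          m_idle.push_back (s);
          ::LeaveCriticalSection (&m_cs);
     }

private:
     SOCKET Connect ()
     {
          SOCKET s = ::socket (AF_INET, SOCK_STREAM, 0);
          if (s == INVALID_SOCKET)
               return INVALID_SOCKET;

          sockaddr_in addr = { 0 };
          addr.sin_family = AF_INET;
          addr.sin_port = htons (m_usPort);
          addr.sin_addr.s_addr = ::inet_addr (m_pszHost);
          if (::connect (s, (sockaddr*) &addr, sizeof (addr)) != 0)
          {
               ::closesocket (s);
               return INVALID_SOCKET;
          }
          return s;
     }

     LPCSTR                m_pszHost;
     unsigned short        m_usPort;
     CRITICAL_SECTION      m_cs;
     std::vector<SOCKET>   m_idle;
};

With a pool like this in place, the object acquires a socket, issues its request, and releases the socket for the next request instead of paying the connection cost every time.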

Using the QuoteServer Object
Creating an ActiveX object to access our stock quote data really simplified Web page development. The QuoteServer object hides the TCP socket-level code from the user interface. A typical Active Server Page to display a stock quote looks like this:

<%
' create a quote object and fetch the data
' for the requested symbol
Set oQuote = Server.CreateObject ("Reality.Server.Quote")
oQuote.GetSymbol (Request ("SYMBOL"))

' now go and display the quote data
%>

<TABLE BORDER=0>
<TR>
     <TH>Symbol</TH><TH>Description</TH>
     <TH>Price</TH><TH>Change</TH>
</TR>
<TR>
     <TD><% = oQuote ("SYMBOL") %></TD>
     <TD><% = oQuote ("DESCRIPTION") %></TD>
     <TD><% = oQuote ("PRICE") %></TD>
     <TD><% = oQuote ("CHANGE") %></TD>
</TR>
</TABLE>

A recent project at Reality provides similar objects to partner Web sites. These objects make it easy to integrate stock quotes, company news, and historical pricing into any ASP site. (Check out "Partnering with MoneyNet" at www.moneynet.com.)
Jon Wagner

Doing More with Less (Work)

Although the Web site had a standard set of pages and functionality, one of our main products was the sale of those pages to other financial sites. Naturally, each partner site wanted a slightly different look and feel, or a small change in behavior; most of the requests we received involved changing the colors or images on the pages. We didn't want to maintain separate copies of all of our pages for each partner (over 40 at the time), so we needed a way to make a single set of pages look 40 different ways.

To handle this level of customizability, we moved all configuration information into a set of configuration files for each partner. We could have used a database, but didn’t want to incur a performance penalty on each page, and the data changed fairly infrequently. Upon Web server start-up, the code in global.asa loaded the information into an application-scope dictionary object. The configuration data was then in memory for all of the ASP pages and could be accessed quickly.

In addition, we had a set of default settings for use when a partner didn’t specify an override.

Trial by Fire

After three months of coding, we had most of the site functionality ported, and our two NT servers went on-line in an experimental state. We projected another two months of cleanup and testing before we would start moving traffic over to NT.

Fortunately for us, the stock market is a crazy thing. On Monday, October 27, 1997, the market dropped 554 points. This was the first big market crash since the Web had become a popular source of news and information for millions of people, and all of the financial information providers were swamped.

We needed extra capacity very quickly. Without testing our two NT Web servers, we diverted half of the traffic to NT. Surprisingly, they performed flawlessly (albeit at 100 percent CPU during market hours), and our site handled 1.7 million pages that day. That night, we added two more NT Web servers and we were one of the few financial Web sites that made it through Tuesday with acceptable response times.

Our operators installed the additional servers in a few hours, which was possible only because we had documented our installation procedures. They knew exactly which software to install, in the proper order, with all of the proper settings. Those servers continue to run today in their initial configuration.

Building a High-Performance Web Site
Building a fast, scalable site takes a lot of planning and discipline. Here are some tips to get you started:
  • Design for reliability. Each piece of the system should assume everything else is broken.
  • Document your set-up and configuration. When you expand, you’ll need to duplicate your set-up.
  • Use Visual Basic for easy development, but C++ for performance-critical code.
  • Analyze every piece of the system. Don’t assume that any existing software, data, or server can reliably handle the load.
  • Stress-test everything. Be sure to perform both individual and integrated tests.

Jon Wagner

The Inhuman Element

For our NT Web servers, we chose Dell PowerEdge 2200 dual-Pentium 200 machines with 256MB of memory and 6GB hard drives. Since the machines were simply page generators that didn't store any data, we saved money by purchasing these lower-end servers without RAID or other hardware fault tolerance. We opted to scale wide, with redundancy at the machine level rather than the component level. Each NT server outperformed two of our Sun Ultra 2 servers.

Traffic on the Internet tends to fill all available server capacity, and our NT servers were no exception. As we expanded capacity and increased speed, we attracted even more traffic. Soon we had expanded our NT Web farm to six servers.

Such growth could have caused us some maintenance headaches. Our development group was running on Internet time, with a new software release just about every week. Fortunately, our research had led us to Microsoft’s Content Replication System (now Content Deployment System in Site Server 3.0). This tool allows you to synchronize files among multiple Web servers.

We divided our development environment into three stages. Development began on the alpha Web server, where multiple projects were edited concurrently. When a project was completed, we deployed its pages to the beta Web server, which was kept in a state of code freeze so our QA department could verify the software there. Finally, we used the Content Deployment System to push all of the pages simultaneously to all of the production servers.

The deployment system removed almost all operator error from the release process. On our Unix systems, a release cycle consisted of manually copying files from the beta server to multiple Web servers—an error-prone approach. Under NT, the administrator simply clicks on a link and waits a few minutes. The deployment system adds and updates changed files and deletes old files.

Under the Rug

Not everything went smoothly with NT. The site was fast, but we experienced a few stability problems. Periodically, the “inetinfo” process would crash or we would lose database connectivity. As the site grew from 100,000 pages a day to half a million, the problems worsened.

Fortunately, we had designed an effective monitoring system with NuView’s ManageX 2.0 (now a product of Hewlett-Packard). ManageX comes with several scripts to monitor performance counters and allows you to write your own VBScript monitors. We used the built-in service monitor to watch the WWW Publishing Service. The script starts any “auto-start” services that aren’t running. By executing this script every minute, we could minimize the downtime of our servers. With ManageX, we also performed full diagnostics of all of our ActiveX objects and back-end servers, so we could easily detect when there was a problem with any of our systems.

A ManageX 2.0 Script to Restart Stopped Services
It’s difficult to write a monitoring system that takes corrective action. Automatic actions can cause more harm than good if applied in the wrong situation. However, it’s almost always a good idea to have your Automatic services running. We run this ManageX script every minute to make sure that all of our services are running.

' VB Script that checks to see if a service denoted as
' autostart or automatic is running. If not, it attempts
' to restart the service.
' Get all service start types and states.
' These will be in the same order.

Set types = ObjectManager.CreateExpression _
                    ("'[Services]:Win32.Start Type;*'")
Set states = ObjectManager.CreateExpression _
                    ("'[Services]:Win32.State;*'")

' The number of services found
nServices = types.Count

' For each service that we found
For i = 0 To (nServices - 1)
     ' get startup string
     Set sStartup = types.item(i)
     If sStartup.value = "Auto Start" Then
          ' If Auto Start, it should be running
          ' get its current state
          Set sState = states.item(i)

          ' If the service is down...
          If sState.value = "Not running" Then
               ' put the service name in quotes
               str = chr(34) + sStartup.instance + chr(34)

               ' perform a net start
               server.execute("cmd /c net start " + str)
          End If
     End If
Next

Here’s how it works:

  1. The ManageX ObjectManager object is a window into performance counters and other system parameters. We first get a list of types of services and the states they’re in.
  2. For each service, we check to see if it’s marked as “Auto Start” in the service database.
  3. We think that Automatic services should always be running. If one isn't...
  4. ...we use the net start command to start the service.

In reality, we also send a page to our operators if a service isn’t running. You can add or customize this to do more error checking or send out custom alerts.

Jon Wagner

The extreme loads on our Web servers were causing up to a dozen restarts a day across our Web farm. I spent days, perhaps weeks, reading crash dumps and stack traces, and I couldn’t find any clues. So, if we couldn’t fix the problem, we would have to hide it from the outside world. We came up with a clever idea that would improve our load balancing as well as give us better fault tolerance.

First, we installed Microsoft Proxy Server 2.0 on three servers in front of our Web farm. These servers replaced our Netscape proxies and actually improved our end users' perceived performance. Then we created an ISAPI filter that spread traffic evenly across the Web servers. Here's the clever part: The filter could detect when a Web server was down or responding slowly, and if a server became unresponsive, traffic would be routed away from it within 10 seconds. By keeping traffic away from troubled servers, the ISAPI filter helped us reach 99.9 percent availability. We even included a manual cutoff so we could perform Web server maintenance with no outages.

An ISAPI Filter for Load Balancing with MS Proxy Server
This type of filter allowed us to put three proxy servers in front of our six Web servers and use the proxy servers to do basic load distribution. By changing the Host header of each URL request, we can tell Proxy Server to fetch the request from the server with the least amount of load.

To create the ISAPI Filter project:

  1. Start Visual C++ 5.0 and choose File | New.
  2. Under Projects, choose ISAPI Extension Wizard.
  3. Give the project a name and press Next.
  4. Select Generate a Filter object and deselect Server Extension.
  5. Press the Next button.
  6. Select High Priority, both secure and non-secure, and Post-Preprocessing.
  7. Press Finish to create the skeleton. 
  8. Add the following code to the OnPreprocHeaders method.

DWORD CProxyFilter::OnPreprocHeaders
     (CHttpFilterContext* pCtxt,
     PHTTP_FILTER_PREPROC_HEADERS pHeaderInfo)
{
     // first get the host that the client wants to get to
     CString szHost;
     DWORD dwBufSize = 0;
     pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("HOST:"),
          NULL, &dwBufSize);
     LPTSTR pszHost = szHost.GetBuffer (dwBufSize);
     pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("HOST:"),
          pszHost, &dwBufSize);
     szHost.ReleaseBuffer();

     // save the original host name as CLIENT-HOST:
     // in case the Web server needs it
     pHeaderInfo->SetHeader (pCtxt->m_pFC,
          _T ("CLIENT-HOST:"),
          (LPTSTR)(LPCTSTR)szHost);

     // do rudimentary load balancing here
     // you can customize the algorithm
     static LONG nWhichServer = 0;
     if (nWhichServer & 1)
          szHost = _T ("internalweb1");
     else
          szHost = _T ("internalweb2");
     ::InterlockedIncrement (&nWhichServer);

     // set the host header
     pHeaderInfo->SetHeader (pCtxt->m_pFC, _T ("HOST:"),
          (LPTSTR)(LPCTSTR)szHost);

     return SF_STATUS_REQ_NEXT_NOTIFICATION;
}

It works like this:

  1. When an HTTP request comes to the proxy server, it first goes to IIS.
  2. Proxy Server 2.0 then has an ISAPI filter that redirects the request to the Proxy Server fetch and caching engine.
  3. This filter replaces the HTTP Host request header before the Proxy Server sees it. This tricks Proxy Server into fetching the URL from whatever server you specify.

You can make the balancing algorithm as complicated as you need. Ours detects downed servers and shuts traffic off from them, as in the sketch below.
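
The detection code didn't make it into the listing, but here's a minimal sketch of the approach, with hypothetical names; something else, such as a background thread that periodically probes each Web server, has to maintain the health flags.

// Hypothetical health-aware selection for the filter above.
// The g_abServerUp flags must be maintained elsewhere, for
// example by a background thread that probes each server.
static const LPCTSTR g_pszServers[] =
     { _T ("internalweb1"), _T ("internalweb2") };
static const int g_nServers =
     sizeof (g_pszServers) / sizeof (g_pszServers[0]);
static volatile LONG g_abServerUp[] = { 1, 1 };

// Round-robin across the servers, skipping any marked down;
// falls back to the first server if all appear dead.
LPCTSTR PickServer()
{
     static LONG nNext = 0;
     for (int nTries = 0; nTries < g_nServers; nTries++)
     {
          LONG n = ::InterlockedIncrement (&nNext) % g_nServers;
          if (g_abServerUp[n])
               return g_pszServers[n];
     }
     return g_pszServers[0];
}

With this in place, the round-robin block in the filter collapses to a single call such as szHost = PickServer();.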

When you replace the Host header, the Web server can no longer generate self-referencing URLs from the SERVER_NAME CGI variable; that's why we stuff the original host into Client-Host. With a simple change, you can use this same ISAPI filter on the Web server to put Client-Host back into the Host header before IIS or ASP sees it: just remove the load balancing and swap the use of "HOST:" and "CLIENT-HOST:" in the code, as in the sketch below.
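
Here's roughly what that variant looks like. It's a sketch derived from the listing above (the class name is hypothetical), not separately published code.

// Sketch: restore the original host name on the Web server
// side. Same skeleton as the filter above, with the headers
// swapped and the load balancing removed.
DWORD CRestoreHostFilter::OnPreprocHeaders
     (CHttpFilterContext* pCtxt,
     PHTTP_FILTER_PREPROC_HEADERS pHeaderInfo)
{
     // fetch the CLIENT-HOST: header saved by the proxy filter
     CString szHost;
     DWORD dwBufSize = 0;
     pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("CLIENT-HOST:"),
          NULL, &dwBufSize);
     LPTSTR pszHost = szHost.GetBuffer (dwBufSize);
     pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("CLIENT-HOST:"),
          pszHost, &dwBufSize);
     szHost.ReleaseBuffer();

     // put the original name back into HOST: so that
     // SERVER_NAME and self-referencing URLs are correct
     if (!szHost.IsEmpty())
          pHeaderInfo->SetHeader (pCtxt->m_pFC, _T ("HOST:"),
               (LPTSTR)(LPCTSTR)szHost);

     return SF_STATUS_REQ_NEXT_NOTIFICATION;
}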

—Jon Wagner

Eventually, we discovered that our Oracle database had a bug that caused dynamic queries to leak memory. Under heavy use, the database machine would spend minutes swapping, which in turn caused failures in the database drivers on the Web servers. Fortunately, we were able to resolve the issue through several days of work with Oracle technical support, and the Web servers became rock-solid and faster than ever.

Additional Information

The products used in this project include the following:

  • Microsoft Windows NT Server, with IIS 3.0 and Active Server Pages
  • Microsoft Visual C++ 5.0
  • Microsoft Proxy Server 2.0
  • Microsoft Content Replication System (now the Content Deployment System in Site Server 3.0)
  • NuView ManageX 2.0 (now a Hewlett-Packard product)
  • Oracle relational database
  • Dell PowerEdge 2200 servers

What Can You Learn?

With a lot of hard work and a few clever ideas, we actually achieved all of our goals. The Web site is incredibly stable and is growing every day. It delivered over 25 million dynamic pages and over 50 million total hits during June 1998, with a peak rate of 1.1 million pages in one day.

What can you learn from this? NT can be as reliable and scalable as Unix, and I certainly recommend it as the platform of choice for Web applications. Using ASP and ActiveX is an effective way to build a Web site. Finally, being among the biggest and fastest requires planning, discipline, and creativity.
