Taking Stock of NT
When this company moved its financial Web server platform from Unix to NT, everything worked faster. Could it really be that easy?
- By Jon Wagner
- December 01, 1998
Reality Online is a Reuters company that provides financial
data to Internet sites all over the world. Its showcase
site, www.moneynet.com, delivers stock quotes, company
news, a portfolio tracker, and more. Every Web page must
integrate data from different sources and format it with
the look and feel of each of the partner sites. Creating
MoneyNet was a long project, and it taught us many lessons.
When I first joined the Internet team at Reality Online,
MoneyNet delivered 100,000 dynamic pages a day, with a
set of two Sun/Netscape proxy servers and two Web servers.
The application software consisted of CGI programs written
in C++ and a scripting language called Meta-HTML from
Universal Access, Inc. (www.meta-html.com). In order to
connect to our Oracle database and legacy custom data
servers, every page was custom-built. Projects usually
took several months, and even with our best efforts, page
generation was slow.
Our Web site was functional, but we knew it wouldn't
support the traffic predicted by our growth. We thought
about adding more hardware, but when we expanded to eight
Unix servers, we realized that maintenance and operation
of these servers had too much overhead. We couldn't
find tools to keep the code, software, and configuration
of the systems synchronized. Each added server cost us
hours of work every time we updated code, fixed a problem,
or changed our configuration.
Ambitious Goals
In addition, our engineering team was tired of building
everything by hand in the Unix world. It was as if we
were using handsaws while knowing that a power tool called
Active Server Pages existed. With a little
research, we thought that the Windows NT Server platform
would make Internet development easier. After a pilot
project, we convinced management to allow us to migrate
the entire site to NT. We came up with some very ambitious
goals:
- Boost page generation speed by five to 10 times.
- Cut hardware expenditures by 50 percent by improving
throughput and using less-expensive hardware.
- Improve quality while cutting development time by
two-thirds.
- Expand capacity from 100,000 to 1 million pages a
day.
- Increase Web server availability to 99.8 percent.
This translated to just one hour of total downtime per
month, compared to five hours of upgrade time and several
hours of unplanned outages per month.
We designed all of our goals to make our business group
profitable. If we could make our "faster, better,
cheaper" motto a reality, then we knew we'd
be the industry leader.
Of course, building a fast, reliable, and scalable site
with only three Internet engineers would take some planning.
We spent the first month strictly researching products,
tools, and processes before writing a single line
of code. Then we started with one small object.
Figure 1. The MoneyNet network architecture. For scalability
and redundancy, the MoneyNet network has multiple servers for
each set of functionality. End users (browsers and data
partners) connect to an array of six proxy servers that handle
data caching and load balancing for the Web farm. An ISAPI
filter (available exclusively online) directs each request to
one of 10 Web servers running IIS 3.0 and Active Server Pages.
To build a Web page or deliver data, each IIS server has a set
of ActiveX objects that can communicate with the back-end
relational database via ODBC or with legacy data servers via
TCP. The firewall between the proxy servers and the Web
servers prevents malicious traffic from reaching the Web farm
or data center.
Kick-off in High Gear
The first step in migrating the site to NT was to provide
access to our stock quote data. On the Unix system, we
had to write a separate program for each page that used
quotes. To avoid this problem, we encapsulated this processing
within a single ActiveX object. (See "Using
the QuoteServer Object.") The QuoteServer object
retrieved stock information from our back-end servers
and exposed the data through an automation interface.
We wrote all of our ActiveX objects as in-process DLLs
with Visual C++ 5.0, which gave us a significant performance
boost over the CGI scripts running on our Unix systems.
The first version of the object dropped the page generation
time for a quote page from more than two seconds to 0.4
seconds.
When we looked more closely at the performance profile
of our object, we realized that we could squeeze a little
more speed out of it. The back-end quote servers were
Unix machines that spawned a process every time a connection
was made. So most of our time was spent building the TCP
connection and creating a Unix process. Under heavy loads
and stress tests, this delay became significant.
We borrowed the concept of connection pooling from ODBC
3.0. Connection pooling allows server connections to remain
open when not in use. If another request is made to the
same server, the drivers reuse the connection rather than
building another connection. If a particular connection
is idle for too long, the driver automatically closes
it. By adding our own connection pool to the QuoteServer
object and supporting multiple requests on the server,
we were able to maximize our throughput during heavy loads.
In our simulations, throughput increased from eight to
40 requests per second and page generation dropped even
further to just over 0.2 seconds. After seeing these performance
numbers, we knew that NT was the way to go.
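For readers who want to see the shape of this technique, here
is a minimal sketch of a connection pool, assuming a
hypothetical QuoteConnection class that wraps one TCP socket
to a quote server. It is not the actual QuoteServer code,
which also multiplexed several requests over one connection;
this only shows the reuse-or-create and idle-timeout logic.
// Hypothetical connection pool sketch; QuoteConnection and QuotePool
// are invented names, not the production QuoteServer internals.
#include <windows.h>
#include <list>
class QuoteConnection {
public:
    explicit QuoteConnection(const char* host) : m_lastUsed(GetTickCount()) {
        // open a TCP socket to the quote server named by host (omitted)
    }
    void Touch() { m_lastUsed = GetTickCount(); }
    bool IdleLongerThan(DWORD ms) const { return GetTickCount() - m_lastUsed > ms; }
private:
    DWORD m_lastUsed;
};
class QuotePool {
public:
    QuotePool() { InitializeCriticalSection(&m_lock); }
    ~QuotePool() { DeleteCriticalSection(&m_lock); }
    // Hand out an idle connection if one is available; otherwise build one.
    QuoteConnection* Acquire(const char* host) {
        QuoteConnection* conn = 0;
        EnterCriticalSection(&m_lock);
        if (!m_idle.empty()) {
            conn = m_idle.front();
            m_idle.pop_front();
        }
        LeaveCriticalSection(&m_lock);
        if (conn == 0)
            conn = new QuoteConnection(host);
        return conn;
    }
    // Put a connection back; close any that have sat idle past the cutoff.
    void Release(QuoteConnection* conn, DWORD maxIdleMs = 60000) {
        conn->Touch();
        EnterCriticalSection(&m_lock);
        m_idle.push_back(conn);
        while (!m_idle.empty() && m_idle.front()->IdleLongerThan(maxIdleMs)) {
            delete m_idle.front();
            m_idle.pop_front();
        }
        LeaveCriticalSection(&m_lock);
    }
private:
    CRITICAL_SECTION m_lock;
    std::list<QuoteConnection*> m_idle;
};
The key point is that the expensive work (the TCP connect and
the process spawn on the Unix quote server) happens only when
the pool is empty; under load, most requests reuse a warm
connection.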
Using the QuoteServer Object
Creating an ActiveX object
to access our stock quote data really
simplified Web page development. The QuoteServer
object hides the TCP socket-level code
from the user interface. A typical Active
Server Page to display a stock quote looks
like this:
<%
' create a quote object and fetch the data
' for the requested symbol
Set oQuote = Server.CreateObject ("Reality.Server.Quote")
oQuote.GetSymbol (Request ("SYMBOL"))
' now go and display the quote data
%>
<TABLE BORDER=0>
<TR>
<TH>Symbol</TH><TH>Description</TH>
<TH>Price</TH><TH>Change</TH>
</TR>
<TR>
<TD><% = oQuote ("SYMBOL") %></TD>
<TD><% = oQuote ("DESCRIPTION") %></TD>
<TD><% = oQuote ("PRICE") %></TD>
<TD><% = oQuote ("CHANGE") %></TD>
</TR>
</TABLE>
A recent project at Reality is to provide
similar objects to partner Web sites.
These objects make it easy to integrate
stock quotes, company news, and historical
pricing into any ASP site. (Check out
"Partnering with Moneynet"
from www.moneynet.com.)
Jon Wagner
Doing More with Less (Work)
Although the Web site had a standard set of functionality
and pages, one of our main products was selling those pages
to other financial sites. Of course, each partner site wanted
a slightly different look and feel, or a small change
in behavior. Most of the requests we received involved
changing the colors or images on the pages. We didn't
want to write separate copies of all of our pages for
each partner (over 40 at the time), so we needed to find
a way to make a single set of pages look 40 different
ways.
To handle this level of customizability, we moved all
configuration information into a set of configuration
files for each partner. We could have used a database,
but didn't want to incur a performance penalty on
each page, and the data changed fairly infrequently. Upon
Web server start-up, the code in global.asa loaded the
information into an application-scope dictionary object.
The configuration data was then in memory for all of the
ASP pages and could be accessed quickly.
In addition, we had a set of default settings for use
when a partner didn't specify an override.
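As a concrete illustration, here is a minimal global.asa
sketch of this start-up loading. The file path, partner ID,
and setting names are invented for illustration; the
production code iterated over one configuration file per
partner.
<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
Sub Application_OnStart
    ' Read one "name=value" configuration file per partner into an
    ' Application-scope dictionary so every ASP page can look up
    ' settings from memory. Path and keys are examples only.
    Dim oFSO, oConfig, oFile, sLine, nPos
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oConfig = CreateObject("Scripting.Dictionary")
    Set oFile = oFSO.OpenTextFile("d:\config\partner42.cfg", 1)
    Do While Not oFile.AtEndOfStream
        sLine = oFile.ReadLine
        nPos = InStr(sLine, "=")
        If nPos > 0 Then
            oConfig("partner42." & Trim(Left(sLine, nPos - 1))) = _
                Trim(Mid(sLine, nPos + 1))
        End If
    Loop
    oFile.Close
    ' Fall back to a default when a partner does not override a setting
    If Not oConfig.Exists("partner42.bgcolor") Then
        oConfig("partner42.bgcolor") = "#FFFFFF"
    End If
    Set Application("Config") = oConfig
End Sub
</SCRIPT>
An ASP page can then pull a value straight from memory with
something like Set oConfig = Application("Config") followed by
oConfig("partner42.bgcolor").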
Trial by Fire
After three months of coding, we had most of the site
functionality ported, and our two NT servers went on-line
in an experimental state. We projected another two months
of cleanup and testing before we would start moving traffic
over to NT.
Fortunately, the stock market is a crazy thing. On Monday,
October 27, 1997, the market dropped 554 points. This
was the first big market crash that had happened after
the Web had become a popular source of news and information
for millions of people, and all of the financial information
providers were swamped.
We needed extra capacity very quickly. Without testing
our two NT Web servers, we diverted half of the traffic
to NT. Surprisingly, they performed flawlessly (albeit
at 100 percent CPU during market hours), and our site
handled 1.7 million pages that day. That night, we added
two more NT Web servers and we were one of the few financial
Web sites that made it through Tuesday with acceptable
response times.
Our operators installed the additional servers in a few
hours, possible only because we had documented our installation
procedures. The operators knew exactly which software
to install, in the proper order, with all of the proper
settings. These servers continue to run today in their
initial configuration.
Building a High-Performance Web Site
Building a fast, scalable
site takes a lot of planning and discipline.
Here are some tips to get you started:
- Design for reliability. Each piece
of the system should assume everything
else is broken.
- Document your set-up and configuration.
When you expand, you'll need
to duplicate your set-up.
- Use Visual Basic for easy development,
but C++ for performance-critical code.
- Analyze every piece of the system.
Don't assume that any existing
software, data, or server can reliably
handle the load.
- Stress-test everything. Be sure
to perform both individual and integrated
tests.
Jon Wagner
The Inhuman Element
For NT Web servers, we had chosen Dell PowerEdge 2200
dual-Pentium 200 servers with 256MB of memory and 6GB hard
drives. Since the machines were simply page generators
that didn't store any data, we saved money by purchasing
these lower-end servers without RAID or other hardware
fault tolerance. We opted to scale wide, with redundancy
at the machine level rather than the component level.
Each NT server outperformed two of our Sun Ultra 2 servers.
Traffic on the Internet tends to fill all available server
capacity. This was true with our NT servers. As we expanded
our capacity and increased speed, we gained even more
traffic. Soon we had expanded our NT Web farm to six servers.
Such growth could have caused us some maintenance headaches.
Our development group was running on Internet time, with
a new software release just about every week. Fortunately,
our research had led us to Microsoft's Content Replication
System (now Content Deployment System in Site Server 3.0).
This tool allows you to synchronize files among multiple
Web servers.
We had divided the development environment into three
segments. Development began on the alpha Web server, where
multiple projects were edited concurrently. When a project
was completed, we deployed its pages to the beta Web server.
The beta site was in a state of code freeze, and our QA
department verified the software there. Finally, we would
use the Content Deployment System to push all of the pages
simultaneously to all of the production servers.
The deployment system removed almost all operator error
from the release process. On our Unix systems, a release
cycle consisted of manually copying files from the beta
server to multiple Web servers, an error-prone approach.
Under NT, the administrator simply clicks on a link and
waits a few minutes. The deployment system adds and updates
changed files and deletes old files.
Under the Rug
Not everything went smoothly with NT. The site was fast,
but we experienced a few stability problems. Periodically,
the inetinfo process would crash or we would
lose database connectivity. As the site grew from 100,000
pages a day to half a million, the problems worsened.
Fortunately, we had designed an effective monitoring
system with NuView's ManageX 2.0 (now a product of
Hewlett-Packard). ManageX comes with several scripts to
monitor performance counters and allows you to write your
own VBScript monitors. We used the built-in service monitor
to watch the WWW Publishing Service. The script starts
any auto-start services that aren't running.
By executing this script every minute, we could minimize
the downtime of our servers. With ManageX, we also performed
full diagnostics of all of our ActiveX objects and back-end
servers, so we could easily detect when there was a problem
with any of our systems.
A ManageX 2.0 Script to Restart Stopped Services
It's difficult to write
a monitoring system that takes corrective
action. Automatic actions can cause more
harm than good if applied in the wrong
situation. However, it's almost always
a good idea to have your Automatic services
running. We run this ManageX script every
minute to make sure that all of our services
are running.
' VB Script that checks to see if a service denoted as
' autostart or automatic is running. If not it attempts
' to restart the service

' Get all service start types and states.
' These will be in the same order.
Set types = ObjectManager.CreateExpression ("[Services]:Win32.Start Type;*")
Set states = ObjectManager.CreateExpression ("[Services]:Win32.State;*")

' The number of services found
nServices = types.Count

' For each service that we found
For i = 0 To (nServices - 1)
    ' get startup string
    Set sStartup = types.item(i)
    If sStartup.value = "Auto Start" Then
        ' If Auto Start, it should be running
        ' get its current state
        Set sState = states.item(i)
        ' If the Service is down...
        If sState.value = "Not running" Then
            ' put the service name in quotes
            str = chr(34) + sStartup.instance + chr(34)
            ' perform a net start
            server.execute ("cmd /c net start " + str)
        End If
    End If
Next
Here's how it works:
- The ManageX ObjectManager object
is a window into performance counters
and other system parameters. We first
get a list of types of services and
the states they're in.
- For each service, we check to see
if it's marked as "Auto Start" in the service database.
- We think that Automatic services
should always be running. If they're
not...
- We use the net start command to start the service.
In reality, we also send a page to
our operators if a service isn't
running. You can add or customize this
to do more error checking or send out
custom alerts.
Jon Wagner
The extreme loads on our Web servers were causing up
to a dozen restarts a day across our Web farm. I spent
days, perhaps weeks, reading crash dumps and stack traces,
and I couldn't find any clues. So, if we couldn't
fix the problem, we would have to hide it from the outside
world. We came up with a clever idea that would improve
our load balancing as well as give us better fault tolerance.
First, we installed Microsoft Proxy Server 2.0 on three
servers in front of our Web farm. These servers replaced
our Netscape proxies and actually improved perceived
performance for our end users. Then we created an ISAPI filter
that would spread traffic evenly across the Web servers.
Here's the clever part: The filter could detect when
a Web server was down or responding slowly. If a server
became unresponsive, traffic would be routed away from
it within 10 seconds. By keeping traffic away from troubled
servers, the ISAPI filter helped us reach 99.9 percent
availability. We even included a manual cutoff so we could
perform Web server maintenance with no outages.
An ISAPI Filter for Load Balancing with MS Proxy Server
This type of filter allowed
us to put three proxy servers in front
of our six Web servers and use the proxy
servers to do basic load distribution.
By changing the Host header of each URL
request, we can tell Proxy Server to fetch
the request from the server with the least
amount of load.
To create the ISAPI
Filter project:
- Start Visual C++ 5.0 and choose
File | New.
- Under Projects, choose ISAPI Extension
Wizard.
- Give the project a name and press
Next.
- Select Generate a Filter object
and deselect Server Extension.
- Press the Next button.
- Select High Priority, both secure
and non-secure, and Post-Preprocessing.
- Press Finish to create the skeleton.
- Add the following code to the OnPreprocHeaders
method.
DWORD CProxyFilter::OnPreprocHeaders (CHttpFilterContext* pCtxt,
    PHTTP_FILTER_PREPROC_HEADERS pHeaderInfo)
{
    // first get the host that the client wants to get to
    CString szHost;
    DWORD dwBufSize = 0;
    pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("HOST:"), NULL, &dwBufSize);
    LPTSTR pszHost = szHost.GetBuffer (dwBufSize);
    pHeaderInfo->GetHeader (pCtxt->m_pFC, _T ("HOST:"), pszHost, &dwBufSize);
    szHost.ReleaseBuffer();

    // save the original host name as CLIENT-HOST:
    // in case the Web server needs it
    pHeaderInfo->SetHeader (pCtxt->m_pFC, _T ("CLIENT-HOST:"),
        (LPTSTR)(LPCTSTR)szHost);

    // do rudimentary load balancing here
    // you can customize the algorithm
    static LONG nWhichServer = 0;
    if (nWhichServer & 1)
        szHost = _T ("internalweb1");
    else
        szHost = _T ("internalweb2");
    ::InterlockedIncrement (&nWhichServer);

    // set the host header
    pHeaderInfo->SetHeader (pCtxt->m_pFC, _T ("HOST:"),
        (LPTSTR)(LPCTSTR)szHost);

    return SF_STATUS_REQ_NEXT_NOTIFICATION;
}
It works like this:
- When an HTTP request comes to the
proxy server, it first goes to IIS.
- Proxy Server 2.0 then has an ISAPI
filter that redirects the request
to the Proxy Server fetch and caching
engine.
- This filter replaces the HTTP Host
request header before the Proxy Server
sees it. This tricks Proxy Server
into fetching the URL from whatever
server you specify.
You can make the balancing algorithm
as complicated as you need it. We have
it detect downed servers and shut traffic
off from the troubled servers.
When you replace the Host header, the
Web server won't be able to generate
self-referencing URLs with the SERVER_NAME
CGI variable. That's why we stuff
the Host header into Client-Host. With
a simple change, you can use this same
ISAPI filter to put the Client-Host
back into the Host header before IIS
or ASP see the header. Just remove the
load balancing and switch the use of
"HOST:" and "CLIENT-HOST:"
in the code.
Jon Wagner
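The filter in the sidebar shows only simple alternation
between two servers; the down-server detection described in
the article isn't listed. Here is a rough sketch of how that
health tracking might be structured, with invented names,
counters, and thresholds rather than the production logic:
// Hypothetical health-tracking sketch for the load-balancing filter.
// Server names, failure threshold, and retry window are assumptions.
#include <windows.h>
#include <tchar.h>
struct ServerStatus {
    const TCHAR* szHost;        // internal host name used in the Host: header
    LONG         nFailures;     // consecutive failures reported by the filter
    DWORD        dwDownSince;   // tick count when the server was marked down
    BOOL         bManualCutoff; // operator switch for planned maintenance
};
static ServerStatus g_servers[] = {
    { _T("internalweb1"), 0, 0, FALSE },
    { _T("internalweb2"), 0, 0, FALSE },
};
static const int   NUM_SERVERS    = sizeof(g_servers) / sizeof(g_servers[0]);
static const LONG  MAX_FAILURES   = 3;     // failures before routing away
static const DWORD RETRY_AFTER_MS = 10000; // try a downed server again after 10s
static LONG g_nNext = 0;
// Pick the next healthy server, round robin, skipping troubled ones.
const TCHAR* PickServer()
{
    for (int nTried = 0; nTried < NUM_SERVERS; nTried++) {
        LONG n = ::InterlockedIncrement(&g_nNext) % NUM_SERVERS;
        if (n < 0) n += NUM_SERVERS;        // guard against counter wrap
        ServerStatus& s = g_servers[n];
        if (s.bManualCutoff)
            continue;                       // pulled out for maintenance
        if (s.nFailures >= MAX_FAILURES) {
            if (GetTickCount() - s.dwDownSince < RETRY_AFTER_MS)
                continue;                   // still inside its penalty window
            s.nFailures = 0;                // window expired; try it again
        }
        return s.szHost;
    }
    return g_servers[0].szHost;             // everything looks down; fail open
}
// Called when a request to a server times out or errors out.
void ReportFailure(int nServer)
{
    if (::InterlockedIncrement(&g_servers[nServer].nFailures) >= MAX_FAILURES)
        g_servers[nServer].dwDownSince = GetTickCount();
}
In this sketch, the OnPreprocHeaders code from the sidebar
would call PickServer instead of flipping between two
hard-coded host names, and the filter's error path would call
ReportFailure whenever a fetch from a Web server failed.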
Eventually, we discovered that our Oracle database had
a bug that caused dynamic queries to leak memory. During
heavy use, it would spend minutes swapping, which caused
errors in the database drivers on the client Web servers.
Fortunately, we were able to resolve the issue through
several days of work with Oracle technical support. The
Web servers became rock-solid and faster than ever.
What Can You Learn?
With a lot of hard work and a few clever ideas, we actually
achieved all of our goals. The Web site is incredibly
stable and is growing every day. It delivered over 25
million dynamic pages and over 50 million total hits during
June 1998, with a peak rate of 1.1 million pages in one
day.
What can you learn from this? NT can be as reliable and
scalable as Unix. I certainly recommend it as the platform
of choice for Web applications. And using ASP and ActiveX
is an effective way to build a Web site. Last, being among
the biggest and fastest requires planning, discipline,
and creativity.