In-Depth
From the Trenches: Have a Crash-Free Day
Heroix RoboMon acts as a beacon for systems administrators whose networks are about to founder on the rocks.
- By Sandy Burd
- March 01, 2000
Tony Canella didn't want to have to make excuses--not
to anyone. So, if he could keep his network up and running,
crash-free, he could avoid such a situation.
"RoboMon has given me the confidence and flexibility
that I don't have to be in a situation with a crashed
network, in the middle of the day especially." Canella
is the Director of Network Administration at HITT Contracting,
the fifth largest construction firm in the Mid-Atlantic,
with clients such as AOL Corporate Campus in Dulles, Virginia
and GEICO Direct in Fredericksburg, Virginia. HITT has
offices in Fairfax, Atlanta, and Charleston, with 300
users, a third of whom are scattered up and down the coast
using dial-in.
Product Information
RoboMon 7.5, $595 to $3,295
Heroix Corporate Headquarters, USA
617-527-1550, 800-229-6500,
www.heroix.com
The higher price includes the base product,
along with management console, event monitor,
and reporting and graphics modules. For
a cool, interactive demo, go to www.heroix.com/aspscript/WalkthroughReg.asp.
The network is running a mixed environment of Microsoft
Windows NT 4.0, SCO Unix version 5, and Novell NetWare
4.2. Canella uses most of the protocols NT has to offer--WINS,
DHCP, TCP/IP, and IPX. He also has a WAN, VPN, email services,
and enterprise resource systems software installed. He
monitors most of these components with Heroix's RoboMon
7.5, an enterprise-wide package that detects issues in
the network, notifies the administrator of the situation,
and can even act proactively to prevent problems from
actually arising.
So, Why RoboMon?
What led Canella to RoboMon? The bottom line was system
crashes. About every other month something would lock
up and bring the system down. He wasn't getting good reports
with NT's Performance Monitor, and it was difficult to
pin down what was causing the lock-ups. Canella read trade
journals to see what other people were using and recommending.
He also looked into Tivoli, but after demoing RoboMon,
was sold on the product, seeing an improvement in performance
within the first week. Now, he's been using it a little
over a year and is still sold on it.
RoboMon has helped Canella avert the usual catastrophes:
users filling up the hard drive and impending crashes
caused by memory resource depletion. At HITT the fax server
is attached to mail servers, and in one situation, RoboMon
helped him prevent a crash; someone had been sending out
315-page faxes--just a bit taxing on resources.
Of course, there have been other perks, in addition to
feeling confident that he can keep the network up and
running. "Prior to using RoboMon, the IT department
spent most of its time monitoring the network and putting
out fires." Canella estimates that he easily saves
four to five hours a week with RoboMon.
Two reviews in MCP Magazine have evaluated RoboMon:
- "Ease Your Network Management Pains" by Scott R. Burgess in the January 1999 issue.
- "Check the Pulse of Your SQL Server 7.0 Apps" by Mike Gunderloy in the August 1999 issue.
Out-of-the-Box Rules
RoboMon is a rules-based systems management package
designed to work out of the box. Canella says he didn't
do much tweaking because RoboMon comes with every rule
he could think of needing. The only thing he had
to do was set up the email and pager notifications. The
programmers at Heroix had already figured out what tweaks
are needed for each service or product, such as WINS,
SQL Server, Exchange Server, and so on, to take advantage
of system resources. Canella has, however, modified some
parameters, based on the information RoboMon has gathered,
and the potential problems it has uncovered: not enough
swap file space, not enough RAM on this machine, not enough
drive space, or the drives are too slow. As Canella says,
"That's why I'm so fond of the product--it's helped
me tweak my systems to the point that they're quite reliable."
Figure 1. Heroix's RoboMon allows
the systems administrator to manage the NT enterprise
by establishing rules for network processes; a large
number of rules are set by default out of the box.
Expertise was built into the Rules Engine by RoboMon's
developers by watching how systems administrators actually
solve problems. A rule defines a condition to check for
and one or more actions to take when that condition is met,
such as paging the sys admin when a disk drops below
20 percent free. RoboMon consults a variety of data sources,
including NT Event Logs, COM objects, SNMP traps and variables,
and databases. Rules monitor network services and resources,
as well as DHCP, WINS, and any proprietary applications
to perform these functions:
- Condition detection
- Event correlation
- Problem investigation
- Notification
- Corrective action
- Follow up
- Escalation
- Resolution
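The condition-plus-actions idea behind a rule can be sketched in a few lines of Python. This is only an illustration of the concept, not RoboMon's rule format or API: the `Rule` class, `low_disk_rule`, and the `notify` callback (standing in for a pager or email hook) are hypothetical names, and the 20 percent threshold is taken from the example above.

```python
import shutil
from dataclasses import dataclass, field
from typing import Callable, List


def disk_free_fraction(path: str) -> float:
    """Return the fraction of the volume holding `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


@dataclass
class Rule:
    """Hypothetical rule: one condition to check, plus actions to run when it holds."""
    name: str
    condition: Callable[[], bool]
    actions: List[Callable[[str], None]] = field(default_factory=list)

    def evaluate(self) -> bool:
        """Check the condition; if it holds, run every action and return True."""
        if not self.condition():
            return False
        for action in self.actions:
            action(self.name)
        return True


def low_disk_rule(path, notify, threshold=0.20, probe=disk_free_fraction):
    """Build a rule that calls `notify` when free space on `path` drops below `threshold`.

    `probe` is injectable so the rule can be exercised without touching a real disk.
    """
    return Rule(
        name=f"low disk space on {path}",
        condition=lambda: probe(path) < threshold,
        actions=[notify],
    )
```

A monitoring loop would call `evaluate()` on each rule at a fixed interval; passing `print` as `notify` is enough to try it out, and a real deployment would substitute whatever paging or email hook is available.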
RoboMon's Rule Engine runs on all the machines where
the administrator wants to monitor, detect, and correct
problems locally. Because data is sent across the network
only when RoboMon performs a remote action or notifies a
central monitoring location, network traffic is kept to
a minimum. Although RoboMon loads five or six services
on the server, Canella says it hasn't degraded the performance
of his network.
Event Monitor
Event Monitor runs as a client/server application, but
because the RoboMon processes monitoring the servers run
as autonomous agents, there's no single point of failure.
The administrator can monitor and manage all sites from
a central location and consolidate enterprise-wide events
across NT, Unix, OpenVMS, and any SNMP agent. (The software
doesn't encompass monitoring of NetWare directly.) Enterprise
Manager lets you make a change at any level, on any process,
computer, or domain across the entire enterprise, which
makes RoboMon easy to scale. First, you view the rule
properties in Enterprise Manager; then, use Solutions
Manager to tailor rules quickly by customizing detection
thresholds, selections, and other settings--without writing
code.
Self-configuring sensors dynamically adjust to changes
in system configuration, eliminating the need for ongoing
maintenance. This means you can add software or devices,
and RoboMon automatically reconfigures your system. It
observes an application--for example, Exchange--determines
its typical utilization, and then automatically monitors
for deviations from normal.
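One common way to monitor for "deviations from normal" is to learn a baseline from observed samples and flag readings that fall too far from it. The Python sketch below assumes a simple mean-and-standard-deviation baseline. The article does not say how RoboMon computes its baseline, so the function names and the three-standard-deviation tolerance are illustrative assumptions only.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def learn_baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarize observed utilization as (average, standard deviation).

    Needs at least two samples, because stdev() is undefined for fewer.
    """
    return mean(samples), stdev(samples)


def is_deviation(value: float, baseline: Tuple[float, float],
                 tolerance: float = 3.0) -> bool:
    """Return True when `value` lies more than `tolerance` standard
    deviations away from the learned average (an assumed threshold)."""
    average, spread = baseline
    return abs(value - average) > tolerance * spread
```

For instance, a baseline learned from CPU readings of 40, 42, 38, 41, and 39 percent would treat a reading of 41 as normal and a reading of 90 as a deviation worth an alert.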
Remote RoboMon Help for NT
Heroix RoboMon Emergency
Repair (ER), working with RoboMon NT,
lets you remotely repair locked or unresponsive
mission-critical servers running NT 4.0
or higher (on Intel and Alpha), Exchange,
IIS, SQL or other BackOffice components
without rebooting. RoboMon ER connects
to the Internet via TCP/IP or a serial
host.
In DOS mode, RoboMon ER gives you command-line
access to any NT system. This allows
you to reach and repair systems that
are inaccessible via graphical or Web-based
interfaces. The product provides diagnostic
and repair commands to free up memory,
shut down services, or fix other problems
causing lock-ups.
The ER option includes a remote console
and an agent that resides on the server
operating as a real-time process. You
remotely access the NT server through
Telnet and after being authenticated
can view performance statistics, change
the characteristics of a service, view
processes and resources being used,
and also manipulate the NT Registry.
That way, if you do have to reboot,
at least it's a clean one.
--Sandy Burd
RoboMon in the Future
Canella says that if he were installing RoboMon today,
"I might do an enterprise-wide installation. Until
it was up and running for a while and I had had a chance
to play with it, I wasn't fully aware of all its capabilities."
Canella had set RoboMon up on individual servers thinking
that the firewalls and schemes for the WAN would be too
complex to configure. Now, he doesn't think it would be
any problem to do so; but at this point, with everything
in place and working fine, he sees no reason to change
it. Soon, HITT Contracting will be moving to a new system
using Microsoft's Terminal Server and Citrix's MetaFrame,
which RoboMon supports, and which will simplify things.
Remote users will be able to log in over the Internet
and have full use of their desktops, even on low speed
connections.
Canella says when he upgrades, he'll definitely consolidate
events from many of his company's various OSs under RoboMon.
He expects to upgrade to Windows 2000 six months or so
after it's out--when the jury is in and he's convinced--and
will do a network-wide rollout to avoid any problems that
might occur in the Domain from having some boxes running
NT 4.0 and others Win2K. In fact, Canella expects the
Win2K version of RoboMon--expected out at the same time
as Win2K itself--to help him with the rollout by alerting
him to any services that are failing because of the installation.
Canella also expects to deploy J.D. Edwards' OneWorld,
an enterprise framework, in the same timeframe and eliminate
the SCO and NetWare platforms.
Canella says what he likes best about RoboMon is that
he can see the health of his mission-critical production
servers. "RoboMon shows me what's occurring on the
machines, so I can see if there are any issues I need
to address or if I can take the rest of the day off."