In-Depth

Attention to Details

These three high-end monitoring tools automate the management of networks with thousands of nodes.

Managing today’s networks is no easy job. Not only do administrators have to deal with a wide variety of platforms, systems and products, but those things break down: servers crash, applications go haywire, disks fill up and bottlenecks clog network traffic. Keeping things up and running requires constant vigilance, and that’s what network and server monitoring tools are designed to provide. Without such tools, your business could be on shaky ground if a critical server fails and the failure goes unnoticed for any length of time.

A good network monitoring tool performs three basic jobs: monitoring how things are going, alerting administrators when things go wrong and automatically correcting problems whenever possible. A good way of remembering this is the mnemonic “MAC,” which stands for Monitoring, Alerting, and Correcting. Let’s consider each function in turn.

Monitoring, Alerting, Correcting
Monitoring tools should be able to proactively monitor the wide range of server platforms, applications, and devices in today’s heterogeneous networks. Support for different versions of Microsoft Windows, popular Linux distributions and common flavors of Unix is essential. If your network has other platforms like Novell NetWare or OpenVMS, then your choice of a monitoring tool may be determined by such support. In terms of what it watches on your network, a good monitoring tool should be able to do things like parse system and event logs, detect when services stop unexpectedly, and track consumption of critical system resources, including processor usage, available memory, free disk space and network bandwidth. It should also be able to monitor key aspects of common server applications and services such as Active Directory, DNS, IIS, Exchange and SQL Server on Windows platforms or Apache, Oracle, and Lotus Domino on Unix/Linux platforms. Support for SNMP-enabled network devices like routers and switches is also essential to gain a true picture of what’s happening on a network. In addition, your company may have special monitoring needs for enterprise applications like SAP R/3 or unique messaging platforms like RIM Blackberry, and you need to take this into account before committing to a monitoring platform.
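
To make the idea of tracking critical system resources concrete, here is a minimal Python sketch of a threshold check. It is an illustration only, not how any of the reviewed products work internally; the metric names, the limits and the `Threshold` and `find_breaches` names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str        # hypothetical metric name, e.g. "cpu_percent"
    limit: float       # value at which the metric becomes a problem
    breach_when: str   # "above" or "below" the limit


def find_breaches(sample, thresholds):
    """Return (metric, value, limit) for every threshold the sample violates."""
    breaches = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric was not collected on this node
        too_high = t.breach_when == "above" and value > t.limit
        too_low = t.breach_when == "below" and value < t.limit
        if too_high or too_low:
            breaches.append((t.metric, value, t.limit))
    return breaches


# Made-up readings from one server (assumed values, not real measurements).
sample = {"cpu_percent": 97.0, "free_disk_gb": 1.5, "free_memory_mb": 900.0}
thresholds = [
    Threshold("cpu_percent", 90.0, "above"),
    Threshold("free_disk_gb", 2.0, "below"),
    Threshold("free_memory_mb", 256.0, "below"),
]
breaches = find_breaches(sample, thresholds)
```

In this made-up sample, processor usage and free disk space breach their limits while available memory does not.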

Alerting can be done in a variety of ways, ranging from pop-up alert dialogs on administrator workstations to sending e-mail messages and beeping pagers. A monitoring tool should be able to log the information it collects in a database for further analysis. Administrators should be able to specify what types of collected information are logged and how regional logs can be consolidated into a centralized repository in large enterprise environments. The tool should also include services for analyzing the vast amounts of information collected, including visualization utilities for generating graphs that can reveal trends useful for troubleshooting, and tools for generating reports based on templates designed for everyone from technical support staff to managers. A desirable feature is the ability to generate reports automatically according to a predefined schedule and either send them by e-mail to the appropriate parties or publish them on the corporate intranet. Most monitoring tools include some kind of functionality along these lines. Some can even test for compliance with service level agreements (SLAs), a feature you may want to look for if you lease services from a provider.

Correcting common network problems by killing runaway processes, restarting services and rebooting servers should be easy to configure and use. It should also be possible to set up an escalation path of corrective actions to perform when simple fixes don’t work and to execute scripts or batch files to perform advanced corrective actions automatically when situations require it.
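
An escalation path of corrective actions can be pictured as an ordered list of fixes that are tried one after another until a health check passes. The Python sketch below assumes a hypothetical `run_escalation` helper and simulated actions; it does not reflect the scripting interface of any product reviewed here.

```python
def run_escalation(is_healthy, actions):
    """Try each (name, action) pair in order until the health check passes.

    Returns the name of the action that fixed the problem, or None if the
    whole escalation path was exhausted without success.
    """
    for name, action in actions:
        action()
        if is_healthy():
            return name
    return None


# Simulated server state for the example (an assumption, not real monitoring data).
state = {"service_up": False}


def restart_service():
    # In this simulation the simple fix does not help.
    pass


def reboot_server():
    # The more drastic fix brings the service back.
    state["service_up"] = True


fixed_by = run_escalation(
    lambda: state["service_up"],
    [("restart service", restart_service), ("reboot server", reboot_server)],
)
```

Ordering the list from least to most disruptive keeps a reboot as the last resort, which matches the idea of escalating only when simple fixes don’t work.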

By monitoring the status of network servers and devices, alerting administrators to problems that arise, and automatically taking steps to correct problems when they occur, network monitoring tools help ensure the reliability and availability of your business network. This is ultimately reflected in your company’s bottom line, so it’s a no-brainer that such tools, though sometimes expensive, are worth every penny they cost.

Another desirable feature of a monitoring product is robustness combined with a light footprint: it should have little impact on the very network you’re trying to watch. And while budget-conscious administrators may be able to assemble their own toolbox of free or almost-free monitoring tools and use them to keep their network up and running, such solutions become unmanageable for networks with hundreds or thousands of servers. So the ability to scale well and a user interface that lets you get up to speed quickly are also important considerations when choosing monitoring software for your network.

If your company is already using an enterprise-level management product like Computer Associates Unicenter, HP OpenView, IBM Tivoli or Microsoft Operations Manager (MOM), it may also be desirable for the monitoring product you purchase to integrate easily into these framework solutions. So be sure to watch for that as well when you’re considering which product to go with.

Product Information
The Argent Guardian 6.2
$15,000 for 10 monitored servers with an unlimited number of monitoring consoles.
Argent Software Inc.
www.argent.com

Heroix eQ Management Suite 2.0
Starts at $595 per server.
Heroix Corp.
www.heroix.com

AppManager Suite 5.0.1
Starts at $600 per server; additional application modules range from $600 to $1,500 per server.
NetIQ Corp.
www.netiq.com

The Argent Guardian 6.2
Installing The Argent Guardian was a breeze because, by default, it uses a proprietary database system. This database is designed to provide a quick out-of-the-box solution for smaller networks of up to 20 to 30 managed nodes. For larger networks the product also supports Microsoft SQL Server 7.0 or higher to store the monitoring information it collects. What’s really unique about Guardian is its agent-optional architecture, which allows you to monitor servers with or without installing agent software on them. In general you only need to install agents on special servers like firewalls or, if you’re monitoring servers remotely, over a WAN. In a LAN environment installing agents is unnecessary. This gets you up and running with a minimum of hassle.

Guardian’s GUI is a simple Explorer-like interface that’s easy to work with. (See Figure 1.) The meat of the product resides on the Definition tab, which consists of four elements: rules, alerts, node lists and relators. The basic approach to using the product is simple: Rules define a specific test to perform for an application or service, such as checking if a domain controller is online, checking the System log for new Error events, checking for a processor bottleneck and so on. Alerts are actions to be taken when a rule is broken, such as sending an e-mail, paging someone or performing some custom command. Node lists are collections of up to 1,024 managed nodes to monitor, which can include Windows NT/2000 machines, Unix/Linux hosts or any network device that supports SNMP. (Support for monitoring Windows Server 2003 machines will be in the next incremental release of the product, which should be available at the time you read this article.) Tying this together are relators, which associate rules, alerts and node lists with a schedule.
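
The relationship among these four elements can be sketched in a few lines of Python. This is only a conceptual model built from the description above, not Argent’s implementation; the class and field names, the node names and the simulated logon counts are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    is_broken: Callable[[str], bool]   # takes a node name, True if the test fails


@dataclass
class Alert:
    name: str
    fire: Callable[[str, str], None]   # called with (node name, rule name)


@dataclass
class Relator:
    """Ties rules, alerts and a node list together with a schedule."""
    rules: List[Rule]
    alerts: List[Alert]
    node_list: List[str]
    interval_seconds: int              # the schedule, reduced to a polling interval

    def run_once(self) -> int:
        """Evaluate every rule on every node; fire all alerts for broken rules."""
        broken = 0
        for node in self.node_list:
            for rule in self.rules:
                if rule.is_broken(node):
                    broken += 1
                    for alert in self.alerts:
                        alert.fire(node, rule.name)
        return broken


# Simulated data: failed logon counts per (hypothetical) node.
failed_logons = {"dc01": 5, "web01": 0}
messages: List[str] = []

bad_logons = Rule("excessive bad logons", lambda node: failed_logons.get(node, 0) >= 3)
console = Alert("send console message", lambda node, rule: messages.append(f"{node}: {rule}"))
relator = Relator([bad_logons], [console], ["dc01", "web01"], interval_seconds=60)
broken_count = relator.run_once()
```

Keeping rules, alerts and node lists as separate objects means the same rule can be reused in several relators with different schedules or alert targets.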

Figure 1. Argent Software’s The Argent Guardian is a powerful but easy-to-use solution for enterprise monitoring and alerting.

Before configuring my first relator, I began by defining a new node list for my testbed network. Using Node Manager, I scanned for servers running on my network; Guardian had no trouble finding and displaying them. Specifying the servers I wanted to monitor became as easy as dragging and dropping them onto the node list. Then I selected a rule to test (excessive bad logons) and modified it so that it triggered after three failed logons. (Modifications are saved automatically.) Guardian comes with over a thousand built-in rules that monitor Active Directory and DNS, Event Logs, perfmon counters, services, SNMP get/trap, system down, custom commands and more. One indication of the support Argent offers is the fact that the company will develop custom rules and rule sets for licensed customers at no extra cost. Once I selected a rule, I then chose an alert to trigger (send console message) and modified the message sent when the alert triggers. Finally, I created my new relator and added the rule, node list and alert to it—unfortunately, you can’t use drag and drop to do this step. I experienced one glitch when I added my rule to my relator and then double-clicked on the rule to reconfigure it, which prompted me to save changes to the relator before the properties for the rule opened. I chose No and Guardian unexpectedly closed, so I had to open it again and finish adding a node list and alert to the relator.

Once the relator was configured, I specified an aggressive monitoring schedule for testing purposes and switched the relator from testing to production mode to begin monitoring. I tried a few failed logons to a workstation and the expected alert appeared. I tried adding additional rules for CPU usage and service failure, created an alert escalation plan to warn of service failure and attempt service restart, used service-level agreement (SLA) rules to capture uptime/downtime info, performed a service inventory on all my managed nodes and tested a number of other management tasks. Everything worked as expected. I also tried generating some built-in reports to display uptime, pending alerts, domain account activity and so on. Guardian’s reporting facility is basic but easy to use and does the job to communicate essential info to administrators concerning their network. On the whole the product is impressive—its easy learning curve gets you up and monitoring your servers in no time flat.

Guardian integrates with other Argent products including The Argent Predictor (used for trend analysis), The Argent Sentinel (used to check Web site response times and watch for site changes), Argent Console (a master console for displaying all alerts), Argent Exchange Monitor (for monitoring Exchange 5.5/2000 servers) and more. Guardian also includes pre-configured relators for monitoring Exchange, SQL Server and Oracle, and it easily integrates into enterprise framework solutions like Tivoli and HP OpenView.

NetIQ AppManager Suite 5.0.1
Installing AppManager was a bit of a challenge since the product requires SQL Server for its repository. Fortunately the documentation included with the product is exceptional. AppManager has a component architecture that consists of a repository, management server, operator console and agents running on managed nodes. To monitor a target server, you first install an agent on it and then create a job on the operator console specifying what parameters you want to monitor. The job is stored in the repository and is assigned to the agent by the management server, which acts as a go-between.

For testing purposes, I installed the repository, management server and operator console all on the same machine, a member server running SQL Server 2000 with Service Pack 3a and configured to use Windows authentication. A nice feature of AppManager is that Setup can be run in a special Pre-Installation Check mode to generate an HTML report on whether your system meets the various hardware and software requirements for successful installation. From my experience with the product I strongly recommend that you make use of this pre-installation feature and that you meet or exceed the CPU requirements, especially if your repository and management server are running on the same box. I also suggest you specify a fixed amount of memory for SQL Server in order to minimize resource contention between the repository and management server components. AppManager has three different operator consoles you can use: a standard Win32 application, an MMC snap-in and a Web console, which requires installing a special Web management server component that uses IIS. I found the Win32 console the easiest to work with and focused on this for my testing.

Once I had my management box set up, the next step was to install agents on the machines I wanted to monitor. You can install agents two ways, locally or remotely. I then discovered the other servers running on my testbed network and ran the Install Agent knowledge script to install the agent along with selected manageable objects on the target machines. Knowledge scripts are at the heart of AppManager and consist of VBScript code for performing some action such as installing an agent, discovering a service or monitoring some aspect of a managed node. The Install Agent knowledge script is configured with an easy-to-use wizard that let me specify what services I wanted to monitor. (I chose Active Directory and IIS.) The wizard automatically discovers these services on the target machines. As these services were discovered, new tabs were added to the bottom of the console representing views of all my NT/2000 machines, Web servers, domain controllers and so on. (See Figure 2.)

Figure 2. NetIQ’s AppManager lets you easily create jobs for proactively monitoring servers across an enterprise.

Once I had agents installed on my servers, I next tried creating and running various monitoring jobs against them. Creating a job is as simple as choosing a knowledge script from the built-in library and dragging it onto one or more machines in the console tree. Jobs can be configured to run with various schedules, threshold parameters for triggering events and corrective actions to be performed when an event is triggered. Knowledge scripts themselves can also be permanently configured or edited if desired. For example, I used the CheckDomainController response time script for Active Directory to create a job to check the response time threshold for logging on to the domain controller. Then I ran a process on the domain controller to drive CPU utilization to 100 percent and slow response to logons. The job triggered a warning event in the console, identifying the server and response test by a flashing yellow symbol. (AppManager is great for getting your attention and directing it to problems it discovers.) I used the NumberOfUsers script to configure a network alert when a specified number of domain users is exceeded, then created more users in Active Directory until the alert appeared. I used the ServiceDown script to create a job to restart IIS if the WWW service was down, then manually stopped the service. AppManager detected the problem and automatically restarted the service. I also collected perfmon counters for CPU and memory usage and used AppManager’s excellent graphing feature to create three-dimensional charts of performance activity that I could rotate and explore in real time. One thing that tripped me up for a moment was that the scripts for CPU and memory usage were found only on the NT tab and not the WIN2000 or WIN2003 tabs. The trick is that these three tabs are cumulative—NT scripts apply to later OS versions as well.

AppManager supports a wide range of platforms (Windows, Unix/Linux and most popular enterprise applications) and integrates into popular management frameworks (Unicenter, Tivoli, OpenView and Microsoft Operations Manager). Other installed tools include a Distributed Events Console, which gives you a quick bird’s-eye view of pending events for all your managed nodes, and Security Manager, which lets you define roles to distribute the administrative load of using the product among your IT staff. The learning curve for the product is steep but not insurmountable given the excellent documentation. Plus, there’s an active user community on NetIQ’s Web site where new scripts are developed and shared and user questions are answered.

Heroix eQ Management Suite 2.0
To use the newly enhanced Web Management Console of version 2.0 of Heroix eQ, I had to install the product on SQL Server (it requires version 7.0 SP4 or higher). It was worth the wait—Heroix eQ has one of the best Web-based interfaces I’ve ever seen (see Figure 3). It’s easy to work with and well laid-out, and context-sensitive help is available for every task.

Once I had the product up and running, the next step was to deploy agents on the servers I wanted to monitor. Agents can be installed either locally or remotely, and I chose a remote install. Unfortunately remote installation of agents still has to be done using the older Win32 console, since this is one of the few areas of functionality not yet supported by the product’s newer Web interface. A small glitch in the process was that remote installation of eQ agents requires Windows Installer v1.11 or higher on the target machines, so I had to install SP3 on my Win2K servers to upgrade Installer 1.10 to the latest version before the remote install would work, something that eQ’s Readme file seemed to omit. Note also that if you install agents locally from the product CD instead of remotely from your management server, you have to add the machines manually to your list of managed servers afterwards by specifying their names or IP addresses.

Figure 3. Version 2.0 of Heroix eQ boasts a well-designed Web interface that makes managing servers a snap.

Once I had agents installed and routing set up, the next task was to start monitoring something. Every product has its own jargon, and eQ is no different. The fundamental building block of Heroix eQ is the rule, which specifies a particular aspect of an application to monitor, a variable to initialize, an action to perform and so on. Rules are used to define sensors, which consist of one or more rules grouped together for simplified management. Sensors are then categorized into solutions, broad areas of management functionality such as Active Directory, DHCP, IIS or Windows 2000 Server for basic Windows NT/2000 monitoring; and Veritas Backup Exec, Cisco, Exchange 2000 or Unix/Apache for multiplatform and application monitoring. The basic Windows Server solution, for example, includes general sensors like CPU Load, DNS Performance, Paging Load, Printer State and so on. Since eQ automatically detects what components and applications are running on a server once you install the agent, all I needed to do in order to monitor Active Directory was select my domain controller, start the Active Directory solution on it and wait while eQ began collecting management data and checking it against predefined thresholds.
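
The rule, sensor and solution hierarchy is easier to see as nested data. The Python sketch below models only the grouping described above; it is not Heroix’s data format. The sensor names come from the article, while the rule names and class names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rule:
    name: str                      # one aspect to monitor or one action to perform


@dataclass
class Sensor:
    name: str
    rules: List[Rule] = field(default_factory=list)


@dataclass
class Solution:
    name: str
    sensors: List[Sensor] = field(default_factory=list)

    def all_rules(self) -> List[Tuple[str, str]]:
        """Flatten the hierarchy into (sensor name, rule name) pairs."""
        return [(s.name, r.name) for s in self.sensors for r in s.rules]


# Rule names below are hypothetical; only the sensor names appear in the article.
windows_server = Solution(
    "Windows Server",
    [
        Sensor("CPU Load", [Rule("sample processor usage"), Rule("warn on sustained high load")]),
        Sensor("Paging Load", [Rule("sample paging rate")]),
    ],
)
flattened = windows_server.all_rules()
```

Starting a solution on a server then amounts to activating every rule in the flattened list for that machine.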

To generate some events, I reconfigured some sensors with tighter thresholds and a more aggressive schedule, and soon eQ was indicating problems that needed my attention. The interface catches your attention when something goes wrong. Server icons are color coded, and severity levels of problems have smiley icons with different colors and facial expressions associated with them (cute, but actually quite effective). You can also configure eQ to use SMTP mail to alert you when problems occur, set up SNMP traps for managed devices, and execute scripts or commands to take corrective action when required. In addition, you can configure solutions on one machine and then copy them to other similar machines to save time.

One really nice feature of the Web interface is that you can set filter conditions based on things like type of event or its severity and then save the filter to your list of favorites for Internet Explorer. Then you can open multiple browser windows on your administrator console and monitor different event conditions in different browser windows, with the windows refreshing every 30 seconds or so depending on how you have them configured.

One small thing Heroix could improve on: The filter conditions you’ve selected could be displayed more clearly. For example, if I filter for Active Directory events of severity 3 on selected domain controllers, this detailed filter information could be displayed in the heading and title of the page so you know what you’re actually looking at and what the filter represents. It also took me a minute to realize that if I wanted to display all events with severity less than or equal to 3, I had to select 1, 2 and 3 in the listbox.
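
The difference between selecting individual severities and asking for "severity 3 or less" is the difference between a set-membership filter and a threshold. A short Python sketch, assuming integer severities that start at 1 and using made-up event records and function names:

```python
def filter_events(events, severities):
    """Keep only events whose severity is in the explicitly selected set."""
    return [e for e in events if e["severity"] in severities]


def severities_up_to(max_severity):
    """Expand 'less than or equal to N' into the explicit set {1, ..., N}."""
    return set(range(1, max_severity + 1))


# Made-up events for illustration only.
events = [
    {"source": "Active Directory", "severity": 1},
    {"source": "Active Directory", "severity": 3},
    {"source": "IIS", "severity": 4},
]

only_three = filter_events(events, {3})
three_or_less = filter_events(events, severities_up_to(3))
```

Selecting only 3 returns the single severity 3 event, while the expanded set returns both Active Directory events.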

The product uses Crystal Reports for its graphing and reporting functionality. This works great except that in the current release you have to switch back to the older Win32 console to generate a report or graph before you can display it using the Web interface. (You can set it up to run batch reports in the background as well.) The next version (2.1), however, will add report and graph generation to the Web interface, and this should be out by the time you read this article. If you’ve been using Heroix’s older monitoring product, RoboMon, you’ll be glad to know that eQ has been built to be backward compatible with RoboMon, though some configuration is required to make them work together.

But the best part of the product is the Solution Studio component included as part of the Heroix eQ Management Suite. The Solution Studio allows you to develop custom solutions (collections of sensors) for monitoring virtually any aspect of your systems and applications you need to monitor. And you can do this using a wizard-based interface that precludes the need to write code. I played around with this neat tool and created a couple of solutions for monitoring things like the IIS cache and ASP sessions. I found it easy to create a custom service, add sensors, define rules, deploy the solution to managed servers across my network and start monitoring the solution.

The Network Monitoring Tools Tested

All three of the products I tested were designed with medium to large networks in mind. These networks might have hundreds or thousands of managed nodes together with a mix of platforms and products from different vendors. I evaluated The Argent Guardian from Argent Software Inc., AppManager Suite from NetIQ Corp., and Heroix eQ Management Suite from Heroix Corp. All are rock-solid and reliable in their operation, have user interfaces that are for the most part intuitive and easy to learn to use, have extensible architectures that scale well, and come with highly configurable scheduling options, consolidated event monitoring, extensive reporting features, a variety of alert and notification methods, and customizable mechanisms for taking various corrective actions when things go wrong. They’re also priced well for the functionality they provide.

Although these products all support a wide range of operating system platforms and server applications, for the purposes of testing I focused exclusively on the Windows 2000 platform and installed and tried the products in our lab on an isolated testbed consisting of a domain-based network with a number of servers running in different roles.

—Mitch Tulloch

Choosing a Product
Like any other field in IT these days, network monitoring is highly competitive and there are a lot of good products out there—and some turkeys as well. But there were no turkeys among the products I tested. I was impressed with all three offerings and also with the level of support provided by each vendor. Which—if any—of these tools you choose for your network and server monitoring needs, however, depends largely on the nature and extent of your needs, your operating budget and your level of experience with network management solutions. All of the programs I evaluated are excellent.

Hats off to Argent for a great product that gets you up and running a managed network in no time flat. Kudos to NetIQ for a powerful platform that can do almost anything but whose depth I barely scratched due to its fairly steep learning curve. And bravo to Heroix for its snazzy new Web interface that lets you monitor your servers using a simple Web browser, a big improvement over the somewhat confusing older Win32 console.
