Your IT Operations Guide
Running behind? Too much to do? Worried about the future? Take a breather and consider how to do your job better. This report shares 10 best practices that will put you and your IT staff ahead of the next fire.
- By Anil Desai
- December 01, 2001
It’s time to admit it. No matter how much pride you take in your job,
there’s always room for improvement. As IT professionals, we tend to focus
on technology when it comes to getting the job done. In fact, that’s what
most of us would say we’re known for. However, there’s much more to IT
than just the management of hardware, software and network devices.
I’ve worked for many different organizations in my career (sometimes
as a consultant and sometimes as an employee). I’ve had the benefit of
seeing many different IT groups in action. Some operated like well-oiled
machines. Everyone knew what was going on and worked toward the same goals.
Others worked as if gears were out of alignment. Tasks weren’t synchronized
and one cog had no idea what the other was doing when it came to work
like deploying a new server or moving a Web site. Simple issues would
become critical because nobody dealt with them in a timely manner. Everything
was an emergency, and IT spent most of its time trying to stay “afloat.”
Don’t get me wrong: In general, the IT staff was dedicated and worked
hard. However, a lack of structure in these companies was the root cause
of many IT-related problems.
In this article I share 10 best practices that will help you improve
efficiency in your environment. If you’ve never developed a guide to document
the processes and procedures of your operations, consider this a starting
place and a source of ideas. You can tweak and tune these tips based on
your experience and your environment. Whether you work as a member of
a team or the sole person in your IT department, I think you’ll find the
1. Talk More:
Communicate with and Educate Your Users
I once met an IT staffer who actually said, if a user’s machine is infected
by a virus, “That’s their problem.” Shortsighted, to say the least. The
fundamental purpose of an IT organization should be to help the business
meet its goals (whatever those may be). Therefore, it’s of paramount importance
that IT staff communicate with all areas of the organization. All too
often, it seems like IT departments work in a vacuum, handling requests
without seeing the big picture. Gaining some valuable insight about a
marketing program over lunch or when passing someone in the hallway might
be very useful when you’re planning for server capacity in the future.
Similarly, training users can have a major pay-off. Wouldn’t it be great
if you could get everyone in your company to help you do your job? Think
of all the time you’d save if people cleaned out their own home directories
periodically and did their part to ensure their files were being backed
up. If you take the time to teach a user to perform common tasks without
your assistance, it can be a great investment in the end.
Common operating systems and applications usually have much more capability
than most users take advantage of. Show users the benefits of sharing
documents (using Microsoft Word’s Revision and “Track Changes” features
or the use of Public Folders on your Exchange Server) and help them understand
the benefits of using the company’s intranet to make information more
easily available. Granted, there will be many users who just won’t get
it, but others will.
Although it can be painful, it’s always a good idea to get feedback from
users. You might know that you’re doing a great job, but what do your
“customers” think? Soliciting feedback can be difficult, especially when
you have to read negative comments. However, users will generally feel
better that you care (“… but thanks for asking!”), and you might gain
some valuable insight into what’s really important. Imagine, for example,
that your users really aren’t concerned about disk quotas (something you’ve
spent a lot of time trying to administer); but it really bothers them
when a virus scanner runs every time they log in. In this case, there
may be a quick and easy way to improve performance and alleviate this
point of pain.
We’ve all come to expect information to be readily available. How many
times in the last two years have you picked up the phone to ask a vendor
to create and mail you a floppy disk containing some drivers for a
network card? This was a common practice just a few years ago. Now we
expect all of our vendors to have easy-to-use Web sites that let us serve
ourselves. The result is a much more efficient method for getting what
we need (plus, it allows technical support staff to focus on real issues).
The same should be true for the IT department’s clients. If someone complains
that a machine isn’t working properly, make it your problem—and be sure
to follow through to make sure it’s resolved. Keep users informed through
the use of an intranet site and, whenever it’s absolutely required, e-mail.
Make sure the people you support know where to get information and that
it’s kept up to date. When network problems or other issues arise, they
should feel confident that they can get the latest information from the
Best practice: Take the time to understand the needs of your users
and to provide them information about the IT department. Put an emphasis
on self-service for common tasks and seek feedback. In many cases, a little
information is all that users need to improve productivity and to cooperate
2. The Grind:
Document Regular Tasks
Have you ever gotten that sinking feeling when a manager asks if you’ve
verified backups or checked the configuration on a critical server? If
so, you know that it’s difficult to remember the day-in, day-out tasks.
You need to develop and maintain checklists for common operations. All
too often it’s easy to forget some “little detail” that’s going to lead
to a loss of productivity or repeated calls to the help desk.
Here, I present a few suggestions for the types of tasks you should be
responsible for performing regularly. Remember, though, that every environment
is different and that you should design your own lists based on your priorities.
Regular tasks are the types of things for which you’re always responsible.
In many cases, the exact nature of these tasks is difficult to predict.
For example, if you’re responsible for user support, it’s anyone’s guess
as to what wacky problems you’ll come across on any given day. If you
could predict server failures, you could probably earn a lot more money
as a professional IT psychic. Still, any IT group has many jobs that are
important to perform on a regular basis.
I’m willing to bet that most IT professionals reading
this article are overworked. That is, no matter how
much effort you exert, you’ll find that there are still
more tasks to perform. And everything is prioritized
as “critical,” whether it’s a VP’s mouse that “sometimes
skips” or an internal server that has intermittent slowdowns.
In some ways, that’s good; it keeps you on your toes
and focused on the job. Assuming that all tasks are
important, though, how should you decide which ones
to do first? A good way to look at them is to determine
the value of the tasks and compare them with the effort
required to complete them. Figure A provides an example.
Figure A. How to graph value vs. effort for the IT tasks you face in any given time period.
All too often, we tend to work on easy tasks first,
things like defragging hard drives and installing software
(though they’re tedious and not the most rewarding).
Or we choose to react to what seems like the immediate
problem—an upset manager, for example. Given that you
have limited resources to get the job done, you must
figure out what’s important.
A good example is performance optimization (Figure
A). Although the job will probably never be complete
(performance could always be “better”), you may be able
to make more efficient use of your investments with
just a few mouse-clicks. This would have a lot of value,
but would require little effort (that’s good and should
have a high priority). However, you’ll reach a point
where you would have to exert significant effort to
receive a marginal improvement in performance. That
would be the opposite: low value, high effort (that’s
bad and should be a lower priority). Overall, use this
(or some other) methodology to determine what you should
be working on and compare that with what you are working
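The value-versus-effort comparison can also be done without a graph. The following is a minimal Python sketch, assuming you score each task from 1 to 10 for value and for effort; the `Task` class, the `prioritize` function and the scores themselves are illustrative inventions, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    value: int   # estimated benefit, 1 (low) to 10 (high); an illustrative scale
    effort: int  # estimated work, 1 (low) to 10 (high); an illustrative scale


def prioritize(tasks):
    """Return tasks ordered by value-to-effort ratio, best payoff first."""
    return sorted(tasks, key=lambda t: t.value / t.effort, reverse=True)


# Hypothetical scores for tasks like the ones discussed above.
tasks = [
    Task("Tune server settings (a few mouse-clicks)", value=8, effort=2),
    Task("Squeeze out a marginal performance gain", value=2, effort=9),
    Task("Fix the VP's mouse that sometimes skips", value=3, effort=1),
]

for task in prioritize(tasks):
    print(f"{task.value / task.effort:5.2f}  {task.name}")
```

A simple ratio is only one way to rank the work. The point is to write the scores down so you can compare what you should be doing with what you are actually doing.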
Some are things you need to do every day. For example, you should often
review the status of all open help desk requests and other issues. Other
tasks can happen less frequently. It might be important to verify that
the weekly full backups have been performed on schedule and without unexpected
errors. Monthly tasks usually focus on higher-level analysis and management
of your network environment. These are crucial for ensuring the overall
efficiency of the hardware, software and networks you support. Table 1
provides some examples of tasks, along with a suggested frequency. Use
it as a baseline to create your own list of regular jobs.
| Review the status of help desk requests. | Review reported issues and update all "open" issues that haven’t been modified within the last three | Help desk software/tools or e-mail |
| Analyze disk space usage reports for | Determine trends in disk space usage and predict when new capacity might be needed. | Performance Monitor (% Free Space Counters); Excel spreadsheet, containing graphs of disk space use over |
| Review all scheduled jobs on SQL Server | Verify that all jobs are running properly. | SQL Server Enterprise Manager; custom SQL scripts; Event Viewer. |
| Review audit logs. | Look for suspicious patterns of activity (such as failed logon attempts or access to sensitive | Manual inspection using Event Viewer filters or use of third-party tools. | (for eight servers) |
| Review user accounts. | Ensure that all accounts are configured as required; remove unneeded accounts. | Manual inspection, using Active Directory Users and Computers tool. |
| Review anticipated IT-related changes. | Plan for expected moves, adds and changes for servers and user workstations. | |
| Update project status. | Update status information for all open | Project tracking tools, spreadsheets |
| | Review purchase orders and compare with software inventory. | Excel, SMS or third-party tools; Windows |
| Review and update configuration documentation. | Ensure that all configuration documentation is up to date. | Word, SMS or third-party tools. |
| Verify backups for an entire server. | Verify full server restore from tape; record time taken and any problems encountered. | |
| Verify disaster recovery procedures. | Attempt to rebuild critical server, assuming that all hardware and data is lost; help test disaster | IT intranet for disaster recovery process information; Word for documentation of results. |
Table 1. A list of regular IT tasks can remind you to do the essential, regular chores every IT staff
Depending on the size of the environment and the number of IT staffers
you have, you’ll probably want to delegate responsibilities to specific
people. Be sure that everyone knows what they’re responsible for and when
the tasks need to be performed. Also, the amount of time that’s required
for these tasks will vary based on the size of your environment. Note,
however, that most don’t take very long—it’s just important to be sure
you set aside the time to do them. A few minutes of dedicated effort here
and there can really help ensure that your environment is working optimally.
Finally, keeping such a checklist can help you determine where your time
is going and provide hints as to what might need to change.
Best practice: Develop a checklist for the tasks that you know you
need to perform regularly. Use this information to find areas for improvement
and to make sure you don’t ignore important tasks.
Day Activities for IT
A downturn in the economy can bring with it some sweeping
changes in an organization. Those $4,000 routers that
you may have been able to purchase on your own a few
months ago may now require the approval of a half-dozen
managers within your organization. Many IT groups are
settling into “maintenance mode,” as their companies
aren’t hiring new staff or buying hardware as quickly
as before. And, although many people debate the extent
of the downturn and its real effects, the fact is that
many companies have scaled back on spending.
So what can you do when management has told you you
can’t purchase any new hardware, software or network
devices—and, by the way, your responsibilities and goals
haven’t changed? Well, it’s a different mindset; but
one potentially beneficial approach is to use this time
to do all of the things you didn’t have the opportunity
to do before. When things are slow, there’s no excuse
for not working on good maintenance practices. Here
are some ideas:
- Rethink your strategies. Decisions
are often made with a focus on speed of implementation,
based on the best information available at the time.
For example, someone might ask you to set up a new
installation of SQL Server 2000 for use by the marketing
department. Although a new server might get the job
done, it’s not always the most efficient method. Usually,
a collection of fewer servers is easier to manage.
Remember the “Customers” database that was expected
to hit a gigabyte but is still only 100MB? Move the
database to another SQL Server machine. You’ll save
the cost of the new server and it will make the management
of backups, performance and other common tasks much
easier. Similarly, look for areas in which you can
do more with fewer resources—they’re out there, but
it just takes some time, effort and skill to find
- Make an IT wish list. Take the time
to determine what you’d really like to do when things
turn around. It’s great to be able to back up your
ideas with facts. For example, you could state, “The
average development user requires 500MB of disk space
in his or her home directory.” Use that figure to
make decisions about how much disk space you might
need in the future. If you need a lot, it might be
cost-effective to invest in a disk array or a network-attached
- Live for today, plan for tomorrow.
Although a downturn in the economy can really reduce
the pace of business, it’s important to realize that
there will be a turnaround. Your job is to ensure
that your company is in the best possible situation
when that occurs. We just don’t know how quickly that
will happen or when it will begin. Use this time to
make plans for the future, including estimates on
how you’ll deal with rapid growth (if that’s expected).
Often, planning for the future can help you make better
decisions when building and managing your current
With any luck, the amount of time you spend in planning
will pay off. This may sound overly optimistic, but
think of an economic downturn as a different kind of
opportunity instead of just as a setback!
3. Get in the Fast Lane:
Monitor and Optimize Performance
A fundamental task for IT staffers is to maximize their organization’s
investment in hardware, software and network devices. If just anyone could
deploy a Windows 2000 Active Directory domain controller in its optimal
configuration, it’s possible that no one would need you at all! A great
way to maximize investments in client- and server-side hardware is to
implement routine performance monitoring and optimization cycles. The
process should include the following:
- Establish a baseline.
- Identify a bottleneck.
- Make changes in an attempt to improve performance.
- Remeasure performance and compare with the baseline.
- Repeat, as desired.
Although this might seem like a lot of work in theory, in practice it
can take as little as a few minutes. Contrary to what some vendors would
want you to believe, you don’t have to invest in hundreds of thousands
of dollars of software just to manage a few servers. Figure 1 provides
a report generated by Win2K’s performance tool. The graph (which represents
information collected over several hours) can provide many valuable insights.
For example, the chart shows the amount of memory SQL Server was using
on the server throughout the day as well as information about the number
of users connected to the server.
Figure 1. A view of logged performance data on a Windows 2000 Server.
Always demand details. When you’re troubleshooting problems, you might
hear comments about the reliability of a machine, or a systems administrator
might claim that a machine is “overloaded” and that this is the reason for
the slow performance. Don’t accept such vague answers. If I told you my
car was “broken,” you’d probably want details. Does it start? When did
the problem begin? Why don’t other similar cars have this problem? Demand
the same from IT staff. What does “overloaded” mean? Are we talking about
excessive CPU utilization? If so, during peak periods, we should see sustained
CPU spikes. A simple Performance Monitor measurement would prove or disprove
this theory. You might find, for example, that CPU usage is low when the
server is slow. In that case, you’ll need to look for other bottlenecks,
such as issues related to disk I/O, memory I/O (paging), network utilization
and so on. Based on these results, you’ll be able to make much better
decisions on upgrades or the placement of critical applications. You might
find, for instance, that the engineering department really doesn’t need
a brand new server to run a defect-tracking application.
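To make the "prove or disprove" step concrete, here is a minimal Python sketch. It assumes you have exported logged counter samples into a list of dictionaries. The counter names, the sample values and the thresholds are illustrative assumptions, not recommended limits and not the names Performance Monitor itself uses.

```python
from statistics import mean

# Thresholds are illustrative assumptions, not vendor guidance.
THRESHOLDS = {"cpu_percent": 80.0, "pages_per_sec": 20.0, "disk_queue_length": 2.0}


def find_bottlenecks(samples, thresholds=THRESHOLDS):
    """Return the counters whose average over the samples exceeds the threshold."""
    suspects = {}
    for counter, limit in thresholds.items():
        average = mean(s[counter] for s in samples)
        if average > limit:
            suspects[counter] = average
    return suspects


def percent_change(baseline, current, counter):
    """Percent change of a counter's average relative to the baseline samples."""
    before = mean(s[counter] for s in baseline)
    after = mean(s[counter] for s in current)
    return (after - before) / before * 100.0


# Hypothetical samples: a quiet baseline and a period when users reported slowness.
baseline = [{"cpu_percent": 20.0, "pages_per_sec": 10.0, "disk_queue_length": 1.0}]
slow_period = [
    {"cpu_percent": 22.0, "pages_per_sec": 95.0, "disk_queue_length": 1.2},
    {"cpu_percent": 18.0, "pages_per_sec": 110.0, "disk_queue_length": 1.5},
    {"cpu_percent": 25.0, "pages_per_sec": 101.0, "disk_queue_length": 1.1},
]

print(find_bottlenecks(slow_period))
print(round(percent_change(baseline, slow_period, "pages_per_sec")))
```

In this made-up data the CPU is nearly idle while paging is far above the baseline, which is the kind of evidence that points to memory rather than to an "overloaded" processor.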
Best practice: Take some time to get familiar with performance logging
and monitoring tools, as well as performance methodology. In the end,
this will help you maximize your IT investments and better understand
your server bottlenecks.
4. Plan for the Worst:
Develop and Test Backup and Recovery Procedures
If you were asked to list the top 10 IT tasks, backup and recovery would
probably be two of the first things you’d mention. They’re also probably
close to the top of the list of the most annoying, tedious and mundane
IT tasks. Nevertheless, backup and its not-so-distant cousin recovery
are truly important.
The foundation of a good data protection plan is based on determining
your recovery requirements. Find out what data needs to be stored, why
it must be backed up and how often it should be backed up. A simple table
like the one listed in Table 2 can help.
down time (recovery window)
| User home directories | Two business hours | Survive server disk failure. | Data is stored on multiple file servers. |
| Marketing Shared Data | One business day | Survive disk, network or server failure. | Large volume of data is changed frequently. |
| Engineering defect-tracking system | One business day | Survive disk failure. | |
| Sales Database Application | One business hour | Survive disk, network or server failure. | Application runs on SQL Server 2000 database. |
Table 2. Setting data protection levels will help your group design an optimal backup and recovery
When you start with a recovery plan that includes well-defined requirements,
you’ll probably be able to come up with some creative ways to back up
your data. For example, if Server 1 must be backed up hourly, but only
the latest backups must be retained, you could simply copy the differences
in the data to another network share somewhere. You could use this share
as a backup device, instead of bogging down your tape machines. Furthermore,
if the server must be able to survive a disk failure, you could simply
implement RAID technology (such as disk mirroring or disk striping with
parity) on the disk systems. Once the plan has been defined, make sure
you get sign-off from the appropriate people. Everyone should be involved
in this process so there are no surprises. For example, your vice president
of sales might not think that losing two hours of data is reasonable until
you explain the potential costs of better data protection.
Many backup plans tend to be ad hoc. That is, when a new server goes
up on the network, systems administrators just add the entire machine
to the backup schedule. While that may get the job done, it also backs
up a lot of information that you may not need (like operating system directories
if you don’t plan to restore the entire OS from tape). I’m willing to
bet you’ll find many of your servers overprotected when it comes to backup
and recovery. If you take some time to back up only what’s important,
you can make much more efficient use of tape, disk, network and other
Now comes the harder part: Don’t forget to practice recovery operations.
You’ll generally learn a lot by going through a simulated failure. First,
you’ll know exactly what you need to do when an emergency arises—you’ll
be thinking a lot more clearly during this “rehearsal” than when your
CTO is breathing down your neck. Next, you’ll know exactly how long it
takes to recover the systems. If you know that the process will take four
hours, for example, this should really help others in the organization
react accordingly (by using alternate systems, canceling sales calls,
or whatever needs to be done).
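One way to use rehearsal results is to compare the measured restore times with the recovery windows in your plan. The Python sketch below does that. The resource names echo the examples in this article, but the windows, the measured times and the `missed_windows` function are hypothetical values and names chosen for illustration.

```python
from datetime import timedelta

# Hypothetical recovery windows; real values come from your signed-off plan.
recovery_windows = {
    "Sales Database Application": timedelta(hours=1),
    "Marketing Shared Data": timedelta(hours=8),
}

# Hypothetical restore times measured during a rehearsal.
rehearsal_times = {
    "Sales Database Application": timedelta(hours=4),
    "Marketing Shared Data": timedelta(hours=3),
}


def missed_windows(windows, measured):
    """Return {resource: overrun} for restores that took longer than allowed."""
    return {
        resource: measured[resource] - allowed
        for resource, allowed in windows.items()
        if resource in measured and measured[resource] > allowed
    }


for resource, overrun in missed_windows(recovery_windows, rehearsal_times).items():
    print(f"{resource}: restore ran {overrun} over its recovery window")
```

An overrun does not necessarily mean the backup design is wrong. It may mean the recovery window needs to be renegotiated with the people who signed off on it.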
Best practice: Create and define recovery requirements for all of
the data that the IT team backs up. Then, based on these requirements,
review your backup strategy and implementation. Once that’s in place,
be sure to go through regular full dress rehearsals to make sure you can
quickly and reliably restore data. Remember, the goal is to meet your
business needs for data protection.
5. Keep Up with Technology:
Work on Training (and Cross-Training)
Modern hardware and software can be complicated tools, and rarely do we
get a chance to understand and implement all of the features. This is
especially true when it comes to feature-packed operating systems like
Win2K. Fortunately, most techies enjoy the challenges and benefits associated
with learning something new. That’s where training (and cross-training)
can be a win-win situation. No, I’m not talking about excruciating four-hour
workouts, here (although some IT staffers might benefit from the exercise).
IT-related training can take many forms. Traditionally, companies would
send employees to instructor-led classes (or larger organizations would
have instructors visit their facilities). This can be somewhat expensive
and disruptive to business tasks. (Who among us wouldn’t be missed if
we took an entire week off?)
Fortunately, we have other options. As an MCP Magazine reader, you’ve
no doubt taken advantage of many of the print and electronic technical
resources that are out there. Books and Web sites can be great resources
for learning; sometimes all it takes is a good article to help you add
a new technique to your IT bag of tricks.
Cross-training also presents many potential benefits. Staff members who
are experts in an area transfer their knowledge to co-workers. Often all
it takes is free pizza to get parts of the IT group together over lunch
to learn some new technical topic. Even if it doesn’t pertain directly
to their jobs, this can greatly help IT staff stay motivated and keep
the gears in their heads turning. Not only is it inexpensive, but it’ll
help staff develop their “softer skills” (like those related to presentation
and communication of technical information). Cross-training can really
help foster a sense of teamwork in an environment (for example, systems
administrators might better learn what black arts the SQL Server DBAs
Perhaps the most important thing about training is to be sure to add
it to your list of things to do. If you can’t afford to take days off
regularly to attend training classes, be sure that you set aside at least
a few hours a week to read through articles or portions of books that
you think might be valuable. Also be sure to keep track of new technologies
that you want to learn.
Best practice: Set aside time for training and cross-training. Whether
you manage a staff of 50 or just yourself, be sure that you’re constantly
learning new and useful technologies. Remember, in the IT industry, if
you’re not moving ahead, you’re falling behind!
6. Knowledge is Power:
Understand your Environment
Many of us manage IT environments reactively. That is, IT waits until
users report problems before taking care of them. For example,
you might depend on users to report an inability to print documents before
you check the print server to ensure there’s sufficient disk space for
the spooler to operate. Then IT staffers work to resolve the problem as
quickly as possible, often working under time pressures. A much better
scenario would be one in which you anticipated the problem before it happened.
In many cases, an ounce of IT-related prevention can save many pounds
of IT-related cures. For example, if you determine that you’ll need new
disk space for one of your file servers within a month, you’ll have time
to find a good deal on the necessary hardware, install the drive and move
directories (if necessary). You might also choose to implement disk quotas
or to have a few users clear out some disk space. On the other hand, if
you wait until your users are complaining that they’re getting “out of
disk space” messages, you won’t have as much time to solve the problem.
This will undoubtedly lead to unhappy users and a tough job for IT to
There are other benefits to tracking trend information. In general, the
more you know about your environment, the better. Suppose some salesperson
introduces you to the miracles of Storage Area Networks (SANs). Your job
is to determine if a SAN would save your organization money over time.
Based on trend information you’ve collected (and on some educated extrapolations),
you could determine how much disk space you’ll need in the future. Then
you could figure out whether or not it’s worth the time and expense to
implement a shared-storage solution. Figure 2 shows a simple Excel spreadsheet
that includes disk storage information for a number of servers. You can
easily collect the information needed to identify trends through the Computer
Management tool in Win2K.
Figure 2. You can track disk usage over time using a simple Excel spreadsheet and Windows 2000’s Computer Management tool.
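The "educated extrapolation" can be as simple as fitting a straight line through the usage figures you have recorded. The Python sketch below is one way to do that, assuming usage grows roughly linearly. The `days_until_full` function, the sample figures and the 60GB capacity are hypothetical.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days from the last sample until the disk reaches capacity.

    samples is a list of (day_number, used_gb) pairs. A least-squares line is
    fitted through them; None is returned if usage is flat or shrinking.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("need samples from at least two different days")
    slope = sum((day - mean_day) * (used - mean_used) for day, used in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    full_day = (capacity_gb - intercept) / slope
    return full_day - samples[-1][0]


# Hypothetical weekly readings for one file server with a 60GB volume.
readings = [(0, 40.0), (7, 42.0), (14, 44.0), (21, 46.0)]
print(round(days_until_full(readings, capacity_gb=60.0)))
```

Real usage rarely grows in a perfectly straight line, so treat the result as a rough planning figure and refresh it as you collect more readings.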
Best practice: Take the time to understand and track various aspects
of your environment. By monitoring disk space usage, network utilization
and specific applications, you might be able to address issues before
users notice them.
7. The Only Constant is Change:
Implement a Change Control Log
I’ll bet that you’ve talked to a user before whose machine “suddenly stopped
working.” When asked what was changed, you’ve gotten a simple, “I haven’t
done anything.” Later, you drill down into the details and find that a
dozen or so high-end games have filled up the hard disk and three instant
messaging clients are bogging down the machine. Wouldn’t it have been
much easier if you had known this up front?
The same applies for server management. You should keep track of what
changes are implemented on servers. For example, a change to the IP address
of a Web server (with a corresponding DNS change) might not seem like
a big deal to a network administrator. But, if a poorly designed application
depended on a hard-coded IP address, it might suddenly break “for no reason.”
Tracking down the change could take hours, especially in larger environments.
If, however, you had a single place to look for this information, you
could quickly determine what has changed recently. A simple Change Control
log might look like the one shown in Table 3.
| | Added a new virtual directory called | Added to support needs of Marketing |
| | Added 256MB RAM (now has 512MB total). | Added based on performance issues. |
| | Restarted WWW service. | GUI showed service as "Started," but machine was not responding to requests. |
| | 3:00 p.m. | Deployed new COM objects for Web site. | Installed per instructions from Engineering |
Table 3. A sample change control
With this type of information, you can find trends. For example, does
Web03 seem to have problems during certain times of day? Did the issues
crop up after other changes were made? Having this data can save hours
when troubleshooting new problems. Time after time, I’ve found this type
of information to be invaluable when working on many types of server problems.
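A change control log does not need special software to be searchable. The Python sketch below is a minimal illustration, assuming each entry records when the change happened, which server it touched, what was done and why. The `ChangeEntry` class, the dates and the Web01 entry are hypothetical; the Web03 changes reuse the wording from the sample log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEntry:
    when: datetime
    server: str
    change: str
    reason: str


def recent_changes(log, server, since):
    """Return the changes made to one server on or after `since`, newest first."""
    matches = [e for e in log if e.server == server and e.when >= since]
    return sorted(matches, key=lambda e: e.when, reverse=True)


# Dates and the Web01 entry are hypothetical.
log = [
    ChangeEntry(datetime(2001, 11, 5, 10, 0), "Web03",
                "Added 256MB RAM (now has 512MB total).", "Added based on performance issues."),
    ChangeEntry(datetime(2001, 11, 16, 9, 30), "Web03",
                "Restarted WWW service.", "Machine was not responding to requests."),
    ChangeEntry(datetime(2001, 11, 19, 15, 0), "Web03",
                "Deployed new COM objects for Web site.", "Installed per instructions from Engineering."),
    ChangeEntry(datetime(2001, 11, 19, 16, 0), "Web01",
                "Changed IP address (with DNS update).", "Network renumbering."),
]

# A problem is reported on Web03; look back one week from that moment.
problem_reported = datetime(2001, 11, 20, 9, 0)
for entry in recent_changes(log, "Web03", problem_reported - timedelta(days=7)):
    print(entry.when, entry.change)
```

The same list could just as easily live in a spreadsheet or on an intranet page. What matters is that every change is recorded in one place.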
If you’re dealing with a larger environment (one that supports hundreds
of servers), you might want to look at asset management or help desk solutions
that will help you keep track of changes and machine configurations. Many
products are available for tracking this type of information and for making
the results easily accessible (usually through a Web browser). One word
of advice, though: Spend most of your time developing content, not the
presentation. An IT site just has to provide useful information—people
won’t care about all the animated GIFs you’re able to place on the site
if they can’t find what they want.
Best practice: Start documenting the configuration of the servers
that you support and implement a configuration-management policy. Be sure
that all documentation is kept up to date and that IT members have easy
access to this information.
8. Build an IT Robot:
Automate the Boring Stuff
When you have breathing room in your environment, you should determine
which tasks might be good candidates for automation. Give priority to simple,
repetitive tasks that take time but require no manual judgment. For example,
if you routinely move SQL Server backups from one machine to another,
you might want to implement the use of automated file copy scripts. Through
the use of the standard Xcopy command (or the much more powerful Win2K
Server Resource Kit RoboCopy utility), coupled with the Windows Task Scheduler,
you can make sure that the copies run automatically. Then you can simply
verify that the file copy operations have been performed. If that’s too
much trouble (OK, maybe now you’re getting spoiled), download a simple
utility that will send you an e-mail message when the job finishes.
Another example is the common task of restoring files. Perhaps your backup
utility provides you with a method to script common actions. Be sure to
look into this feature. Or if you find yourself frequently setting and
resetting permissions on directories, write some batch files that do the
job for you. Win2K includes scheduling capabilities from the command line
(using the “at” command) or via the more user-friendly “Scheduled Tasks”
Control Panel item.
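Xcopy, RoboCopy and batch files are the tools named above. As an alternative illustration of the same idea, here is a minimal Python sketch of a copy job that verifies its own work. It assumes the backups are files matching `*.bak` in a single directory; the `copy_new_backups` function and that file pattern are assumptions, and you would still need a scheduler to run the script.

```python
import filecmp
import shutil
from pathlib import Path


def copy_new_backups(source_dir, dest_dir, pattern="*.bak"):
    """Copy backup files that are missing or different at the destination.

    Each copy is verified with a byte-by-byte comparison. Returns the names
    of the files that were copied.
    """
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_file in sorted(source.glob(pattern)):
        target = dest / src_file.name
        if target.exists() and filecmp.cmp(src_file, target, shallow=False):
            continue  # an identical copy is already in place
        shutil.copy2(src_file, target)  # copy2 also preserves timestamps
        if not filecmp.cmp(src_file, target, shallow=False):
            raise OSError(f"verification failed for {target}")
        copied.append(src_file.name)
    return copied


# Example call with hypothetical paths:
# copy_new_backups(r"D:\SQLBackups", r"\\backupserver\sqlbackups")
```

Because the function returns the list of copied files, a scheduled wrapper could log that list or send it by e-mail, which covers the "verify that the file copy operations have been performed" step.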
Best practice: Identify some common, repetitive tasks that would be
good candidates for automation and find a way to ease the burden through
the use of scheduled scripts and batch files.
9. Stay Legit:
Get Current on Licensing
Many organizations have started seeing notices that range from subtle
reminders to outright threats regarding the licensing of the software
they use. Although software audits are still a fairly rare occurrence,
it’s important to make sure all of your machines are compliant with licensing
agreements. Because this is the responsibility of IT, that includes checking
users’ machines to make sure they haven’t installed any unapproved or
unlicensed software packages.
There are many ways to go about software auditing. Smaller organizations
could implement some simple batch commands in a logon script for writing
the contents of the start menu (or Program Files directory) to a text
file on a server. You could then populate an Excel spreadsheet with the
necessary information. Larger organizations should consider implementing
dedicated management tools, such as Microsoft’s Systems Management Server (SMS).
Such tools will be able to help inventory hardware and software stored
on machines regularly and can store the results in a relational database
system for better reporting.
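Once the per-machine lists exist, comparing them with what you have purchased is straightforward. The Python sketch below assumes the inventory has already been collected into a mapping from machine name to installed programs. The machine names, program names, license counts and both function names are hypothetical.

```python
from collections import Counter


def count_installs(inventories):
    """Count on how many machines each program is installed."""
    counts = Counter()
    for programs in inventories.values():
        counts.update(set(programs))  # count a program once per machine
    return counts


def license_gaps(installs, licenses):
    """Return {program: shortfall} where installs exceed the licenses owned."""
    return {
        program: installed - licenses.get(program, 0)
        for program, installed in installs.items()
        if installed > licenses.get(program, 0)
    }


# Hypothetical inventory gathered by a logon script, and hypothetical purchases.
inventories = {
    "PC-01": ["Microsoft Word", "ImageEditor"],
    "PC-02": ["ImageEditor", "InstantMessenger"],
    "PC-03": ["Microsoft Word", "ImageEditor"],
}
licenses = {"Microsoft Word": 5, "ImageEditor": 1}

print(license_gaps(count_installs(inventories), licenses))
```

Programs that appear in the inventory but not in the license list at all, like the instant messaging client in this made-up data, are the unapproved installs worth following up on.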
Many people fear software audits because they’re worried about what they’ll
find. No one wants to go to the CFO and ask for $20,000 for software that’s
already deployed. However, auditing software usage has potential benefits;
you might actually see cost savings. Suppose you
find that the marketing department has individually purchased many different
copies of a popular image-editing application. You might be able to combine
all future purchases and get a volume discount from a preferred vendor.
Also, once you have a better handle on what software is out there, you
can set up an intranet page that will help users easily find updates,
patches and other useful information.
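Once you have an inventory, comparing it against what you have actually purchased is simple arithmetic. The sketch below is one possible way to do it in Python; the function name, the (machine, product) pairs and the license counts are all hypothetical, and real licensing terms (per seat, per processor, concurrent use) are often more complicated than a simple count.

```python
from collections import Counter


def license_gaps(installs, licenses_owned):
    """Return {product: shortfall} for products installed more often than licensed.

    installs is an iterable of (machine, product) pairs; licenses_owned
    maps product name to the number of licenses purchased. Both are
    illustrative inputs, not a real inventory format.
    """
    counts = Counter(product for _machine, product in installs)
    return {
        product: count - licenses_owned.get(product, 0)
        for product, count in sorted(counts.items())
        if count > licenses_owned.get(product, 0)
    }
```

The same per-product counts also show where several departments are buying the same application separately, which is where a volume purchase might pay off.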
Best practice: Get serious about software licensing. In addition to
being able to sleep better at night knowing that you’ve done the right
thing, you might find some hidden cost savings and you’ll be able to better
support your users.
10. Put It in Writing:
Document Your Environment
If there exists a single task that most IT staff avoid like the plague,
it must be documentation. Granted, it can be difficult to sit in front
of Microsoft Word, typing out things that you think everyone does or should
know anyway (and I’m not just saying that because I’m working on this
article right now). However, having accurate, timely configuration information
can be helpful, especially in larger environments.
Technical documentation has two important features: scope and audience.
You need to determine what you want to document and how detailed the documentation
should be. In general, you want your comments to be “strategic,” instead
of “tactical.” Strategic documentation tends to be of a higher level.
For example, you might state that employee home directories are stored
on the “Users” share on Server12. If someone needs more detail (like information
about exactly which users have home directories or the permissions settings
on the directories), he or she should be able to go to the server to find
that information. Next, you should determine the intended audience: Are
you writing a SQL Server configuration manual for systems administrators
who are unfamiliar with relational databases or are you writing the document
for DBAs? Also, wherever possible, avoid repetition by linking to other
documents, articles or Web sites you use for background information.
Documenting your environment also involves developing processes. It’s
often useful to have well-defined procedures for handling issues related
to help desk escalations or reacting to critical server problems. A simple
example is shown in Figure 3. Here, a basic flowchart documents the steps
that should be performed before an issue is escalated. If an issue does,
indeed, need to be bumped up, it outlines the types of information that
should be provided to the next level of technician.
Figure 3. A sample help desk issue resolution flowchart.
OK, so I’ve done nothing to convince you that writing and maintaining
documentation will be fun. But, wait, there’s more! Keep in mind that
most documents will never be “finished.” Factor in time to maintain your
documentation. If your configuration information refers to how your servers
were configured seven months ago, no one’s going to use it. Therefore,
keep the details in the documentation at a high level and update the documentation
whenever important changes occur.
Best practice: Create high-level documentation for the configuration
of your network environment. This information can be incredibly helpful
when others need information and you’re not available. Also, get in the
habit of updating documentation whenever necessary.
Making the Best of Your Environment
If you felt overworked before you started reading this article, I’m afraid
I probably haven’t done much to convince you otherwise. Remember: The
point of all of the advice I’ve presented is not to make your job harder
or to add more work. In fact, it’s quite the opposite—to ensure that you’re
working as efficiently as possible. We covered many different types of
tasks that are required in IT environments. If you haven’t yet implemented
all of these ideas, don’t worry; few people have (myself included). If
you’re routinely working 14 hours per day and still barely find time to
do the necessities, it’s clear that something needs to change. An easy
option might be to hire more staff (assuming your budget allows it), but
a more realistic one might be to take a long, hard look at what you’re
actually doing with your time. You might find that you’re spending 80
percent of your time doing the 20 percent of tasks that could be postponed
or overlooked. Worse, the problems you’re not looking at might be the
cause of all of the rest.
Make no mistake about it: For most environments, implementing the operations
practices I’ve covered will take time, resources and effort. However,
the goal—to improve overall IT operations—should be worth the investment
in the long run. We all want to build a better, more manageable environment.
Now is as good a time as any to get started.