Exchange 2007: Always On
Most organizations need their data available every second of every day. Unfortunately,
computers, networks and storage devices will all fail eventually -- no matter
how much we pay for them or how closely we monitor them.
Disaster recovery solutions are our most common defense against such technology
failures. However, these only let you restore your data back to the point of
disaster. Data and time are bound to be lost, and even if the time lost is minimal,
time lost is still money lost.
The catch phrase these days for keeping systems running is "High Availability."
The promise of high availability solutions is 24/7 uptime -- or, more accurately,
no unscheduled downtime. There are three primary options for high availability
with Exchange Server 2007: Local Continuous Replication, Cluster Continuous
Replication and Single Copy Clusters.
Understanding these options will give you a better vision of what you can provide
your organization when you're deploying Exchange Server 2007 out of the box.
These solutions all provide varying degrees of high availability, so not all
solutions are equal, and not all solutions involve clustering. Clustering is
often synonymous with the concept of high availability, but it's no longer an
First Line of Defense: Transaction Logs
Exchange makes a valiant attempt to provide its own redundancy right out of
the box. Part of Exchange's overall architecture includes storage groups. Each
of these storage groups contains several databases. Exchange 2007 Standard Edition
lets you create up to five storage groups and mount up to five databases. Exchange
2007 Enterprise Edition lets you create up to 50 storage groups and mount up
to 50 databases.
When you install Exchange in the mailbox server role, you'll find one default
storage group, which contains one default mailbox database (typically labeled
Mailbox Database.edb, as database files use .EDB extensions). For each new database
you add, you increase the number of .EDB files within a storage group. You could
also create additional storage groups with additional databases.
Transaction logs help keep each database up-to-date. When data comes into the
Exchange server, typically as an e-mail message, it enters the system memory.
From memory, it's written to a transaction log. Each log reaches a maximum size
of 1MB (a reduction from 5MB in Exchange 2003). These transaction logs are eventually
added to the database that stores the mailbox for the intended recipient.
There's a checkpoint file that keeps track of which transaction logs have been
committed to the database. The benefit here is that you have redundancy, although
it protects you only if you go through the effort of placing your logs and your
database on separate disks.
This allows for better performance and proper disaster recovery. In the event
that the database is corrupted or the disk carrying the database crashes, those
transaction logs are invaluable. You can combine them with the latest backup
to restore your system. Understanding how the transaction logs and the database
work together is essential to understanding these high availability solutions
for Exchange 2007.
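To make the relationship between logs, checkpoint and database concrete, here is a minimal Python sketch. It is a toy model, not Exchange code: the names StorageGroup, deliver, commit and restore are hypothetical, and a log "fills up" after four records rather than at 1MB.

```python
from dataclasses import dataclass, field

LOG_CAPACITY = 4  # records per log file; an assumed stand-in for the 1MB log size


def apply_log(database, log):
    """Replay one log's records into a database (mailbox -> list of messages)."""
    for mailbox, message in log:
        database.setdefault(mailbox, []).append(message)


@dataclass
class StorageGroup:
    database: dict = field(default_factory=dict)
    closed_logs: list = field(default_factory=list)  # full logs, oldest first
    open_log: list = field(default_factory=list)     # log currently being written
    checkpoint: int = 0  # how many closed logs are already in the database

    def deliver(self, mailbox, message):
        """Write the message to the log first; the database is updated later."""
        self.open_log.append((mailbox, message))
        if len(self.open_log) == LOG_CAPACITY:
            self.closed_logs.append(self.open_log)
            self.open_log = []

    def commit(self):
        """Apply every closed log past the checkpoint, then advance the checkpoint."""
        for log in self.closed_logs[self.checkpoint:]:
            apply_log(self.database, log)
        self.checkpoint = len(self.closed_logs)


def restore(backup_database, backup_checkpoint, logs):
    """Rebuild a database from a backup plus the logs written since that backup."""
    database = {box: list(msgs) for box, msgs in backup_database.items()}
    for log in logs[backup_checkpoint:]:
        apply_log(database, log)
    return database
```

In this model, losing the database disk costs nothing as long as the logs survive on another disk: passing the last backup, its checkpoint value and the surviving logs to restore() rebuilds every message.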
High Stakes, High Availability
As mentioned earlier, there are three primary high availability options beyond
your ability to structure your database and transaction logs for improved performance
and availability. Placing your database on a RAID 5 disk structure and mirroring
your transaction logs is a recommended practice, but even that won't prevent
If you want to go beyond the standard recommendations, consider something like
Local Continuous Replication (LCR). LCR is a single-server solution that uses
asynchronous log shipping and replay from one set of disks to another (see Figure 1).
Figure 1. Local Continuous Replication works by continuously making and updating copies of the transaction log.
So what does asynchronous log shipping mean? When you first establish an LCR
-- or even Cluster Continuous Replication (CCR), for that matter (more on that
later) -- it makes a copy of the database. Transaction logs keep the database
up-to-date from that point forward. A log is closed once it is full. Only then
can it be shipped over to the second disk and replayed into the secondary copy
of the database.
The caveat here is that this time divergence means the secondary copy can't
be 100 percent in sync with its primary. This means you have the potential
to lose some data, depending on when a failure occurs.
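The gap between the two copies is easier to see in a small simulation. The Python sketch below is an illustration under assumptions, not how Exchange is implemented: the class and method names are made up, a log closes after three records instead of 1MB, and the active database is updated immediately to keep the model short.

```python
LOG_CAPACITY = 3  # records per log; an assumed stand-in for the 1MB log size


class ReplicatedStorageGroup:
    def __init__(self):
        self.active_db = []    # primary copy of the database
        self.passive_db = []   # secondary copy, fed only by shipped logs
        self.open_log = []     # log still being written on the active side
        self.closed_logs = []  # closed logs waiting to be shipped

    def write(self, record):
        """Record a change on the active side and close the log when it is full."""
        self.active_db.append(record)
        self.open_log.append(record)
        if len(self.open_log) == LOG_CAPACITY:
            self.closed_logs.append(self.open_log)
            self.open_log = []

    def ship_and_replay(self):
        """Copy every closed log to the passive side and replay it there."""
        while self.closed_logs:
            self.passive_db.extend(self.closed_logs.pop(0))

    def records_at_risk(self):
        """Records the passive copy would lack if the active side failed now."""
        return len(self.active_db) - len(self.passive_db)
```

After seven writes and one call to ship_and_replay(), the passive copy holds six records. The seventh sits in the open log, which can't be shipped yet, and that is the data you stand to lose.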
Although it's often called the "poor man's cluster," LCR isn't technically
a cluster. For those of you familiar with mirror sets, the concept here is similar.
It's based primarily on your chosen Exchange storage group.
You can create an LCR set when you create a new storage group. You can also
create one for an existing storage group. You establish the LCR through the
GUI with the Exchange Management Console or even through PowerShell commands
from within your Exchange Management Shell. In the event that one disk crashes
or the database is corrupted, you can switch over to the secondary copy of the
data by performing a manual switch. Keep in mind that this is an inexpensive
solution that you can run on the Standard Edition of Windows Server 2003.
Clustering to the Rescue
CCR is a clustered solution that allows for two nodes in a cluster: one is the
active node and the other is the passive node for automatic failover (see Figure
2). Both nodes must be servers with the Exchange 2007 Mailbox role installed.
Figure 2. Cluster Continuous Replication is true clustering technology, using an active node and a passive node that activates in the event of active node failure.
The benefit here is that you eliminate single points of failure because there
are two unique systems with two sets of storage. This offers a higher level
of availability than an LCR set. The caveat here is that you'll need to invest
more money in hardware (for the extra system) and software (because to perform
the cluster you'll need to be running the Enterprise Edition of Windows Server
2003). This type of solution also uses asynchronous log shipping and replay
to keep the database up-to-date between the active and passive copy of the data.
To fully understand the way CCR works, visualize the two servers. The active
server has a network connection to the public network. The passive server does,
as well. Between the two nodes, however, is a private network connection on
a separate network-addressing scheme that carries the "heartbeat"
signal between them.
The passive server waits patiently, as long as it receives a heartbeat from
the active node saying "I'm alive." It then responds back that it,
too, is alive. You can configure the cycle of these heartbeats, but by default
they're sent every 1.2 seconds from each cluster node.
If the passive server doesn't receive a heartbeat (which could happen for any
number of reasons), it starts getting edgy and eager to become active. If, however,
it did become active while the other server was also active, it could cause
a problem known as split-brain syndrome. To prevent this problem, there's a
quorum (called a Majority Node Set, or MNS quorum) that maintains a file share
witness between these two servers.
The witness is held on a third server (typically the Hub Transport server in the
same Active Directory site as the passive and active nodes), and makes the final
determination for the passive node of whether the active node is indeed alive and well. In
the event that the active server is actually down, the passive server will automatically
come to life and assume the workload.
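The passive node's decision can be sketched in a few lines of Python. This is a simplified model rather than the cluster service's actual logic: the function names are hypothetical, and so is the MISSED_BEATS_ALLOWED threshold. Only the 1.2-second interval comes from the text above.

```python
HEARTBEAT_INTERVAL = 1.2  # seconds; the default interval named in the text
MISSED_BEATS_ALLOWED = 5  # hypothetical threshold, not a documented setting


def heartbeat_lost(last_heartbeat, now):
    """True when the active node has been silent for longer than allowed."""
    return (now - last_heartbeat) > HEARTBEAT_INTERVAL * MISSED_BEATS_ALLOWED


def should_take_over(last_heartbeat, now, witness_vote_available):
    """Passive node's choice: activate only with a majority of the three votes.

    The three votes belong to the active node, the passive node and the
    file share witness. The passive node always has its own vote, so it
    needs the witness's vote to reach two out of three.
    """
    if not heartbeat_lost(last_heartbeat, now):
        return False
    votes = 1 + (1 if witness_vote_available else 0)
    return votes >= 2
```

The design point is that a silent heartbeat alone is never enough. If only the private network has failed, the active node still holds the witness's vote, the passive node is left with one vote out of three, and split-brain syndrome is avoided.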
While asynchronous log shipping (also used in CCR) may involve some data loss,
there's another process that can prevent this loss when used in a CCR set. On
the Hub Transport server, there's a feature you can configure called the Transport
Dumpster. This retains a predetermined amount of recently delivered mail-message
data for mailboxes in the cluster.
If the active node goes dead and the passive node jumps in, one of the first
orders of business is for the "new" active server to check in with
all Hub Transport servers and request any mail data it may not have received.
This new active server will double-check all incoming data. It will retain any
new messages and discard duplicates. This ensures a greater degree of high availability
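The redelivery step can be sketched in Python as follows. It is a toy model under assumptions: TransportDumpster and recover_after_failover are hypothetical names, the dumpster is capped by message count purely for simplicity, and duplicates are detected by a message ID.

```python
from collections import deque


class TransportDumpster:
    """Keeps copies of the most recently delivered messages on one hub server."""

    def __init__(self, max_messages):
        # A bounded deque drops the oldest entry once the limit is reached.
        self.recent = deque(maxlen=max_messages)

    def record_delivery(self, message_id, body):
        self.recent.append((message_id, body))

    def resubmit(self):
        """Return everything still retained, oldest first."""
        return list(self.recent)


def recover_after_failover(mailbox_store, hub_dumpsters):
    """Merge dumpster contents into the new active copy, skipping duplicates.

    mailbox_store maps message_id -> body for mail the new active node
    already has. Returns the number of messages that were recovered.
    """
    recovered = 0
    for dumpster in hub_dumpsters:
        for message_id, body in dumpster.resubmit():
            if message_id not in mailbox_store:
                mailbox_store[message_id] = body
                recovered += 1
    return recovered
```

If the new active node already holds messages m1 and m2 and a dumpster resubmits m1, m2 and m3, only m3 is added. The other two are discarded as duplicates.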
Single Copy Clusters (SCCs) are similar in design to the high availability solution
available in Exchange 2003. You have a two-node cluster that relies on a single-storage
location (see Figure 3). This type of solution provides system redundancy, but
requires that you provide your own storage redundancy (which could be a NAS
or SAN with RAID-level redundancy).
Figure 3. Single Copy Clusters also use an active and passive node, but share the same storage
In Exchange 2003, you could configure an active/active mode where both servers
were active simultaneously. This solution was so problematic that instead of
being updated and enhanced for Exchange 2007 it was discontinued. SCC works
with the active/passive configuration. To evaluate this solution on cost, keep
in mind that SCC requires two systems, a RAID-enabled storage solution and the
Enterprise Edition of Windows Server 2003.
While LCR, CCR and SCC are the three primary options, the Exchange development
team has announced it will release another solution with Exchange 2007 Service
Pack 1 later this year.
"With Standby Continuous Replication [SCR], data can be replicated on
a per-storage group basis to standby servers or clusters," according to
the Exchange development team. "The SCR target, whether a single mailbox
server or a cluster, can be placed inside the primary data center or in a remote
location, ready to be manually activated if the primary server or data center
fails." Stay tuned for more on this development.
Which Way To Go?
Deciding which approach is best for your environment is tough. You need to
weigh the cost of high availability against your needs.
You may decide a third-party solution is worth the added cost for even higher
availability. Whichever method you choose, rest assured that Exchange 2007 has
been designed to make any high availability solution easy to execute.