Testing Exchange's New Spam Filter
In spite of its shortcomings, you can't beat the price of Microsoft's Intelligent Message Filter.
- By Bill Boswell
- August 01, 2004
Everyone hates spam. With this in mind, Microsoft in late May released an antispam add-on to Exchange 2003 called Intelligent Message Filter (IMF). Although the original plan called for making IMF available only to Software Assurance customers, Microsoft decided to provide it to anyone who has deployed Exchange Server 2003 (IMF doesn't run on Exchange 2000 or earlier).
Outlook 2003 has a built-in spam filter, and there are quite a few free or low-cost Outlook add-on spam filters, but these point solutions don't scale well in an enterprise. It's much more effective to block spam at the gateway, although you do risk blocking real e-mail. Many users would rather cope with spam than risk missing critical e-mail from a client.
IMF tries to strike a balance between blocking spam at the gateway and letting users evaluate potential false positives. It takes a two-pronged approach to filtering, driven by a number called the Spam Confidence Level, or SCL. IMF assigns each incoming SMTP message an SCL from 0 to 9 based on how likely it is to be spam: the higher the SCL, the spammier the message.
Microsoft hasn't released details about how IMF actually works, other than to say that it makes heuristic decisions, which is a programmer's way of saying that the rules are flexible but can't be described to the public.
IMF adds the SCL rating to the header of each message, so you need to install IMF only on the Exchange 2003 servers that act as inbound SMTP gateways. If you use a third-party server as an SMTP gateway, or a Windows server running SMTP but not Exchange, then you'll need to run IMF on each Exchange 2003 server.
IMF has two options for handling messages based on their assigned SCL. These options are exposed in the Message Delivery properties window in Exchange System Manager (ESM) (see Figure 1).
Figure 1. Intelligent Message Filtering configuration settings in the Message Delivery properties.
The "Gateway Blocking" option tells IMF to block or archive messages whose SCL meets or exceeds the specified threshold; the default is 8 on a scale that tops out at 9. Messages at or above this threshold are archived or deleted right at the gateway, before they can be placed into the information store.
Messages with a lower SCL rating might be legitimate rather than spam. The "Store Junk E-mail" option delivers these messages to the user but places those that meet the store threshold into the Junk E-mail folder in the user's mailbox.
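The two thresholds amount to a simple decision on each message's SCL. The following VBScript sketch models that decision so you can see where each SCL value lands for a given pair of settings. It is an illustration, not IMF's code: the function name RouteBySCL and its return labels are hypothetical, and it assumes both thresholds are inclusive ("meets or exceeds").

```vbscript
Option Explicit

' Hypothetical model of IMF's two thresholds (not Microsoft's code).
' scl          - the message's Spam Confidence Level, 0 through 9
' gatewayLimit - Gateway Blocking threshold (IMF's default is 8)
' junkLimit    - Store Junk E-mail threshold
' Both thresholds are treated as "at or above".
Function RouteBySCL(scl, gatewayLimit, junkLimit)
    If scl < 0 Or scl > 9 Then
        Err.Raise 5, "RouteBySCL", "SCL must be between 0 and 9"
    End If
    If scl >= gatewayLimit Then
        RouteBySCL = "gateway"   ' archived or deleted before reaching the store
    ElseIf scl >= junkLimit Then
        RouteBySCL = "junk"      ' delivered to the Junk E-mail folder
    Else
        RouteBySCL = "inbox"     ' delivered normally
    End If
End Function

' Show where each SCL value lands with a gateway limit of 8 and a junk limit of 6
Dim s
For s = 0 To 9
    WScript.Echo "SCL " & s & ": " & RouteBySCL(s, 8, 6)
Next
```

With these settings, SCL 0-5 goes to the inbox, SCL 6-7 to the Junk E-mail folder, and SCL 8-9 is handled at the gateway.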
The other Exchange servers in the organization don't need to run IMF. They've already been coded to look for SCL entries in message headers.
Like other policies in Exchange, the Store Junk E-mail setting must be applied to each server individually using ESM. Drill down to the Protocols folder under each server and expand the tree under the SMTP protocol. You'll find a new Intelligent Message Filtering icon, shown in Figure 2.
Figure 2. IMF icon under SMTP protocol for individual Exchange 2003 server.
The properties window for the Intelligent Message Filtering icon has an option to apply the IMF policy to the SMTP service on the server. You must apply the IMF setting to the SMTP virtual server on every Exchange 2003 server, including the gateway server or servers that are running IMF.
How Much to Block?
One of your first jobs after installing IMF is to decide where to set the two SCL thresholds: the block/archive threshold and the junk mail threshold.
To help make the SCL threshold determination, I devised a simple test. I used mail from my own mailbox to create a set of messages in three categories: Good Mail, Guaranteed Spam and Potential Spam. To determine if a message was spam, I used an Outlook add-on called Inboxer, www.inboxer.com, which uses a trainable Bayesian filter to triage messages based on spam content. I then created an Access database for each message category using Outlook's Export feature.
Armed with these three message databases, I wrote a script that reads each database and sends every entry as a separate mail message to the SMTP server and recipient of my choice. In essence, I created a little bulk mail engine. To simplify message handling in the script, I used an SMTP toolkit called ActiveEmail from ActiveXperts, www.activexperts.com/activemail.
As you go through the script, note that the DBQ parameter in connString points at the path to the Access database (.mdb file). I used three databases: Goodmail.mdb, GuaranteedSpam.mdb, and PossibleSpam.mdb (the listing points at goodmail.mdb). The rs.open call that runs the SELECT statement passes two flags: 0 (adOpenForwardOnly, a forward-only cursor) and 1 (adLockReadOnly, read-only locking). I assigned the entire content of the "Body" field to a standard VBScript variable, msgBody, then assigned msgBody to the Body property of the SmtpMail object. This ensured that the entire block of text would be processed as a unit and evaluated as plain text. Here's the script:
' Recipient and SMTP server that receive the test messages
targetAddress = "[email protected]"
targetName = "Test User"
targetServer = "w2k3-s1.company.com"

' Create the ActiveEmail SMTP object
Set aeObj = CreateObject("ActiveXperts.SmtpMail")

' DBQ selects the database to send (goodmail.mdb in this run)
connString = "DRIVER={Microsoft Access Driver (*.mdb)};" & _
  "DBQ=c:\spamsamples\goodmail.mdb;DefaultDir=;UID=;PWD=;"

Set Connection = CreateObject("ADODB.Connection")
Connection.ConnectionTimeout = 30
Connection.CommandTimeout = 80
Connection.Open connString

' 0 = adOpenForwardOnly cursor, 1 = adLockReadOnly
Set rs = CreateObject("ADODB.Recordset")
rs.Open "SELECT * FROM EMAIL", Connection, 0, 1

Do While Not rs.EOF
  aeObj.Clear
  aeObj.HostName = targetServer
  msgBody = rs.Fields("Body")

  ' Echo the e-mail message to the screen before sending it.
  WScript.Echo "From: " & rs.Fields("FromName") & " " & rs.Fields("FromAddress")
  WScript.Echo "To: " & targetAddress
  WScript.Echo "Subject: " & rs.Fields("Subject")
  WScript.Echo msgBody
  WScript.Echo String(40, "*")

  ' Assign entries to the ActiveEmail object
  aeObj.FromAddress = rs.Fields("FromAddress")
  aeObj.FromName = rs.Fields("FromName")
  aeObj.AddTo targetAddress, targetName
  aeObj.Subject = rs.Fields("Subject")
  aeObj.BodyType = 0   ' plain-text body, as described above
  aeObj.Body = msgBody
  aeObj.Send
  WScript.Echo "Sent message with this result: " & aeObj.LastError
  WScript.Echo String(40, "*") & vbCrLf & vbCrLf
  rs.MoveNext
Loop

' Release the recordset and the database connection
rs.Close
Connection.Close
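The script above is hard-coded to goodmail.mdb. One way to drive all three test runs from a single script is to build the connection string from a file name, as in this sketch. The helper name BuildConnString is hypothetical; the folder and file names come from the text above, and in a full script each pass of the loop would open an ADODB.Connection with the string and run the send loop.

```vbscript
Option Explicit

' Hypothetical helper: build the same Access ODBC connection string used in
' the script above, for any .mdb file in the given folder.
Function BuildConnString(folder, mdbName)
    BuildConnString = "DRIVER={Microsoft Access Driver (*.mdb)};" & _
        "DBQ=" & folder & "\" & mdbName & ";DefaultDir=;UID=;PWD=;"
End Function

Dim dbNames, dbName
dbNames = Array("Goodmail.mdb", "GuaranteedSpam.mdb", "PossibleSpam.mdb")

For Each dbName In dbNames
    ' A full script would open the connection here and send every record;
    ' this sketch only prints the string it would use.
    WScript.Echo BuildConnString("c:\spamsamples", dbName)
Next
```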
Now that I had a way to send spam to a target Exchange server, I needed a simple way to determine the SCL rating assigned to each message. To do this, I made use of a technique posted by James Webster at the MS Exchange Team blog. The posting contains instructions for building a custom Outlook form that exposes the SCL rating in the heading of a message. Webster also posted instructions for exposing the SCL rating in OWA.
To make sure all the inbound messages from my little bulk e-mail engine arrived at the target mailbox, I set the gateway block/archive threshold in the Message Delivery properties to take no action and I put the Junk Mail threshold at 9. I then fired the messages from my databases at a test user and sorted the results by SCL. Table 1 shows the SCL results for the contents of the GoodMail database.
Of the 541 messages in the GoodMail database, 61 percent (332 messages) were given a completely clean score of 0 by IMF. Another 138 got a score of 1-3, making them almost certainly not spam. That left 71 messages, or 13.1 percent of the total, with enough spamminess to make them interesting for analysis. Because I had already purged this database of spam, this 13.1 percent represents potential false positives in the SCL range of 4-7.
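The percentages here are simple shares of the 541-message total. This short VBScript sketch reproduces the arithmetic from the three bucket counts given above (332 at SCL 0, 138 at SCL 1-3, 71 at SCL 4 and up); the helper name PercentOf is hypothetical. Worked out to one decimal place, 332 of 541 is 61.4 percent and 71 of 541 is 13.1 percent.

```vbscript
Option Explicit

' Hypothetical helper: part as a percentage of total, rounded to one decimal
Function PercentOf(part, total)
    PercentOf = Round(100 * part / total, 1)
End Function

Dim clean, low, suspect, total
clean = 332     ' SCL 0
low = 138       ' SCL 1-3
suspect = 71    ' SCL 4 and up
total = clean + low + suspect

WScript.Echo "Total messages: " & total
WScript.Echo "SCL 0:   " & PercentOf(clean, total) & "%"
WScript.Echo "SCL 1-3: " & PercentOf(low, total) & "%"
WScript.Echo "SCL 4+:  " & PercentOf(suspect, total) & "%"
```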
I noticed very little difference between messages with the three middle ratings. Several of the SCL 5 messages came from my national account manager and many of the SCL 4 messages came from friends and relatives. I wouldn't have wanted the gateway to block or archive any of these messages.
Of the messages that were assigned an SCL of 6 or higher, four were newsletters to which I subscribe and two turned out to be spam that slipped past my personal spam filter. Good job, IMF!
Table 2 shows test results from the Guaranteed Spam database. Messages that got an SCL rating of 0-3 resembled newsletters, which may explain their low scores. However, two messages with an SCL of 1 were instances of the classic "I am a distinguished representative of a troubled nation." I'm surprised IMF rated this type of message as acceptable.
Table 3 shows the results for messages flagged by my personal spam filter as probable spam but worth checking for false positives. Before running the test, I purged the list of all good e-mail. The vast majority of messages getting an SCL of 0-5 were one-liners such as "See attached" or "Read this document." IMF doesn't read attachments, so these messages gave it very little text to evaluate.
Setting the Threshold
If you elect to archive messages at or above a given SCL at the gateway, the messages that meet the threshold go into a folder called UCEArchive under \Exchsrvr\Mailroot\vsi 1. Users can't access these messages to check them for false positives. If a user thinks that an important incoming e-mail has been diverted by the spam filter, you, as the administrator, must search the UCEArchive folder for any messages containing the complaining user's e-mail address. The messages are stored as EML files, so they can be opened in Outlook Express or Notepad. You can add the following Registry hack on the IMF server to include the SCL rating in the headers of the archived messages:
Key: HKLM\Software\Microsoft\Exchange\ContentFilter
Value: ArchiveSCL
Data: 1 (REG_DWORD)
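Both administrative chores described here, turning on ArchiveSCL and hunting through UCEArchive for a user's mail, can be scripted. This VBScript sketch uses the WScript.Shell RegWrite method and the Scripting.FileSystemObject. The helper names are hypothetical, and the C:\Program Files\Exchsrvr install path in the usage comment is an assumption, so adjust it to match your server.

```vbscript
Option Explicit

' Registry value named above; a path that does not end in "\" makes
' RegWrite create or update a value rather than a key.
Const ARCHIVE_SCL = "HKLM\Software\Microsoft\Exchange\ContentFilter\ArchiveSCL"

' Hypothetical helper: set ArchiveSCL = 1 (REG_DWORD). Needs admin rights.
Sub EnableArchiveSCL()
    Dim shell
    Set shell = CreateObject("WScript.Shell")
    shell.RegWrite ARCHIVE_SCL, 1, "REG_DWORD"
End Sub

' Hypothetical helper: case-insensitive test for an address in message text
Function MentionsAddress(text, address)
    MentionsAddress = (InStr(1, text, address, vbTextCompare) > 0)
End Function

' Hypothetical helper: return the .eml files in folderPath that mention
' address, one full path per line
Function FindArchivedMessages(folderPath, address)
    Dim fso, folder, file, stream, text, hits
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set folder = fso.GetFolder(folderPath)
    hits = ""
    For Each file In folder.Files
        If LCase(fso.GetExtensionName(file.Name)) = "eml" Then
            Set stream = file.OpenAsTextStream(1)   ' 1 = ForReading
            ' ReadAll fails on an empty file, so check for end of stream first
            If stream.AtEndOfStream Then
                text = ""
            Else
                text = stream.ReadAll
            End If
            stream.Close
            If MentionsAddress(text, address) Then
                hits = hits & file.Path & vbCrLf
            End If
        End If
    Next
    FindArchivedMessages = hits
End Function

' Example usage on the IMF server (install path is an assumption):
' EnableArchiveSCL
' WScript.Echo FindArchivedMessages( _
'     "C:\Program Files\Exchsrvr\Mailroot\vsi 1\UCEArchive", "someone@company.com")
```

A plain substring search will also match messages that merely quote the address in the body, so treat the result as a short list to open and review rather than a definitive answer.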
Because users can't access archived messages, you need to set the block/archive threshold high enough to be sure you aren't going to get false positives. As you can see by comparing the tables, IMF differentiates fairly well between spam and ham (legitimate mail) when the spam content is undeniable, so setting IMF to block or archive messages with an SCL of 7 or higher shouldn't result in rejecting potentially good mail.
However, there's quite a bit of gray area in the SCL 4-6 range, where messages need human evaluation. If you set the Junk Mail threshold at SCL 6, a significant amount of spam gets into the user's main inbox. Setting the threshold at SCL 4, on the other hand, could put quite a bit of regular mail into the Junk E-mail folder.
In my opinion, forcing users to search their Junk E-mail folders for many potentially good messages is asking too much of them. I recommend setting the Junk Mail threshold to SCL 6 and letting your users delete any spam that slips through.
Unless your users are diligent about cleaning out their Junk E-mail folders after they review the mail, you'll find that messages build up in your Exchange stores like barnacles on a battleship. You can prevent long-term buildup by creating a Mailbox Manager job that periodically cleans older mail from the Junk E-mail folder.
If you use Microsoft Operations Manager (MOM) to monitor your Exchange servers, you should download the IMF Management Pack for MOM. This free addition to your MOM arsenal provides the monitoring capability you need to ensure that your IMF servers are operating properly. Download it from http://snipurl.com/6ta8.
Is IMF All the Spam Filtering You Need?
I'm accustomed to using a trainable Bayesian filter for my personal mail, and I found the inability to train IMF to be a significant disadvantage. It means that an SCL 4 message containing the text "Happy with your current proportions? Look at http://v-a-r-o-o-m.com." will continue to reach a user's mailbox, day after day, until the next update to the IMF database. To get that update, you must go to Microsoft's update site, which requires planning and additional configuration. Compare this with third-party products, where database updates happen transparently and automatically.
Also, I was disappointed by the large number of spam messages that must be allowed through the gateway to users' mailboxes to avoid false positives. Several third-party products use thumbprints and rapid database updates to identify spam that would otherwise evade a strict Bayesian filter.
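To make "trainable" concrete, here is a deliberately tiny token filter in VBScript. It is a multinomial naive Bayes sketch with add-one (Laplace) smoothing, not a description of how IMF, Inboxer, or any third-party product actually works; all names are hypothetical. It counts the words in messages a user has marked as spam or as good mail, then scores new text as log-odds, where a positive score leans spam.

```vbscript
Option Explicit

' Hypothetical, minimal trainable filter: multinomial naive Bayes with
' add-one smoothing. Not how IMF or any particular product works.
Dim spamCounts, hamCounts, vocab
Dim spamDocs, hamDocs, spamTokens, hamTokens
Set spamCounts = CreateObject("Scripting.Dictionary")
Set hamCounts = CreateObject("Scripting.Dictionary")
Set vocab = CreateObject("Scripting.Dictionary")
spamDocs = 0: hamDocs = 0: spamTokens = 0: hamTokens = 0

' Break text into lowercase word tokens (letters, digits, hyphens)
Function WordMatches(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "[a-z0-9\-]+"
    re.Global = True
    Set WordMatches = re.Execute(LCase(text))
End Function

' Record a message the user marked as spam (True) or good mail (False)
Sub Train(text, isSpam)
    Dim m
    If isSpam Then spamDocs = spamDocs + 1 Else hamDocs = hamDocs + 1
    For Each m In WordMatches(text)
        vocab(m.Value) = True
        If isSpam Then
            spamCounts(m.Value) = spamCounts(m.Value) + 1
            spamTokens = spamTokens + 1
        Else
            hamCounts(m.Value) = hamCounts(m.Value) + 1
            hamTokens = hamTokens + 1
        End If
    Next
End Sub

' Log-odds that text is spam: > 0 leans spam, < 0 leans good mail
Function SpamLogOdds(text)
    Dim m, s, h, v, score
    v = vocab.Count
    If v = 0 Then
        SpamLogOdds = 0   ' untrained: no opinion either way
        Exit Function
    End If
    score = Log((spamDocs + 1) / (hamDocs + 1))   ' smoothed prior
    For Each m In WordMatches(text)
        s = 0: h = 0
        ' Exists avoids adding unseen words to the dictionaries while scoring
        If spamCounts.Exists(m.Value) Then s = spamCounts(m.Value)
        If hamCounts.Exists(m.Value) Then h = hamCounts(m.Value)
        score = score + Log((s + 1) / (spamTokens + v)) _
                      - Log((h + 1) / (hamTokens + v))
    Next
    SpamLogOdds = score
End Function

Dim pitch
pitch = "Happy with your current proportions? Look at http://v-a-r-o-o-m.com."
WScript.Echo "Before training: " & SpamLogOdds(pitch)
Train pitch, True
Train "Can we move the budget review to Thursday at 10?", False
WScript.Echo "After training: " & SpamLogOdds(pitch)
```

Before training, the sketch has no opinion and returns 0. After the pitch is marked as spam once, the same text scores positive, which is the kind of immediate, per-site correction the article finds missing in IMF.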
I didn't test the performance impact of running IMF. The number would have little meaning in any case unless it were compared with other products on similar hardware. You can, however, mix ham and spam in the set of test messages used by Microsoft's Load Simulator for Exchange 2003 (Loadsim 2003). Using Loadsim and your own ham-and-spam mix, you can come up with a comparison benchmark for your own hardware. Download Loadsim 2003 from http://snipurl.com/6tkt.
In spite of its shortcomings, you can't beat the price of IMF. If you're deploying Exchange 2003, you should at least evaluate it before spending additional money on third-party products. If you want to see real-time demos of some of the more popular antispam products, come to the Fall TechMentor event in San Jose, California (Sept. 27-Oct. 1, www.techmentorevents.com). I'll be showing many features of these products in head-to-head comparisons.