email filtering
April 12, 2004 (as amended)
This page is one of a series; see the end for links.
Below is the process I use to avoid spam, email-based viruses, and phish. The first few steps are for everyone, while the last few are more for diehards like myself who wish to improve the filters as well as use them.
Reduce the attack surface: shut down unused accounts, and close catch-all mailboxes
Install primary filter: set up SpamPal or equivalent, and set your mail reader to move messages with SpamPal's "** SPAM **" in the subject line to the spam folder
SpamPal uses a number of forms of spam detection; for each email, it weights each test result and computes a final "spam score". Emails scoring over a certain threshold are considered spam and have the string "** SPAM **" inserted at the start of their subject line. I recommend you use the Public Blacklists, plus the Regexfilter plugin.
Add filtering of specific filetypes, character sets, keywords and domains as desired. This is where to add rules for anti-virus and phish filtering.
[optional] Create secondary filters: set your mail reader to filter whitelisted mail to other folders (recommended)
Secondary filtering occurs once the primary filter has processed the message. Originally I used it only to move mail marked as spam to the spam folder, but it can also be used to clean your inbox further. The filters can be as fancy as your mail program lets them be. I use secondary filtering as follows:
Whitelisted personal address filters: These test for specific email addresses I use, and lift all mail that is actually addressed to me out of my inbox and into various other folders (including my "real" inboxes). They examine the To: and CC: fields. Unfortunately the occasional spam, addressed correctly and failing to match any of the other filters, is also moved to my "real" inboxes.
Whitelisted family, friend and mailing list filters: These test for specific email addresses in the To: and CC: fields, and specific strings in subject lines. They lift all mail I have previously defined as "OK" out of my inbox and into various other folders. For example, this is where I filter for mail addressed to email@example.com, which lets me move all messages from the Esteemed List into an Esteemed Folder specifically about viruses.
Blacklisted personal address filters: These test for specific email addresses I no longer use. Mails to these addresses are almost certainly spam, and I filter them to the spam folder.
[optional] Manually collect spam that makes it past the filters, and move it to a special folder - do NOT delete it.
When this folder grows large, use it to refine Regexfilter's ruleset.
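The weighted scoring described under the primary filter step can be sketched in Python as follows. This is only an illustration of the idea: the test names, weights, and threshold are made up, and SpamPal's real values and internals are not shown here.

```python
# Hypothetical illustration of a weighted "spam score".
# Test names, weights, and the threshold are assumptions, not SpamPal's.
SPAM_TAG = "** SPAM **"


def spam_score(results, weights):
    """Sum the weights of every test that fired for this message."""
    return sum(weights[name] for name, fired in results.items() if fired)


def tag_subject(subject, score, threshold):
    """Prefix the subject with the tag when the score is over the threshold."""
    if score > threshold and not subject.startswith(SPAM_TAG):
        return f"{SPAM_TAG} {subject}"
    return subject


weights = {"public_blacklist": 3.0, "regex_rule": 2.5, "bad_charset": 1.0}
results = {"public_blacklist": True, "regex_rule": True, "bad_charset": False}

score = spam_score(results, weights)
tagged = tag_subject("Cheap watches", score, threshold=5.0)
```

Because the tag always lands at the start of the subject line, the mail reader only needs one simple "subject starts with" rule to move tagged mail to the spam folder.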
All this filtering has the effect of leaving only "mystery" mail in my inbox. I am certain that, if widely deployed, this degree of filtering would make it uneconomical for spammers to operate, and infeasible for mass-mailing viruses to propagate.
There are various black holes into which some emails still fall: if someone wants to email me a ZIP or an EXE, they must encrypt it to one of my public keys, then send it; and joining new mailing lists often means some fiddling around with SpamPal's whitelists. But compared to a choked inbox, these problems are minor.
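The secondary filters described above (spam tag, whitelisted addresses, addresses I no longer use) amount to a small routing function. Here is a rough Python sketch using the standard `email` package. All addresses and folder names are placeholders, and the order in which the rules are tried is an assumption; a real mail reader expresses the same thing through its own filter dialog.

```python
from email import message_from_string
from email.utils import getaddresses

# All addresses and folder names below are hypothetical placeholders.
WHITELIST = {
    "me@example.com": "Real Inbox",
    "list@example.org": "Esteemed Folder",
}
RETIRED = {"old-me@example.com"}  # addresses no longer in use
SPAM_TAG = "** SPAM **"


def recipients(msg):
    """Return the lowercased addresses found in the To: and CC: headers."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def choose_folder(msg):
    """Pick a folder: spam tag first, then whitelist, then retired addresses."""
    if (msg.get("Subject") or "").startswith(SPAM_TAG):
        return "Spam"
    rcpts = recipients(msg)
    for addr, folder in WHITELIST.items():
        if addr in rcpts:
            return folder
    if rcpts & RETIRED:
        return "Spam"
    return "Inbox"  # whatever is left is "mystery" mail


msg = message_from_string("To: Me <me@example.com>\nSubject: hello\n\nbody")
folder = choose_folder(msg)
```

Anything that matches no rule stays in the inbox, which is what leaves only "mystery" mail there.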
False positives are NON-spam that is detected as spam; false negatives are spam that is NOT detected as spam.
Use a fast connection, so that mail doesn't take long to download.
Use a fast computer, so that filtering doesn't slow it down much (or move some or all of the filtering to a secondary computer). A slow, old computer will still work, however.
wading through the spam folder
Assuming you set up at least one spam filter, you'll end up with a folder in your email application full of messages that tripped the filter. If the filters are any good, most of these messages will be spam. However the occasional legitimate email may end up there, for numerous reasons; it's thus a good idea to periodically look through this folder, rather than just deleting all the messages without looking at them. These erroneously filtered messages are known as false positives. Here's the process I use to go through the spam folder:
Sort by size - the biggest messages are sometimes false positives (attachments from associates, etc). Also the smallest are usually blank and can be removed.
Sort by datetime - delete the uppermost and lowermost portions of the list (spammers forge dates far in the past or future to force their messages to these locations)
Sort by subject - delete the uppermost and lowermost portions of the list (spammers use punctuation and other tricks to force their messages to these locations)
Sort by name - delete the uppermost and lowermost portions of the list (spammers use punctuation and other tricks to force their messages to these locations); scroll through the list looking for repeats of the same name (spammers rarely use the same name twice, so repeats here are usually false positives or duplicate messages)
Search for known strings (words or phrases in subject lines, or email addresses, that are known to end up in the spam folder)
Add new strings found to the list of strings (above) and/or whitelist them somehow
Depending on the volume of messages remaining, you may wish to scroll through the list for known senders
Delete all the remaining messages
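Two of the passes above, the datetime sort and the hunt for repeated names, are easy to automate. The following Python sketch assumes each message has already been reduced to a dictionary with "name" and "date" keys; that layout, the function names, and the 30-day cut-off are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def implausible_dates(messages, now, max_age_days=30):
    """Return messages dated in the future or older than max_age_days."""
    oldest = now - timedelta(days=max_age_days)
    return [m for m in messages if m["date"] > now or m["date"] < oldest]


def repeated_senders(messages):
    """Return sender names that appear more than once (worth a closer look)."""
    counts = Counter(m["name"] for m in messages)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical spam-folder contents.
spam_folder = [
    {"name": "Alice", "date": datetime(2004, 4, 10)},
    {"name": "Zq!!", "date": datetime(2038, 1, 1)},
    {"name": "Alice", "date": datetime(2004, 4, 11)},
    {"name": "Bob", "date": datetime(1999, 1, 1)},
]
now = datetime(2004, 4, 12)

to_delete = implausible_dates(spam_folder, now)
to_review = repeated_senders(spam_folder)
```

Messages returned by `implausible_dates` are deletion candidates, while names returned by `repeated_senders` deserve a manual look before anything is deleted.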
Problem: If your spam folder ends up with thousands of messages, it's not feasible to go through it manually, but there may be mail in there from someone you know.
Solution: Periodically scan the spam folder for mail from anyone on a list of known-good senders, and move any matching messages from the spam folder to the inbox.
The scan is done by software (not currently available for download); before scanning occurs, the list of known-good senders is obtained from multiple sources, and built on-the-fly.
Using an automated tool like this means false positives are less important, as the tool can find them, even if they are buried amongst thousands of spams.
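The scanning tool itself is not available, so the following is not its code, just a minimal Python sketch of the same idea: build the known-good list from several sources, then split the spam folder by sender address. The function names and the message layout (a dictionary with a "from" key) are assumptions.

```python
from email.utils import parseaddr


def build_known_good(*sources):
    """Merge several address collections into one lowercased set."""
    return {
        addr.strip().lower()
        for source in sources
        for addr in source
        if addr.strip()
    }


def rescue(spam_folder, known_good):
    """Split the spam folder into (rescued, remaining) by sender address."""
    rescued, remaining = [], []
    for msg in spam_folder:
        _, addr = parseaddr(msg["from"])
        if addr.lower() in known_good:
            rescued.append(msg)
        else:
            remaining.append(msg)
    return rescued, remaining


# Hypothetical sources: an address book export and a list of past correspondents.
address_book = ["Alice@Example.com "]
past_correspondents = ["bob@example.org", ""]
known_good = build_known_good(address_book, past_correspondents)

spam_folder = [
    {"from": "Alice <alice@example.com>"},
    {"from": "x@spam.invalid"},
]
rescued, remaining = rescue(spam_folder, known_good)
```

Sender addresses can be forged, so a match here means "worth moving back to the inbox", not "definitely legitimate".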