
easyweb.co.uk

Photography and fine web writing since the last century


Spam Spam Spam Spam (DSPAM rocks)

Spam - hate it or really hate it, you have to do something about it. Having recently moved my domain home, and being not altogether happy with the number of classification errors that SpamAssassin was throwing out these days, I've now moved to DSPAM, and it works really really well.

Why do I need it

My email address is pretty public - it's on this site, and it's in the public archives of a number of mailing lists. This isn't pure dumbness on my part - I have good reason for wanting to be findable by email - but it does have spam spin-offs that are less than desirable.

To put it into context, take a look at the spam/ham chart that DSPAM has compiled over the last 14 days:
[Chart: ham and spam over the last 14 days, showing spam running at 100 a day or so]
Yes, that's around 100 spams a day for my main email address alone - 62% of my email is spam. Lucy gets quite a few too, and the domain gets up to a thousand a day sent to non-existent addresses.

In the old mail setup, I downloaded all mail from an external mailserver and pushed it through SpamAssassin, which did pretty well. However, its effectiveness was deteriorating as spam evolved and spammers tested their efforts against it and designed round it. Even with daily training of the Bayesian classifier, a high score weighting for the Bayesian test, and each email checked against third-party blacklists, I was still getting about 4% of the incoming spams slipping through.

With a new mailserver running at home, I thought it was time for a new approach, and a pure-statistical filter sounded like the way to go as heuristics become very quickly outdated and outclassed. So I decided to give DSPAM a go.

Installation

This took quite a bit of experimentation, as there isn't much documentation for DSPAM that covers my setup (QMail with vpopmail). These instructions were extremely helpful, and got me about 90% of the way.

Here are the changes I made from my vanilla qmailrocks installation:

VPopMail installation

You'll have to go back and reinstall VPopMail with the following switch:

--enable-qmail-ext=y

This will enable you to set up aliases that let you forward spam misses back into DSPAM, of which more later.

I installed DSPAM with the following config:

./configure \
--with-dspam-home=/etc/dspam/ \
--prefix=/etc/dspam \
--with-delivery-agent='/usr/local/bin/maildrop' \
--with-dspam-owner=vpopmail \
--enable-virtual-users \
--enable-domain-scale \
--with-dspam-group=vchkpw \
--with-dspam-home-group=vchkpw \
--with-dspam-home-owner=vpopmail \
--disable-trusted-user-security \
--enable-debug \
--with-storage-driver=mysql_drv \
--with-mysql-includes=/usr/include/mysql \
--with-mysql-libraries=/usr/lib

CGI install

Script customisation for Qmail/VPopMail

Assuming you've followed the instructions above re VHosts and .htaccess, the user will be hitting your CGI with $ENV{'REMOTE_USER'} set to their email address. To use this, we need to split it into domain and user, so that the CGI can correctly identify their spam. So we make the following changes to dspam.cgi:

# Add $VPOPUSERNAME $VPOPDOMAIN  to the local variable definition
use vars qw { %CONFIG %DATA %FORM $MAILBOX $USER $VPOPUSERNAME $VPOPDOMAIN };

# Make sure you're set to use Domain scale
$CONFIG{'DOMAIN_SCALE'} = 1;                    # --enable-domain-scale

# And change the Domain Scale config section to this:
  $VPOPUSERNAME = (split(/@/, $ENV{'REMOTE_USER'}))[0];
  $VPOPDOMAIN = (split(/@/, $ENV{'REMOTE_USER'}))[1];
  $USER = "$CONFIG{'DSPAM_HOME'}/data/$VPOPDOMAIN/$VPOPUSERNAME/$VPOPUSERNAME";
  $MAILBOX = "$CONFIG{'DSPAM_HOME'}/data/$VPOPDOMAIN/$VPOPUSERNAME/$VPOPUSERNAME.mbox";
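
If you want to check where the CGI will go looking for a given login, the same split can be sketched in shell. This is only an illustration: dspam_user_path is a made-up helper name, and /etc/dspam is assumed to be the DSPAM home from the configure line above.

```shell
#!/bin/sh
# Sketch: split a vpopmail login (user@domain) and build the DSPAM data path,
# mirroring the $USER line in dspam.cgi. dspam_user_path is a hypothetical helper.
dspam_user_path() {
    dspam_home=$1
    address=$2
    vpop_user=${address%%@*}    # everything before the first @
    vpop_domain=${address#*@}   # everything after the first @
    printf '%s/data/%s/%s/%s\n' "$dspam_home" "$vpop_domain" "$vpop_user" "$vpop_user"
}

dspam_user_path /etc/dspam foo@domain.co.uk
# prints /etc/dspam/data/domain.co.uk/foo/foo
```

For an ordinary address with a single @ this gives the same user and domain as the Perl split.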

You also probably want to symlink /etc/dspam/bin/dspam to /usr/bin/dspam for ease of use.

Configuring

User setup

For each user, set up a .qmail file in ~vpopmail/domains/domain.co.uk/foo/ containing the following:

|/usr/bin/dspam --user $EXT@$HOST --deliver=innocent --mode=teft --feature=chained,noise,whitelist

This will ensure that all mail delivered to the user goes through DSPAM before it hits the mailbox, and only mail that DSPAM classifies as ham gets delivered. --mode=teft tells DSPAM to train on *every* message.
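
If you have more than a couple of users, writing the file by hand gets tedious. Here's a minimal sketch of a helper that does it, assuming you pass it the user's vpopmail directory; install_dspam_qmail is a hypothetical name, and it overwrites any existing .qmail file, so check first.

```shell
#!/bin/sh
# Sketch: write the DSPAM delivery line into a user's .qmail file.
# install_dspam_qmail is a hypothetical helper; pass the user's vpopmail directory.
# Warning: this replaces any existing .qmail file in that directory.
install_dspam_qmail() {
    user_dir=$1
    # Single quotes keep $EXT and $HOST literal, so they are expanded at
    # delivery time rather than when this script runs.
    printf '%s\n' '|/usr/bin/dspam --user $EXT@$HOST --deliver=innocent --mode=teft --feature=chained,noise,whitelist' > "$user_dir/.qmail"
}

# Example (path as used above):
# install_dspam_qmail ~vpopmail/domains/domain.co.uk/foo
```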

Important Note: If you offer qmailadmin access to your users, they will break this whenever they add a vacation message or do any other action that edits their .qmail file.

Handling Spam Misses

Once you've reinstalled VPopMail, go to ~vpopmail/domains/domain.co.uk/ and add a .qmail file: .qmail-spam-foo@domain.co.uk (assuming your email address is foo@domain.co.uk) containing the following:

|/usr/bin/dspam --user foo@domain.co.uk global --mode=teft --class=spam --source=error

This swaps the message from the ham count to the spam count, and tells the filter to retrain on it. Any spams that make it through can now just be forwarded to spam-foo@domain.co.uk.

Quarantine - Handling False Positives

Once I was at this stage, it all seemed to work, except for getting false positives out of quarantine back into the mailsystem. The QmailRocks install doesn't set up maildrop to be able to use the -d %u flag, so email sent through DSPAM with those switches gave maildrop errors.

Reading back through the Qmail + Vpopmail + DSPAM + Maildrop Howto Quarantine instructions, I came up with a solution which works beautifully. Make this change to dspam.cgi:

$CONFIG{'DSPAM_ARGS'}  = "--deliver=innocent --mode=tum --class=innocent --feature=whitelist,noise,chained " .
                          "--source=error --user $ENV{'REMOTE_USER'} --stdout | /var/qmail/bin/qmail-inject $ENV{'REMOTE_USER'}";

DSPAM can output to stdout (that's the --stdout switch!) and all you have to do then is pipe it straight back into qmail-inject marked as ham, and qmail will deliver it to the user's maildir.

Training

DSPAM starts off as a completely naive filter. It really has no idea what you class as spam and ham. To start with, then, train DSPAM on the SpamAssassin Training Corpus. If you have a mailbox of spam (I had a thousand or so I'd been saving), train on these. Also train on a mailbox of good email (use --class=innocent), ideally for each user.
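
A rough sketch of a corpus training loop follows. It assumes one message per file (a Maildir's cur/ directory, say, rather than a single mbox file) and that dspam reads the message on stdin, as it does in the .qmail lines. DSPAM_BIN and train_dir are names I've made up for the example; the flags are the ones used elsewhere on this page.

```shell
#!/bin/sh
# Sketch: feed every message file in a directory to DSPAM as corpus training.
# Assumes one message per file and that dspam takes the message on stdin.
# DSPAM_BIN and train_dir are hypothetical names.
DSPAM_BIN=${DSPAM_BIN:-/usr/bin/dspam}

train_dir() {
    user=$1     # e.g. foo@domain.co.uk
    class=$2    # spam or innocent
    dir=$3      # directory of message files
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue   # skip subdirectories and an empty glob
        "$DSPAM_BIN" --user "$user" --mode=teft --class="$class" --source=corpus < "$msg"
    done
}

# Example (placeholder paths):
# train_dir foo@domain.co.uk spam     /path/to/saved-spam/cur
# train_dir foo@domain.co.uk innocent /path/to/good-mail/cur
```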

Honeypots

On my domain, I also have a catch-all account that I switch on from time to time, taking advantage of the high volume of spam sent to non-existent addresses at easyweb. Here's the .qmail file content:

|/usr/bin/dspam --user global foo@domain.co.uk foo@domain1.co.uk foo1@domain.co.uk --mode=teft --class=spam --source=corpus --feature=chained,noise 

Now that I have a reasonably mature corpus, I have a secondary honeypot address that provides intensive training using DSPAM's inoculation feature. I advertise the address (yumyum@easyweb.co.uk) in my email signature so it gets publicly archived and, I hope, harvested by spambots. Any email that comes in is by definition spam (or sent by someone so dim that they deserve to be killfiled), and it goes to .qmail-yumyum, which has the following content:

|/usr/bin/dspam --user global  foo@domain.co.uk foo@domain1.co.uk foo1@domain.co.uk --mode=teft --class=spam --source=inoculation --feature=chained,noise

Any tokens already in the classification database are trained twice. Anything new is trained five times.

Results

Having been running DSPAM for a couple of weeks now, starting from nothing, here's the story so far:

Filtering Stats

It is Wed Aug 11 12:06:47 2004

DSPAM has caught 1131 spams
...learned 80 spams
...scanned 733 innocent emails
...with 5 false positives
Your SPAM Ratio is 62.07%

Performance Statistics

Your spam filtering accuracy is 98.178% since last reset
Your false positive rate is 2.422% since last reset
Your overall accuracy is 97.957% since last reset
 9008 corpusfed spams
 2650 corpusfed nonspams
 (9/485) SM
 (7/282) FP

I'm extremely happy with this - after a short time, I've gone from 1 in 20 spams making it through the spamfilter to 1 in 55.
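
For the curious, the "since last reset" percentages can be reproduced from the raw counts. The sketch below reads (9/485) SM as 9 spam misses against 485 spams caught, and (7/282) FP as 7 false positives against 282 innocent emails passed. That reading is my assumption about how the figures are derived, but it does match the numbers reported; stats is just a made-up helper name.

```shell
#!/bin/sh
# Sketch: recompute the "since last reset" figures from the SM and FP counts.
# Arguments: spam misses, spams caught, false positives, innocents passed.
# The interpretation of the counts is an assumption (see above).
stats() {
    awk -v sm="$1" -v tp="$2" -v fp="$3" -v tn="$4" 'BEGIN {
        printf "spam accuracy: %.3f%%\n", 100 * (1 - sm / (sm + tp))
        printf "false positive rate: %.3f%%\n", 100 * fp / (fp + tn)
        printf "overall accuracy: %.3f%%\n", 100 * (1 - (sm + fp) / (sm + tp + fp + tn))
        printf "spam getting through: 1 in %.0f\n", (sm + tp) / sm
    }'
}

stats 9 485 7 282
# prints:
# spam accuracy: 98.178%
# false positive rate: 2.422%
# overall accuracy: 97.957%
# spam getting through: 1 in 55
```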

Other Spam Reduction Tactics

My mailserver has the following running, which reduce the quantity of unwelcome email that gets as far as DSPAM:

  • ClamAV - mailserver-based virus scanner
  • SPF - much spam comes from forged addresses, and SPF checks for forgeries and blocks them

Both courtesy of the wonderful QmailRocks Qmail installation package. Spams that would otherwise have possibly made it through don't even figure in the stats above.

Being a GoodCitizen, I also report the few spams that make it through DSPAM to SpamCop.

If I get time, I'll also set up SMTP blocking for all mail originating in China, Korea and Brazil, as that's the source of much of my spam, and in over 10 years with email, I've yet to receive any real email from those countries.
