easyweb.co.uk
Photography and fine web writing since the last century
Spam Spam Spam Spam (DSPAM rocks)
Spam - hate it or really hate it, you have to do something about it. Having recently moved my domain home, and being not altogether happy with the number of classification errors that SpamAssassin was throwing out these days, I've now moved to DSPAM, and it works really really well.
Why do I need it?
My email address is pretty public - it's on this site, and it's in the public archives of a number of mailing lists. This isn't pure dumbness on my part - I have good reason for wanting to be findable by email - but it does have spam spin-offs that are less than desirable.
To put it into context, take a look at the spam/ham chart that DSPAM has compiled over the last 14 days:

Yes, that's around 100 spams a day for my main email address alone - 62% of my email is spam. Lucy gets quite a few too, and the domain gets up to a thousand a day sent to non-existent addresses.
In the old mail setup, I downloaded all mail from an external mailserver and pushed it through SpamAssassin, which did pretty well. However, its effectiveness was deteriorating as spam evolved and spammers tested their efforts against it and designed round it. Even with daily training of the Bayesian classifier, a high score for the Bayesian test, and checks of each email against third-party blacklists, I was still getting about 4% of incoming spam slipping through.
With a new mailserver running at home, I thought it was time for a new approach. A purely statistical filter sounded like the way to go, as heuristics very quickly become outdated and outclassed. So I decided to give DSPAM a go.
Installation
This took quite a bit of experimentation, as there isn't much documentation for DSPAM that covers my setup (QMail with vpopmail). These instructions were extremely helpful, and got me about 90% of the way.
Here are the changes I made to my vanilla qmailrocks installation:
VPopMail installation
You'll have to go back and reinstall VPopMail with the following switch:
--enable-qmail-ext=y
This will enable you to set up aliases that let you forward spam misses back into DSPAM, of which more later.
I installed DSPAM with the following config:
./configure \
--with-dspam-home=/etc/dspam/ \
--prefix=/etc/dspam \
--with-delivery-agent='/usr/local/bin/maildrop' \
--with-dspam-owner=vpopmail \
--enable-virtual-users \
--enable-domain-scale \
--with-dspam-group=vchkpw \
--with-dspam-home-group=vchkpw \
--with-dspam-home-owner=vpopmail \
--disable-trusted-user-security \
--enable-debug \
--with-storage-driver=mysql_drv \
--with-mysql-includes=/usr/include/mysql \
--with-mysql-libraries=/usr/lib
CGI install
Script customisation for Qmail/VPopMail
Assuming you've followed the instructions above re VHosts and .htaccess, the user will be hitting your CGI with $ENV{'REMOTE_USER'} set to their email address. To use this, we need to split it into user and domain so that the CGI can correctly identify that user's spam. So we make the following changes to dspam.cgi:
# Add $VPOPUSERNAME $VPOPDOMAIN to the local variable definition
use vars qw { %CONFIG %DATA %FORM $MAILBOX $USER $VPOPUSERNAME $VPOPDOMAIN };
# Make sure you're set to use Domain scale
$CONFIG{'DOMAIN_SCALE'} = 1; # --enable-domain-scale
# And change the Domain Scale config section to this:
$VPOPUSERNAME = (split(/@/, $ENV{'REMOTE_USER'}))[0];
$VPOPDOMAIN = (split(/@/, $ENV{'REMOTE_USER'}))[1];
$USER = "$CONFIG{'DSPAM_HOME'}/data/$VPOPDOMAIN/$VPOPUSERNAME/$VPOPUSERNAME";
$MAILBOX = "$CONFIG{'DSPAM_HOME'}/data/$VPOPDOMAIN/$VPOPUSERNAME/$VPOPUSERNAME.mbox";
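To check from the shell which files the CGI will use for a given login, the same user/domain split can be done with POSIX parameter expansion. This is a minimal sketch, not part of DSPAM: the function name dspam_paths is hypothetical, and it assumes DSPAM_HOME is /etc/dspam (the --with-dspam-home value above) and an ordinary address containing a single @.

```shell
# Hypothetical helper (not part of DSPAM): print the data prefix and the
# quarantine mbox that the modified dspam.cgi will use for a login such as
# foo@domain.co.uk. DSPAM_HOME defaults to /etc/dspam, as configured above.
dspam_paths() {
  addr=$1
  case $addr in
    ?*@?*) ;;   # need something on both sides of the @
    *) echo "expected user@domain, got: $addr" >&2; return 1 ;;
  esac
  user=${addr%%@*}    # text before the first @
  domain=${addr#*@}   # text after the first @
  home=${DSPAM_HOME:-/etc/dspam}
  printf '%s\n' "$home/data/$domain/$user/$user" \
                "$home/data/$domain/$user/$user.mbox"
}
```

For example, dspam_paths foo@domain.co.uk prints /etc/dspam/data/domain.co.uk/foo/foo and /etc/dspam/data/domain.co.uk/foo/foo.mbox, which is where the quarantine for that user should turn up.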
You'll probably also want a symlink at /usr/bin/dspam pointing to /etc/dspam/bin/dspam for ease of use; the .qmail files below all call /usr/bin/dspam.
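Because ln -s takes the target first and the link name second, it is easy to get this the wrong way round. A minimal sketch, assuming the --prefix=/etc/dspam build above installed the binary as /etc/dspam/bin/dspam; link_dspam is a hypothetical name, and the two directories are arguments only so it can be tried on scratch directories first.

```shell
# Hypothetical helper: make $bindir/dspam (default /usr/bin/dspam) a symlink
# to the binary installed under the prefix (default /etc/dspam/bin/dspam).
link_dspam() {
  prefix=${1:-/etc/dspam}
  bindir=${2:-/usr/bin}
  if [ ! -x "$prefix/bin/dspam" ]; then
    echo "no executable dspam in $prefix/bin" >&2
    return 1
  fi
  # ln -s TARGET LINK_NAME: the new link is created at $bindir/dspam
  ln -s "$prefix/bin/dspam" "$bindir/dspam"
}
```

Run as root with no arguments, it creates /usr/bin/dspam; if that path already exists, ln fails rather than replacing it.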
Configuring
User setup
For each user, set up a .qmail file in ~vpopmail/domains/domain.co.uk/foo/ containing the following:
|/usr/bin/dspam --user $EXT@$HOST --deliver=innocent --mode=teft --feature=chained,noise,whitelist
This ensures that all mail delivered to the user goes through DSPAM before it hits the mailbox, and only mail that DSPAM classifies as ham gets delivered; spam is held in the user's DSPAM quarantine. --mode=teft tells DSPAM to train on *every* message.
Important Note: If you offer qmailadmin access to your users, they will break this setup whenever they add a vacation message or take any other action that rewrites their .qmail file.
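With more than a couple of users, writing each .qmail by hand gets tedious. A minimal sketch of a helper that writes the line above for one user; write_user_qmail and the VPOP_DOMAINS override are hypothetical, and it refuses to overwrite an existing .qmail (for example one holding a forward) rather than guess how to merge it.

```shell
# Hypothetical helper: write the per-user DSPAM .qmail for USER@DOMAIN.
# Usage: write_user_qmail DOMAIN USER
# VPOP_DOMAINS defaults to ~vpopmail/domains; set it to try this elsewhere.
write_user_qmail() {
  domain=$1
  user=$2
  base=${VPOP_DOMAINS:-}
  if [ -z "$base" ]; then
    base=~vpopmail/domains   # tilde expands at the start of an assignment
  fi
  dir=$base/$domain/$user
  if [ ! -d "$dir" ]; then
    echo "no such mailbox directory: $dir" >&2
    return 1
  fi
  if [ -e "$dir/.qmail" ]; then
    echo "$dir/.qmail already exists; merge the line in by hand" >&2
    return 1
  fi
  # Single quotes keep $EXT and $HOST literal: qmail fills them in at
  # delivery time, so they must be written to the file unexpanded.
  printf '%s\n' '|/usr/bin/dspam --user $EXT@$HOST --deliver=innocent --mode=teft --feature=chained,noise,whitelist' \
    > "$dir/.qmail"
}
```

For example, write_user_qmail domain.co.uk foo creates ~vpopmail/domains/domain.co.uk/foo/.qmail. If you run it as root, chown the new file to vpopmail:vchkpw afterwards so it matches the ownership of the rest of the vpopmail tree.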
Handling Spam Misses
Once you've reinstalled VPopMail, go to ~vpopmail/domains/domain.co.uk/ and add a .qmail file named .qmail-spam-foo, which handles mail to spam-foo@domain.co.uk (assuming your email address is foo@domain.co.uk), containing the following:
|/usr/bin/dspam --user foo@domain.co.uk global --mode=teft --class=spam --source=error
This swaps the message from the ham count to the spam count and tells the filter to retrain on it. Any spam that makes it through can now simply be forwarded to spam-foo@domain.co.uk.
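A minimal sketch of a helper that writes the spam-miss alias for one user, assuming qmail's usual lookup in which mail to spam-foo@domain.co.uk is handled by a file named .qmail-spam-foo in the domain directory. write_spam_alias and the VPOP_DOMAINS override are hypothetical names, and to keep things simple it only accepts user names made of lowercase letters, digits, - and _.

```shell
# Hypothetical helper: write .qmail-spam-USER in the domain directory so
# that spam-USER@DOMAIN retrains DSPAM on a missed spam.
# Usage: write_spam_alias DOMAIN USER
write_spam_alias() {
  domain=$1
  user=$2
  case $user in
    ''|*[!a-z0-9_-]*)
      echo "unsupported user name: $user (write this file by hand)" >&2
      return 1 ;;
  esac
  base=${VPOP_DOMAINS:-}
  if [ -z "$base" ]; then
    base=~vpopmail/domains   # tilde expands at the start of an assignment
  fi
  dir=$base/$domain
  if [ ! -d "$dir" ]; then
    echo "no such domain directory: $dir" >&2
    return 1
  fi
  printf '|/usr/bin/dspam --user %s@%s global --mode=teft --class=spam --source=error\n' \
    "$user" "$domain" > "$dir/.qmail-spam-$user"
}
```

For example, write_spam_alias domain.co.uk foo writes the line shown above into ~vpopmail/domains/domain.co.uk/.qmail-spam-foo.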
Quarantine - Handling False Positives
Once I was at this stage, it all seemed to work, except for getting false positives out of quarantine and back into the mail system. The QmailRocks install doesn't set up maildrop to accept the -d %u flag, so mail that DSPAM tried to deliver with that switch gave maildrop errors.
Reading back through the Qmail + Vpopmail + DSPAM + Maildrop Howto Quarantine instructions, I came up with a solution which works beautifully. Make this change to dspam.cgi:
$CONFIG{'DSPAM_ARGS'} = "--deliver=innocent --mode=tum --class=innocent --feature=whitelist,noise,chained " .
"--source=error --user $ENV{'REMOTE_USER'} --stdout | /var/qmail/bin/qmail-inject $ENV{'REMOTE_USER'}";
DSPAM can write the message to stdout (that's the --stdout switch!), so all you have to do is reclassify it as ham and pipe it straight back into qmail-inject, and qmail will deliver it to the user's maildir.
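The same release path can be tried from the shell on a saved copy of a single false positive, which is a quick way to check the qmail-inject side before touching the CGI. A minimal sketch that reuses the DSPAM_ARGS switches above; release_false_positive and the DSPAM_BIN/QMAIL_INJECT overrides are hypothetical, and it assumes the /usr/bin/dspam symlink. Like the CGI, it retrains DSPAM on the message as innocent.

```shell
# Hypothetical helper: reclassify one message (read on stdin) as innocent
# and re-inject it for RCPT, mirroring the DSPAM_ARGS pipe in dspam.cgi.
# Usage: release_false_positive RCPT < message
release_false_positive() {
  rcpt=$1
  "${DSPAM_BIN:-/usr/bin/dspam}" --deliver=innocent --mode=tum --class=innocent \
    --feature=whitelist,noise,chained --source=error --user "$rcpt" --stdout |
    "${QMAIL_INJECT:-/var/qmail/bin/qmail-inject}" "$rcpt"
}
```

For example, release_false_positive foo@domain.co.uk < saved-message.txt. Plain sh has no pipefail, so the function's exit status is qmail-inject's; a dspam failure will not show up there.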
Training
DSPAM starts off as a completely naive filter: it has no idea what you class as spam and ham. To start with, train DSPAM on the SpamAssassin Training Corpus. If you have a mailbox of spam (I had a thousand or so I'd been saving), train on that too. Also train on a mailbox of good email (use --class=innocent), ideally for each user.
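Training on a corpus stored one message per file, as the SpamAssassin corpus is, comes down to a loop over the files. A minimal sketch using the --mode=teft and --source=corpus switches that the honeypot .qmail lines use; train_dir and the DSPAM_BIN override are hypothetical, and the directory is assumed to hold only message files.

```shell
# Hypothetical helper: feed every regular file in DIR to dspam as corpus
# training for USER. CLASS must be spam or innocent.
# Usage: train_dir USER CLASS DIR
train_dir() {
  user=$1
  class=$2
  dir=$3
  case $class in
    spam|innocent) ;;
    *) echo "class must be spam or innocent" >&2; return 1 ;;
  esac
  count=0
  for msg in "$dir"/*; do
    [ -f "$msg" ] || continue   # skips subdirectories and an unmatched glob
    "${DSPAM_BIN:-/usr/bin/dspam}" --user "$user" --mode=teft \
      --class="$class" --source=corpus < "$msg" || return 1
    count=$((count + 1))
  done
  echo "trained $count $class messages for $user"
}
```

For example, train_dir foo@domain.co.uk spam /path/to/spam-corpus, then the same with innocent on a directory of good mail. For an mbox rather than a directory, procmail's formail -s can split it and run dspam once per message: formail -s /usr/bin/dspam --user foo@domain.co.uk --mode=teft --class=spam --source=corpus < spam.mbox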
Honeypots
On my domain, I also have a catch-all account that I switch on from time to time, taking advantage of the high volume of spam sent to non-existent addresses at easyweb. Here's the .qmail file content:
|/usr/bin/dspam --user global foo@domain.co.uk foo@domain1.co.uk foo1@domain.co.uk --mode=teft --class=spam --source=corpus --feature=chained,noise
Now that I have a reasonably mature corpus, I have a secondary honeypot address that provides intensive training using DSPAM's inoculation feature. I advertise the address (yumyum@easyweb.co.uk) in my email signature so that it gets publicly archived and, I hope, harvested by spambots. Any email that comes in is by definition spam (or sent by someone so dim that they deserve to be killfiled), and it goes to .qmail-yumyum, which has the following content:
|/usr/bin/dspam --user global foo@domain.co.uk foo@domain1.co.uk foo1@domain.co.uk --mode=teft --class=spam --source=inoculation --feature=chained,noise
Any tokens already in the classification database are trained twice. Anything new is trained five times.
Results
I've been running DSPAM for a couple of weeks now, starting from nothing. Here's the story so far:
Filtering Stats
| It is | Wed Aug 11 12:06:47 2004 |
| DSPAM has caught | 1131 spams |
| ...learned | 80 spams |
| ...scanned | 733 innocent emails |
| ...with | 5 false positives |
| Your SPAM Ratio is | 62.07% |
Performance Statistics
| Your spam filtering accuracy is | 98.178% since last reset |
| Your false positive rate is | 2.422% since last reset |
| Your overall accuracy is | 97.957% since last reset |
| Corpusfed spams | 9008 |
| Corpusfed nonspams | 2650 |
| SM (spam misses) | 9/485 |
| FP (false positives) | 7/282 |
I'm extremely happy with this - after a short time, I've gone from 1 in 20 spams making it through the spam filter to 1 in 55.
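The "1 in 55" figure follows from the 98.178% spam filtering accuracy in the table: about 1.822% of spam is missed, and 100 / 1.822 is roughly 54.9. The same conversion as a small shell function; awk does the division because POSIX shell arithmetic is integer-only, and the name one_in is illustrative.

```shell
# Convert a spam filtering accuracy percentage into "1 in N" spams missed.
# Miss rate = 100 - accuracy, so N = 100 / (100 - accuracy), rounded.
one_in() {
  awk -v acc="$1" 'BEGIN {
    if (acc >= 100) { print "no misses"; exit }
    printf "1 in %.0f\n", 100 / (100 - acc)
  }'
}
```

For example, one_in 98.178 prints 1 in 55, and one_in 96 (a 4% miss rate) prints 1 in 25.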
Other Spam Reduction Tactics
My mailserver also runs the following, which reduce the amount of unwelcome email that gets as far as DSPAM:
- ClamAV - mailserver-based virus scanner
- SPF - much spam comes from forged sender addresses; SPF checks for forgeries and blocks them
Both come courtesy of the wonderful QmailRocks qmail installation package. Spam blocked at this stage, some of which might otherwise have made it through, doesn't even figure in the stats above.
Being a GoodCitizen, I also report the few spams that make it through DSPAM to SpamCop.
If I get time, I'll also set up SMTP blocking for all mail originating in China, Korea and Brazil, as that's the source of much of my spam, and in over 10 years with email, I've yet to receive any real email from those countries.



