OT: More on Spam Re: [Dovecot] deploying dspam

Wed Dec 15 19:11:59 EET 2004

On Wednesday 15 Dec 2004 4:48 pm, John Peacock wrote:
> Hauke Fath wrote:
> > While this of course depends on your definition of "larger", some
> > people seem to think otherwise:
> >
> > http://www.usenix.org/events/lisa04/tech/blosser.html
>
> Not having a Usenix login, I cannot comment on the full paper, but to

The full paper seems to be there under HTML (despite the 'before November 
2005' comment - whoops).

> The use of a single wordlist is appropriate for limited circumstances.
> Even in a corporate environment like I manage, there is a very wide
> definition of what constitutes spam, and a configuration such as
> described above wouldn't work here.  It would work even less in an ISP
> environment, with widely varied userbase.

Oh I don't know - we could probably filter our clients' spam easily with a
single word list - real pharmacists don't obfuscate drug names very often.
It would obviously lose skill and, if tuned right, let more spam through,
but that wouldn't stop it being a very effective spam filter. I think the
SpamAssassin approach of weighing several inputs statistically is better here
anyway - over-reliance on content will always lead to false positives.
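
To make "weighing several inputs" concrete, here is a rough sketch of the
general idea: each test that fires adds its weight to a total, and the message
is only called spam if the total crosses a threshold. The rule names, weights
and threshold are made up for illustration; they are not SpamAssassin's real
rules or scores.

```python
# Hypothetical rule names and weights, for illustration only.
WEIGHTS = {
    "content_wordlist": 2.5,     # body matched the word list
    "listed_in_dnsbl": 3.0,      # a network test fired
    "forged_headers": 1.5,
    "sender_whitelisted": -4.0,  # evidence that the mail is ham
}
THRESHOLD = 5.0  # assumed cut-off for this sketch


def score(hits):
    """Sum the weights of every rule that fired for a message."""
    return sum(WEIGHTS[name] for name in hits)


def is_spam(hits, threshold=THRESHOLD):
    """Flag the message only when the combined score reaches the threshold."""
    return score(hits) >= threshold
```

With these numbers a word-list hit alone is not enough to flag a message,
which is the point about not relying on content by itself.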

I'm interested in how much SpamAssassin maintenance was complained about. I
used to do some with SA 2, but SA 3 with network tests switched on seems to
pretty much just work. Although the damn thing has started autolearning one
type of spam as ham (argh) in the last week.

However, delegating this to users may create its own form of maintenance :(

I wouldn't have thought that different database backends (dbm versus Postgres)
would affect scalability, other than the NFS issue. Presumably, if each user
has a unique list, we need to read the relevant words for each message from
whichever database is used. I could see the NFS thing being a practical issue,
but I dare say there are ways around it. Certainly we had a busy web server
doing several GDBM writes for every hit, and my predecessors hadn't noticed
that it was opening the databases every time instead of holding them open
between requests.