tallison@tacocat.net wrote:
So one of the key differences is the lack of a database that you can query by user? bogofilter would probably just give each user their own wordlist or use one wordlist to join them all. But the pros/cons of that decision belong elsewhere.
No, the key difference is that bogofilter requires BDB, which is a shared access database with no resident process (as used here). Each time an e-mail is processed, the BDB database must be opened, read, updated, and closed (even though the BDB libraries themselves remain resident). Consequently, the load on the server for 400 users is much higher than with a true database server like MySQL. I'm not saying that BDB is bad, but rather that as used here, it doesn't scale well at all. bogofilter also permits _either_ a shared dictionary or individual dictionaries. dspam has several ways of sharing or grouping users.
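To make the open/update/close cycle concrete, here is a minimal sketch in Python. It is not bogofilter's or dspam's code: it uses the standard library's dbm.dumb module as a stand-in for BDB, and the function names and token lists are made up for illustration. It only contrasts reopening the store for every message with holding one handle open, which is roughly what a resident database process does for its clients.

```python
import dbm.dumb
import os
import tempfile


def _bump(db, token):
    """Increment the stored count for one token (counts are stored as bytes)."""
    key = token.encode()
    current = int(db.get(key, b"0").decode())
    db[key] = str(current + 1).encode()


def update_counts_reopen(path, messages):
    """Open, update, and close the store once per message.

    Returns the number of times the store was opened.
    """
    opens = 0
    for tokens in messages:
        with dbm.dumb.open(path, "c") as db:
            opens += 1
            for token in tokens:
                _bump(db, token)
    return opens


def update_counts_persistent(path, messages):
    """Open the store once and reuse the handle for every message."""
    with dbm.dumb.open(path, "c") as db:
        for tokens in messages:
            for token in tokens:
                _bump(db, token)
    return 1


def read_count(path, token):
    """Read back the count for one token."""
    with dbm.dumb.open(path, "r") as db:
        return int(db[token.encode()].decode())


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    mail = [["free", "offer"], ["free", "meeting"], ["offer", "agenda"]]
    print(update_counts_reopen(os.path.join(workdir, "per_message"), mail))
    print(update_counts_persistent(os.path.join(workdir, "resident"), mail))
```

Both variants end up with the same counts; the difference is that the first pays the open and close cost once per message, so that cost grows with mail volume rather than staying fixed.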
I'm not sure what you mean by a reset.
That's just a byproduct of the management CGI; I have never reset the stats, so those numbers are lifetime (4 months).
Given that initial curve... Unless dspam starts with a preloaded wordlist or something else, I can't imagine its success being significantly different at the beginning.
All statistical systems require some initial training before they become accurate; dspam is no exception. I ran for about a week with a shared corpus (actually the SpamAssassin public corpus) before reverting the users to personal training dictionaries.
After training a few thousand emails, I think they all start to approach 99.999%. But again, that's a different list.
Except that in practice, SA requires more handholding to maintain that accuracy, whereas dspam just works. I cannot speak for bogofilter, but I know that when I was still using SA, 94% accuracy was considered excellent.
But I'm to understand that dspam is still implemented as a maildrop/procmail add-in? Just like bogofilter and SpamAssassin (minus amavisd)?
The out-of-the-box installation for dspam is a command line client. The actual code is implemented as a library (which is all that the command line client calls), so any proposed integration for Dovecot would be via the library, too.
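As a rough illustration of that split, the sketch below shows a library function with a thin command-line wrapper around it. None of this is dspam's actual API: classify, main, and the toy word list are hypothetical names invented for the example, and the "classifier" is deliberately trivial. The point is only that the wrapper contains no logic of its own, so another program could call the library function directly instead of spawning the client.

```python
import sys

# Hypothetical toy word list, for illustration only.
SPAM_WORDS = {"viagra", "lottery"}


def classify(message: str) -> str:
    """Library entry point: return 'spam' or 'innocent' for one message."""
    words = set(message.lower().split())
    return "spam" if words & SPAM_WORDS else "innocent"


def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Thin command-line client: read one message and delegate to the library."""
    stdout.write(classify(stdin.read()) + "\n")
    return 0
```

A delivery agent such as maildrop or procmail would pipe each message to something like main(), whereas an integrated server would link against the library and call the equivalent of classify() in-process.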
John
--
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5748