On Tue, Jun 1, 2010 at 14:56, Frank Cusack <frank+lists/dovecot@linetwo.net> wrote:
> Man oh man. You don't have much experience with mail and it sounds like you are starting from scratch. You have your work cut out for you. :)
Well, actually I do. A lot of it is from way, way back with sendmail; back then I wrote sendmail.cf by hand and avoided m4. I have used Postfix before, too, but in very different circumstances, and even that was several years ago.
> The standard way to filter spam is [all of]:
> a) stop it at the border, then
In my current situation, I'll be doing less of this than I have in the past.
> b) content analysis and tagging, then
I haven't done this on my own mail servers in the past, mainly on principle. To me, UCE isn't about what the message says but about whether it is unsolicited (which indirectly does depend on what it says) and bulk. In short, it's about behaviour, not the message. In this new situation, though, I have to rebalance those ideas to meet different goals, and principle doesn't count for much here. A potential client isn't going to be turned down or lectured just because their ISP is spammer-friendly. I don't really like that, but that's the way it is.
> c) filtering at delivery time
> Per-user filters for part (b) on the server side are a dead end IMHO, but there are some programs that do this, and dovecot can work with them (by having folders that train the filter). Modern mail clients, however, now have their own spam filters, which are by definition personalized.
> Anyway, parts (a) and (b) have nothing to do with dovecot. The standard way to do part (c) is to have an X-Spam: header that a sieve script filters on. The -m flag to deliver is not really meant for spam filtering.
Since sieve looks like it will be a problem for now, I'm seriously considering the following approach until I find a solution for that. I'd write a shim program in C and run it from Postfix's master.cf exactly as Dovecot's deliver is run now, basically just changing the executable path to point at the shim.
The shim reads the message (from stdin, I assume) up to 1MB or the end of the headers, whichever comes first. If the body hasn't been reached within 1MB, the message goes into the spam folder. If the X-Spam: header is present with a high enough spam probability, it also goes into the spam folder. Otherwise it goes into the INBOX. The shim then builds an argument list for deliver, adding -m with the folder name when the message is headed for the spam folder. It sets up a pipe and forks, and the child execs deliver with that argument list. The parent writes the buffered data into the pipe, then copies the rest of stdin, so deliver sees it all as one stream. Finally the shim waits for deliver to exit, captures its exit status, and exits with that same status, so Postfix should know whether delivery succeeded or failed.
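A minimal sketch of that shim in C, under several assumptions. The deliver path (DELIVER_PATH), the spam folder name (SPAM_FOLDER), the header format ("X-Spam: <probability>" compared against a SPAM_THRESHOLD of 0.9) and the entry-point name run_shim are all hypothetical. Whatever arguments master.cf passes to the shim are forwarded to deliver unchanged. The shim's own failures return EX_TEMPFAIL, so that Postfix should defer the message rather than bounce it. For simplicity, it reads a fixed prefix of up to 1MB with fread instead of stopping exactly at the blank line, and it only examines the headers inside that prefix. It also assumes LF line endings.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/wait.h>
#include <sysexits.h>
#include <unistd.h>

#define MAX_PREFIX (1024 * 1024)                    /* the 1MB limit from the plan */
#define DELIVER_PATH "/usr/libexec/dovecot/deliver" /* hypothetical install path */
#define SPAM_FOLDER "Spam"                          /* hypothetical folder name */
#define SPAM_THRESHOLD 0.9                          /* hypothetical cut-off */

/* Offset just past the blank line ending the headers (LF endings), or 0. */
static size_t header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == '\n' && buf[i + 1] == '\n')
            return i + 2;
    return 0;
}

/* Assumed format "X-Spam: <probability 0..1>"; overlong lines are ignored. */
static int is_spam(const char *p, size_t len)
{
    const char *end = p + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t n = (size_t)((nl ? nl : end) - p);
        if (n > 7 && n < 64 && strncasecmp(p, "X-Spam:", 7) == 0) {
            char val[64] = {0};
            memcpy(val, p + 7, n - 7);      /* copy so strtod sees a NUL */
            if (strtod(val, NULL) >= SPAM_THRESHOLD)
                return 1;
        }
        p = nl ? nl + 1 : end;
    }
    return 0;
}

/* Folder to pass to deliver -m, or NULL for the INBOX. */
static const char *choose_folder(const char *buf, size_t len)
{
    size_t hend = header_end(buf, len);
    if (hend == 0 && len >= MAX_PREFIX)
        return SPAM_FOLDER;                 /* body not reached within 1MB */
    /* No blank line before EOF means the whole message is headers. */
    return is_spam(buf, hend ? hend : len) ? SPAM_FOLDER : NULL;
}

/* Hypothetical entry point; main() would just return run_shim(argc, argv). */
int run_shim(int argc, char **argv)
{
    static char buf[MAX_PREFIX];
    size_t len = fread(buf, 1, sizeof buf, stdin); /* first 1MB, or all of it */
    if (ferror(stdin))
        return EX_TEMPFAIL;
    const char *folder = choose_folder(buf, len);

    /* deliver gets the shim's own arguments, plus "-m <folder>" for spam. */
    char *dargv[argc + 3];
    int k = 0;
    dargv[k++] = (char *)DELIVER_PATH;
    for (int i = 1; i < argc; i++) dargv[k++] = argv[i];
    if (folder) { dargv[k++] = (char *)"-m"; dargv[k++] = (char *)folder; }
    dargv[k] = NULL;

    int fds[2];
    if (pipe(fds) < 0) return EX_TEMPFAIL;
    pid_t pid = fork();
    if (pid < 0) return EX_TEMPFAIL;
    if (pid == 0) {                         /* child: stdin comes from the pipe */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]); close(fds[1]);
        execv(DELIVER_PATH, dargv);
        _exit(EX_TEMPFAIL);                 /* only reached if exec failed */
    }
    close(fds[0]);
    signal(SIGPIPE, SIG_IGN);               /* early child exit: EPIPE, not death */
    /* Send the buffered prefix, then the rest of stdin, as one stream. */
    FILE *out = fdopen(fds[1], "w");
    int ok = out != NULL;
    for (size_t n = len; ok && n > 0; n = fread(buf, 1, sizeof buf, stdin))
        ok = fwrite(buf, 1, n, out) == n;
    ok = ok && !ferror(stdin);
    if (out == NULL) close(fds[1]);
    else if (fclose(out) != 0) ok = 0;      /* fclose flushes; EPIPE shows here */
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR) return EX_TEMPFAIL;
    if (!WIFEXITED(status) || (!ok && WEXITSTATUS(status) == 0))
        return EX_TEMPFAIL;                 /* killed, or input cut short */
    return WEXITSTATUS(status);             /* Postfix sees deliver's status */
}
```

To build it as a program, the only missing piece is a one-line main that returns run_shim(argc, argv). The executable path in master.cf would then point at the compiled shim instead of at deliver. Reading through stdio for both the prefix and the remainder keeps any read-ahead inside the same FILE buffer, so no bytes are lost between the two phases.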
I may set up a webmail service later, and I'll see whether I can do Sieve through that.