zlib, mdbox and spam filtering scripts
Hi all,
I am quite happily running a dovecot setup with maildirs. It once ran Courier, but since the migration I have never looked back.
Now, with a growing user base, I’m looking into using zlib compression and mdbox instead of maildir. Converting with dsync went quite well on a testbed server. So let’s go to production!
Before making the final decision I came across this:
- I’m using a spam learning address, where users forward their mails to train SpamAssassin.
- a cron script on the server strips the MIME attachments and feeds the mails into SA
- works quite well even for the still numerous POP users
Compression is fine: I could add a gunzip step to the script and go from there. But mdbox seems to be accessible only through dovecot, and I don’t think going through IMAP makes sense for a local script. So I would lock myself out by using mdbox.
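For the compression half of the problem, a script that reads maildir files directly could check each file and decompress it on the fly. This is only a minimal sketch: it assumes the files are stored in gzip format, and the function name `cat_mail` is made up for illustration.

```shell
#!/bin/bash
# Print a message file to stdout, decompressing it if it is gzip data.
# cat_mail is a hypothetical helper name, not part of dovecot.
cat_mail() {
    local file=$1
    # gzip -t tests the stream and fails if it is not valid gzip data.
    if gzip -t < "$file" 2>/dev/null; then
        gzip -dc < "$file"
    else
        cat < "$file"
    fi
}
```

A cron script could then call `cat_mail "$file"` wherever it currently reads the file directly, so compressed and uncompressed mails are handled the same way.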
What do your spam/ham training routines look like? How do you feed SA?
thank you!
Philon
On 12/19/14 11:19 AM, Philon wrote:
> Before making the final decision I came across this:
> - I’m using a spam learning address, where users forward their mails to train SpamAssassin.
> - a cron script on the server strips the MIME attachments and feeds the mails into SA
> - works quite well even for the still numerous POP users
> Compression is fine: I could add a gunzip step to the script and go from there. But mdbox seems to be accessible only through dovecot, and I don’t think going through IMAP makes sense for a local script. So I would lock myself out by using mdbox.
> What do your spam/ham training routines look like? How do you feed SA?
Let the MTA feed SA. http://gtmp.org/doku.php?id=publications:sa-postfix-en
Hi Oscar, Hi dovecot-list,
I read through the docs, quite an interesting read. But I’m wondering: are mails in this case only kept temporarily? So re-reading spam and ham is not possible…!? It still seems useful, as it’s direct and not run via a daily cron script.
I also looked into Sven’s advice (thanks!) about doveadm fetch. I just came across this script here: https://git.mnt-tech.fr/admintools.git/raw/master/blacklist.sh. It does much more than just fetch, but for me it was a good reference for retrieval of mails using doveadm.
So thanks for the hints and merry X-mas!
Philon
Philon <bytesplit@gmail.com> wrote:
> I also looked into Sven’s advice (thanks!) about doveadm fetch. I just came across this script here: https://git.mnt-tech.fr/admintools.git/raw/master/blacklist.sh. It does much more than just fetch, but for me it was a good reference for retrieval of mails using doveadm.
My code for my personal "salearn" script looks like this:
,----
| #!/bin/bash
|
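| # Scratch directory that holds one file per exported message
| tempdir=$(mktemp -d)
|
| # doveadm search prints one "mailbox-guid uid" pair per line
| doveadm search mailbox SPAM | while read -r guid uid;
| do
|     tempfile=$(mktemp --tmpdir="$tempdir")
|     echo -n "$uid "
|     # Drop the leading "text:" line and the trailing form feed line
|     doveadm fetch text mailbox-guid "$guid" uid "$uid" | tail -n +2 | head -n -1 > "$tempfile"
| done
| echo
|
| sa-learn --spam --no-sync --progress "$tempdir"
| sa-learn --sync
|
| rm -r "$tempdir"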
`----
Some notes:
The dance with "| tail -n +2 | head -n -1" removes the leading "text:" line and the trailing ^L (form feed) that doveadm adds around the message. (I really would like to have an option to fetch the raw source of the mail without doveadm adding, removing or reformatting anything.)
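To illustrate what that filter does without needing a dovecot installation, here is a small sketch that runs an equivalent filter over simulated input. The sample text is an assumption modelled on the description above, not captured doveadm output, and `strip_wrapper` and `sample` are hypothetical names. The sed form is used because a negative count for `head -n` is a GNU extension.

```shell
#!/bin/bash
# Drop the first and the last line of stdin, like "tail -n +2 | head -n -1".
# strip_wrapper is a hypothetical name used only for this sketch.
strip_wrapper() {
    sed '1d;$d'
}

# Simulated input (an assumption): a "text:" line, the message itself,
# then a final line holding only a form feed (^L).
sample() {
    printf 'text:\nSubject: test\n\nbody line\n\f\n'
}

# Prints the message without the wrapper lines.
sample | strip_wrapper
```

Only the first and last lines are touched, so blank lines inside the message, including the one separating headers from body, pass through unchanged.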
I don't use "-u '*'" here since this script runs as my user only directly on the server where the mails are stored. If you want to learn mails from multiple users you will of course need to iterate over all of them, just as the example script from mnt-tech.fr does.
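A rough sketch of that iteration, not taken from either script: it assumes the userdb supports listing users through `doveadm user '*'`, that every user has a mailbox called SPAM, and that the script runs with enough privileges to read all mailboxes. The function name `export_spam_all_users` and the output file naming are made up for illustration.

```shell
#!/bin/bash
# Export every message in each user's SPAM mailbox into one directory.
# export_spam_all_users is a hypothetical name; "SPAM" is the mailbox
# name used in the script above and may differ per installation.
export_spam_all_users() {
    local outdir=$1 user guid uid
    # "doveadm user '*'" lists all users if the userdb supports iteration.
    doveadm user '*' | while read -r user; do
        # </dev/null keeps the inner commands from eating the loop's input.
        doveadm search -u "$user" mailbox SPAM </dev/null |
        while read -r guid uid; do
            doveadm fetch -u "$user" text mailbox-guid "$guid" uid "$uid" \
                </dev/null | tail -n +2 | head -n -1 \
                > "$outdir/$user.$guid.$uid"
        done
    done
}
```

Afterwards the directory can be fed to `sa-learn --spam --no-sync` followed by `sa-learn --sync`, exactly as in the single-user script.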
Regards, Sven.
-- Sigmentation fault. Core dumped.
participants (3)
- Oscar del Rio
- Philon
- Sven Hartge