[Dovecot] Antispam plugin / sa-learn
Hello,

Sorry for posting on both the spamassassin and dovecot lists: my question is about the dovecot antispam plugin, used to train spamassassin with sa-learn. I wonder if there is a way to confirm that sa-learn is correctly fed by the antispam plugin.

dovecot version: 2.1.7
spamassassin version: 3.3.2
(both packaged in debian stable, with postfix and amavis)

I configured dovecot's antispam plugin this way:

    plugin {
      ...
      #Antispam
      antispam_debug_target = syslog
      antispam_verbose_debug = 1
      antispam_backend = pipe
      antispam_trash = Trash
      antispam_spam = Junk
      antispam_allow_append_to_spam = no
      antispam_pipe_program = /srv/datadisk01/bin/sa-learn-pipe.sh
      antispam_pipe_program_spam_arg = --spam
      antispam_pipe_program_notspam_arg = --ham
    }

referring to: http://wiki2.dovecot.org/Plugins/Antispam

I use this script to pipe messages to sa-learn:

    #!/bin/sh
    echo /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
    echo "$$-start ($*)" >> /tmp/sa-learn-pipe.log ;
    #echo $* > /tmp/sendmail-parms.txt ;
    cat<&0 >> /tmp/sendmail-msg-$$.txt ;
    /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
    rm -f /tmp/sendmail-msg-$$.txt ;
    echo "$$-end" >> /tmp/sa-learn-pipe.log ;
    exit 0;

Here is what I get when I move a mail to the Junk folder:

    Sep 11 18:10:10 effraie01 imap: antispam: plugin initialising (2.0-notgit)
    Sep 11 18:10:10 effraie01 imap: antispam: verbose debug enabled
    Sep 11 18:10:10 effraie01 imap: antispam: "Junk" is exact match spam folder
    Sep 11 18:10:10 effraie01 imap: antispam: no unsure folders
    Sep 11 18:10:10 effraie01 imap: antispam: "Trash" is exact match trash folder
    Sep 11 18:10:10 effraie01 imap: antispam: pipe backend spam argument = --spam
    Sep 11 18:10:10 effraie01 imap: antispam: pipe backend not-spam argument = --ham
    Sep 11 18:10:10 effraie01 imap: antispam: pipe backend program = /srv/datadisk01/bin/sa-learn-pipe.sh
    Sep 11 18:10:10 effraie01 imap: antispam: pipe backend tmpdir /tmp
    Sep 11 18:11:10 effraie01 imap: antispam: plugin initialising (2.0-notgit)
    Sep 11 18:11:10 effraie01 imap: antispam: verbose debug enabled
    Sep 11 18:11:10 effraie01 imap: antispam: "Junk" is exact match spam folder
    Sep 11 18:11:10 effraie01 imap: antispam: no unsure folders
    Sep 11 18:11:10 effraie01 imap: antispam: "Trash" is exact match trash folder
    Sep 11 18:11:10 effraie01 imap: antispam: pipe backend spam argument = --spam
    Sep 11 18:11:10 effraie01 imap: antispam: pipe backend not-spam argument = --ham
    Sep 11 18:11:10 effraie01 imap: antispam: pipe backend program = /srv/datadisk01/bin/sa-learn-pipe.sh
    Sep 11 18:11:10 effraie01 imap: antispam: pipe backend tmpdir /tmp
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_unsure(Junk): 0
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_trash(INBOX): 0
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_trash(Junk): 0
    Sep 11 18:12:04 effraie01 imap: antispam: mail copy: from trash: 0, to trash: 0
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_spam(INBOX): 0
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_spam(Junk): 1
    Sep 11 18:12:04 effraie01 imap: antispam: mailbox_is_unsure(INBOX): 0
    Sep 11 18:12:04 effraie01 imap: antispam: mail copy: src spam: 0, dst spam: 1, src unsure: 0
    Sep 11 18:12:04 effraie01 imap: antispam: running mailtrain backend program /srv/datadisk01/bin/sa-learn-pipe.sh
    Sep 11 18:12:04 effraie01 imap: antispam: running mailtrain backend program /srv/datadisk01/bin/sa-learn-pipe.sh
    Sep 11 18:12:04 effraie01 imap: antispam: running mailtrain backend program parameter 1 --spam

and here is what I get in /tmp/sa-learn-pipe.log:

    10545-start (--spam)
    10545-end

To me, it looks like it is working, but when I run sa-learn --backup, I just get this:

    v 3 db_version # this must be the first line!!!
    v 0 num_spam
    v 0 num_nonspam

It's probably because I'm using ***STANDARD-ANTI-UBE-TEST-EMAIL***, which probably teaches sa-learn nothing, but I wonder if I can find a log somewhere, or something else, confirming that sa-learn correctly gets the email I pipe to it.

Thanks a lot in advance

-- Mathieu
On Wed, 11 Sep 2013, Mathieu R. wrote:
> Sorry for posting on both the spamassassin and dovecot lists: my question is about the dovecot antispam plugin, used to train spamassassin with sa-learn.
>
> I wonder if there is a way to confirm that sa-learn is correctly fed by the antispam plugin.
>
> I use this script to pipe messages to sa-learn:
>
>     #!/bin/sh
>     echo /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
>     echo "$$-start ($*)" >> /tmp/sa-learn-pipe.log ;
>     #echo $* > /tmp/sendmail-parms.txt ;
>     cat<&0 >> /tmp/sendmail-msg-$$.txt ;
>     /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
>     rm -f /tmp/sendmail-msg-$$.txt ;
>     echo "$$-end" >> /tmp/sa-learn-pipe.log ;
>     exit 0;
The script above is missing important log information: the current uid and $HOME. Also, sa-learn supports -D (debug output).

For a testing period, I would change it to:

    #!/bin/sh
    echo /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
    echo "$$-start ($*)" >> /tmp/sa-learn-pipe.log ;
    #echo $* > /tmp/sendmail-parms.txt ;
    cat<&0 >> /tmp/sendmail-msg-$$.txt ;
    /usr/bin/sa-learn -D $* /tmp/sendmail-msg-$$.txt >/tmp/sa-learn-pipe.$$.tmp 2>&1;
    echo $$ sa-learn rc=$? id=$(id) HOME=$HOME >> /tmp/sa-learn-pipe.log
    while read line; do
      echo $$-sa-learn "$line" >> /tmp/sa-learn-pipe.log
    done < /tmp/sa-learn-pipe.$$.tmp
    rm -f /tmp/sendmail-msg-$$.txt /tmp/sa-learn-pipe.$$.tmp
    echo "$$-end" >> /tmp/sa-learn-pipe.log ;
    exit 0;
> To me, it looks like it is working, but when I run sa-learn --backup, I just get this:
>
>     v 3 db_version # this must be the first line!!!
>     v 0 num_spam
>     v 0 num_nonspam
Read the MIGRATION section of man sa-learn: "Note that if you have individual user databases you will have to perform a similar procedure for each one of them."

    sa-learn --backup > backup.txt

backs up the database of one particular user. I assume you issued the command as root? But does the antispam learning script above run as root, too?

I assume you need a --username=username and/or --prefspath=file setting.
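The debugging wrapper suggested above could also be written as a small function. This is only a sketch under stated assumptions, not the script anyone in this thread actually ran: the names learn_pipe, LEARN_CMD and LEARN_LOG are hypothetical and exist purely for illustration, while the default paths are the ones used in this thread. It swaps the predictable /tmp/...-$$ file names for mktemp and passes the arguments as "$@" so that each one stays intact.

```shell
#!/bin/sh
# Sketch of a logging wrapper for the antispam pipe backend.
# LEARN_CMD and LEARN_LOG are hypothetical knobs added for illustration;
# their defaults mirror the paths used in this thread.
LEARN_CMD="${LEARN_CMD:-/usr/bin/sa-learn}"
LEARN_LOG="${LEARN_LOG:-/tmp/sa-learn-pipe.log}"

learn_pipe() {
    # Private temporary files instead of predictable /tmp/...-$$ names.
    msg=$(mktemp) || return 1
    out=$(mktemp) || { rm -f "$msg"; return 1; }

    echo "$$-start ($*)" >> "$LEARN_LOG"
    cat > "$msg"                      # the message arrives on stdin

    # Run the learner with debug output, capturing stdout and stderr.
    "$LEARN_CMD" -D "$@" "$msg" > "$out" 2>&1
    rc=$?

    # Record who we are and where HOME points, as suggested above.
    echo "$$ sa-learn rc=$rc id=$(id) HOME=$HOME" >> "$LEARN_LOG"
    while IFS= read -r line; do
        echo "$$-sa-learn $line" >> "$LEARN_LOG"
    done < "$out"

    rm -f "$msg" "$out"
    echo "$$-end" >> "$LEARN_LOG"
    return 0                          # always report success to the caller
}
```

To use it as the antispam_pipe_program, the script would end with a call such as `learn_pipe "$@"`. Like the original script's `exit 0`, the function always returns 0, so a failed learning run shows up only in the log.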
Steffen Kaiser
On 12/09/2013 08:31, Steffen Kaiser wrote:
> The script above is missing important log information: the current uid and $HOME. Also, sa-learn supports -D (debug output).
>
> For a testing period, I would change it to:
>
>     #!/bin/sh
>     echo /usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt ;
>     echo "$$-start ($*)" >> /tmp/sa-learn-pipe.log ;
>     #echo $* > /tmp/sendmail-parms.txt ;
>     cat<&0 >> /tmp/sendmail-msg-$$.txt ;
>     /usr/bin/sa-learn -D $* /tmp/sendmail-msg-$$.txt >/tmp/sa-learn-pipe.$$.tmp 2>&1;
>     echo $$ sa-learn rc=$? id=$(id) HOME=$HOME >> /tmp/sa-learn-pipe.log
>     while read line; do
>       echo $$-sa-learn "$line" >> /tmp/sa-learn-pipe.log
>     done < /tmp/sa-learn-pipe.$$.tmp
>     rm -f /tmp/sendmail-msg-$$.txt /tmp/sa-learn-pipe.$$.tmp
>     echo "$$-end" >> /tmp/sa-learn-pipe.log ;
>     exit 0;
Thank you very much. I tried this, and here is what I got in the log:

    22:00 root@effraie01 ~ # cat /tmp/sa-learn-pipe.log
    ...
    4933-start (--ham)
    4933 sa-learn rc=0 id=uid=3000(vmail) gid=3000(vmail) groups=3000(vmail) HOME=
    4933-end
    4953-start (--spam)
    4953 sa-learn rc=0 id=uid=3000(vmail) gid=3000(vmail) groups=3000(vmail) HOME=
    4953-end

So I tried:

    22:01 root@effraie01 ~ # sa-learn --username=vmail --backup
    v 3 db_version # this must be the first line!!!
    v 0 num_spam
    v 0 num_nonspam
> Read the MIGRATION section of man sa-learn: "Note that if you have individual user databases you will have to perform a similar procedure for each one of them."
>
>     sa-learn --backup > backup.txt
>
> backs up the database of one particular user. I assume you issued the command as root? But does the antispam learning script above run as root, too?
If I understood you correctly, sa-learn-pipe (and so sa-learn itself) runs as vmail, which is the global user I use for email, and there is still nothing in the sa-learn database. (I have not had much spam on that server, but I have still passed a few messages to sa-learn via the dovecot-antispam plugin.) Maybe everything is normal, but with my limited understanding of spamassassin and dovecot, I would expect to have something in the sa-learn db.
-- Mathieu R.
On Thu, 12 Sep 2013 22:09:42 +0200 Mathieu R. wrote:
> 22:01 root@effraie01 ~ # sa-learn --username=vmail --backup
> v 3 db_version # this must be the first line!!!
> v 0 num_spam
> v 0 num_nonspam
sa-learn --username sets the virtual user, not the unix user. (BTW, sa-learn --dump magic is a quicker way of reading the metadata.)
By default SA stores the bayes database files under a user's home directory. If you run sa-learn as vmail, which doesn't have a home directory, it will probably just give up.
What you need to do is set bayes_path (in local.cf) to a directory to which vmail has access, then run sa-learn as vmail. Alternatively, you can set up one of the SQL backends.
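As an illustration of the bayes_path advice, here is a small shell sketch that checks whether the user running it can write to the directory a given bayes_path value points into. The function name check_bayes_path and the example path /var/lib/spamassassin/bayes/bayes are hypothetical; only the bayes_path option itself comes from the message above. It is meant to be run as the same user the learning script runs as (vmail in this thread).

```shell
#!/bin/sh
# bayes_path is a directory plus a filename prefix; SpamAssassin appends
# suffixes such as _toks and _seen to it. A local.cf line could look like
# this (the path is an assumption, pick your own):
#
#   bayes_path /var/lib/spamassassin/bayes/bayes
#
# check_bayes_path (hypothetical helper) verifies that the directory part
# exists and is writable and searchable by the current user.
check_bayes_path() {
    dir=$(dirname "$1")
    if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "ok: $(id -un) can write to $dir"
    else
        echo "problem: $(id -un) cannot write to $dir" >&2
        return 1
    fi
}
```

Called as `check_bayes_path /var/lib/spamassassin/bayes/bayes` from a shell running as vmail, it returns non-zero when the directory is missing or not writable, which is the situation described above for a user without a usable home directory.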
I'm posting this through gmane as I'm not subscribed to the dovecot list. I replied on the SpamAssassin list before spotting that it was cross-posted. You can ignore most of it now, but I'll quote what I wrote about learning ham:

"I'm sceptical that the Antispam plugin can learn enough ham this way. As I understand it, the only mail that gets learnt as ham will be false positives based on the overall spamassassin score, irrespective of the Bayes result. Bayes needs (by default) 200 spams and hams to even start classifying, and much more for optimal results - I don't expect to get 200 FPs in the rest of my life. Unless this is a high-volume server with a shared database, I'd suggest either learning a few thousand hams manually, or implementing an unsure folder. You can also mitigate the problem by autotraining with a high ham threshold, but then you really need to be careful to move all spam to the spam folder."
On 13/09/2013 17:29, RW wrote:
> On Thu, 12 Sep 2013 22:09:42 +0200 Mathieu R. wrote:
>
>> 22:01 root@effraie01 ~ # sa-learn --username=vmail --backup
>> v 3 db_version # this must be the first line!!!
>> v 0 num_spam
>> v 0 num_nonspam
>
> sa-learn --username sets the virtual user, not the unix user. (BTW, sa-learn --dump magic is a quicker way of reading the metadata.)
>
> By default SA stores the bayes database files under a user's home directory. If you run sa-learn as vmail, which doesn't have a home directory, it will probably just give up.
>
> What you need to do is set bayes_path (in local.cf) to a directory to which vmail has access, then run sa-learn as vmail. Alternatively, you can set up one of the SQL backends.
Setting bayes_path made it work, thanks a lot!
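Once learning works, the counters can be compared with RW's remark that Bayes needs (by default) 200 spams and hams before it starts classifying. The following is a minimal sketch, assuming the `sa-learn --backup` output keeps the "v <count> num_spam" / "v <count> num_nonspam" lines shown earlier in this thread; the function name bayes_counts is hypothetical.

```shell
#!/bin/sh
# Read `sa-learn --backup` output on stdin, print the learned message
# counts, and succeed only if both reach the minimum (default 200).
# bayes_counts is a hypothetical helper, not part of SpamAssassin.
bayes_counts() {
    min="${1:-200}"
    awk -v min="$min" '
        $1 == "v" && $3 == "num_spam"    { spam = $2 }
        $1 == "v" && $3 == "num_nonspam" { ham  = $2 }
        END {
            printf "spam=%d ham=%d min=%d\n", spam, ham, min
            exit (spam >= min && ham >= min) ? 0 : 1
        }'
}
```

It would be used as `sa-learn --backup | bayes_counts`, run as the user that owns the database; a different minimum can be passed as the first argument.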
> I'm posting this through gmane as I'm not subscribed to the dovecot list. I replied on the SpamAssassin list before spotting that it was cross-posted. You can ignore most of it now, but I'll quote what I wrote about learning ham:
>
> "I'm sceptical that the Antispam plugin can learn enough ham this way. As I understand it, the only mail that gets learnt as ham will be false positives based on the overall spamassassin score, irrespective of the Bayes result. Bayes needs (by default) 200 spams and hams to even start classifying, and much more for optimal results - I don't expect to get 200 FPs in the rest of my life. Unless this is a high-volume server with a shared database, I'd suggest either learning a few thousand hams manually, or implementing an unsure folder. You can also mitigate the problem by autotraining with a high ham threshold, but then you really need to be careful to move all spam to the spam folder."
As my English is approximate, I'm not sure I really understand what you mean.

Should I:

- not use the antispam plugin to learn spam, but do it manually with sa-learn /path/to/ham?
- not use the antispam plugin at all?
- use the antispam plugin to learn ham, but still do it by hand with sa-learn /path/to/ham?
- take care of something else?
-- Mathieu R.
participants (3)
- Mathieu R.
- RW
- Steffen Kaiser