Re: [Dovecot] sa learning from an imap spam folder
This is my approach, stored in /etc/cron.hourly. It's very new, so I'm still testing it.
The goal is to learn HAM messages only once they are a day old, so that I have time to manually remove SPAM that slipped through. Mail tagged by SpamAssassin is sorted automatically into the "Junk.Spam" mailbox and learned after 12 hours. I manually move any missed SPAM into the .Junk mailbox, so I'm sure there are no wrongly tagged messages there, and I can learn those messages as soon as they are found by cron.
/etc/cron.hourly/sa-learn:
#!/bin/sh
umask 022
# Learn HAM messages which were received roughly between 24 and 25 hours ago
find /var/vmail/fkware.de/frank.kintrup/Maildir/ \
  -path '/var/vmail/fkware.de/frank.kintrup/Maildir/.Junk*' -prune -o \
  -path '/var/vmail/fkware.de/frank.kintrup/Maildir/.Sent' -prune -o \
  -path '/var/vmail/fkware.de/frank.kintrup/Maildir/.Trash' -prune -o \
  -path '/var/vmail/fkware.de/frank.kintrup/Maildir/.Draft' -prune -o \
  -iname '*server.fkware.de*' -type f -mmin +1435 -mmin -1505 \
  -execdir sa-learn --username=vmail --no-sync --ham {} \; \
  >/dev/null 2>/dev/null
# Learn and delete SPAM messages which were manually moved to the Junk folder
# (rm only runs for a message if sa-learn exited successfully)
find /var/vmail/fkware.de/frank.kintrup/Maildir/.Junk/ \
  -iname '*server.fkware.de*' -type f \
  -execdir sa-learn --username=vmail --no-sync --spam {} \; \
  -execdir rm {} \; \
  >/dev/null 2>/dev/null
# Learn and delete SPAM messages which were received more than 12 hours ago
# and automatically put into the Junk.Spam folder
find /var/vmail/fkware.de/frank.kintrup/Maildir/.Junk.Spam/ \
  -iname '*server.fkware.de*' -type f -mmin +720 \
  -execdir sa-learn --username=vmail --no-sync --spam {} \; \
  -execdir rm {} \; \
  >/dev/null 2>/dev/null
# Write the journal of all --no-sync runs to the Bayes database in one go
sa-learn --username=vmail --sync >/dev/null
exit 0
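The age window in the HAM step can be checked safely on throwaway files before pointing the script at a real Maildir. The following is only a sketch: it assumes GNU find and GNU touch (for the "-d '... ago'" syntax), and the file names are made up for illustration.

```shell
#!/bin/sh
# Demonstrates the -mmin window used for HAM learning on throwaway files.
# Assumes GNU find and GNU touch; file names are hypothetical.
demo_dir=$(mktemp -d)

touch -d '600 minutes ago'  "$demo_dir/too-new"     # 10 hours old
touch -d '1470 minutes ago' "$demo_dir/in-window"   # 24.5 hours old
touch -d '3000 minutes ago' "$demo_dir/too-old"     # 50 hours old

# Same age test as the cron script: modified more than 1435 minutes ago
# and less than 1505 minutes ago.
matched=$(find "$demo_dir" -type f -mmin +1435 -mmin -1505 -exec basename {} \;)
echo "$matched"

rm -rf "$demo_dir"
```

Only the file inside the window is printed. The window is 70 minutes wide while cron runs hourly, presumably so that a message is not skipped when a run starts a few minutes late.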
frank.kintrup@fkware.de wrote:
> This is my approach, stored in /etc/cron.hourly. It's very new, so I'm still testing it.
> The goal is to learn HAM messages only if they are a day old, so that I can manually remove SPAM that slipped through. Mails tagged from Spamassassin are sorted automatically into the "Junk.Spam" mailbox and learned after 12 hours. I manually move any missed SPAM into the .Junk mailbox, so I'm sure there are no wrongly tagged messages there,
What happens if you don't read your mail (vacation, etc.)?
> so I can learn those messages as soon as they are found by cron.
Consider training on errors only.
I personally use 3 folders:
- Junk: tagged spam goes here.
- Junk/Trash: confirmed or missed spam goes here (manually). Goes to sa-learn --spam.
- Junk/Error: false positives go here (manually, of course :). Goes to sa-learn --ham.
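A three-folder, train-on-errors setup like this could be driven by a small script along the following lines. This is a sketch under assumptions, not the script actually in use: the Maildir path and the Maildir++ folder names (.Junk.Trash, .Junk.Error) are hypothetical, and SA_LEARN can be overridden for a dry run.

```shell
#!/bin/sh
# Sketch of "train on errors only" with hypothetical folder paths.
# SA_LEARN can be overridden (e.g. SA_LEARN=echo) for a dry run.
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}
MAILDIR=${MAILDIR:-/var/vmail/example.org/user/Maildir}   # hypothetical

# learn_folder KIND FOLDER: feed cur/ and new/ of one Maildir folder
# to sa-learn. KIND is "spam" or "ham".
learn_folder() {
    kind=$1
    folder=$2
    for sub in cur new; do
        if [ -d "$folder/$sub" ]; then
            $SA_LEARN --no-sync "--$kind" "$folder/$sub" || return 1
        fi
    done
}

# Learn manually confirmed spam and manually rescued ham, then sync once.
train_on_errors() {
    learn_folder spam "$MAILDIR/.Junk.Trash" &&
    learn_folder ham  "$MAILDIR/.Junk.Error" &&
    $SA_LEARN --sync
}
```

Calling train_on_errors from cron would learn only what a human has sorted by hand, so an unread inbox during a vacation is never learned as ham.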
Dear Frank,
2008/1/29, frank.kintrup@fkware.de:
> This is my approach, stored in /etc/cron.hourly. It's very new, so I'm still testing it.
Thank you very much! This was a real help. Just one issue: when I set it up as a cron job, it said: sa-learn: not found
I take it you are not seeing this error in your case? Thank you!
Zbigniew Szalbot
On Jan 29, 2008 2:36 PM, Zbigniew Szalbot zszalbot@gmail.com wrote:
> Thank you very much! This was a real help. Just one issue. When I set it as a cron job, it said: sa-learn: not found
Give the full path to sa-learn (or put `which sa-learn`).
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254733744121/+254722743223
"Oh My God! They killed init! You Bastards!" --from a /. post
At 3:21 PM +0300 1/29/08, Odhiambo Washington wrote:
> Give the full path to sa-learn (or put `which sa-learn`)
'which' doesn't work if the executable is not in your PATH environment variable. Most flavors of cron (wisely) provide a very limited value for PATH, in some cases just "/bin:/usr/bin" and on some systems /bin is just a symlink to /usr/bin anyway...
If you are using Vixie Cron or a workalike, you can set PATH in your crontab and never have to worry about forgetting to use a full pathname. That can also help protect against running an unexpected version of something on a system where alternative implementations may lurk in directories you did not expect to be in PATH.
-- Bill Cole bill@scconsult.com
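For a script in /etc/cron.hourly, the same idea can be applied inside the script itself. A minimal sketch, assuming sa-learn lives in one of the listed directories (the directory list and the find_tool helper are illustrative, not taken from anyone's setup):

```shell
#!/bin/sh
# Set an explicit PATH so the script does not depend on cron's minimal default.
# The directory list is an assumption; adjust it to where sa-learn lives.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# find_tool NAME: print the full path of NAME, or fail with a message on stderr.
find_tool() {
    command -v "$1" || {
        echo "$1: not found in PATH=$PATH" >&2
        return 1
    }
}

if sa_learn=$(find_tool sa-learn); then
    echo "using $sa_learn"
else
    echo "sa-learn is not installed in any of the PATH directories" >&2
fi
```

With Vixie cron, a PATH=/usr/local/bin:/usr/bin:/bin line at the top of the crontab has the same effect for every job in that crontab.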
Just because I am too lazy to fix it correctly (move old spam to another location, etc.), I have just done this:
crontab -l
10 */1 * * * /bin/ls -1 /var/mail/folders/ebalaskas/.spam/cur | xargs /usr/local/bin/sa-learn --spam
PS: sorry for being lazy
Evaggelos Balaskas - http://ebalaskas.gr Unix System Engineer Informatics Engineer Technological Education
Evaggelos Balaskas wrote:
> Just because I am too lazy to fix it correctly (move old spam to another location, etc.), I have just done this:
> crontab -l
> 10 */1 * * * /bin/ls -1 /var/mail/folders/ebalaskas/.spam/cur | xargs /usr/local/bin/sa-learn --spam
> PS: sorry for being lazy
But not lazy enough to remove the ls and xargs (or do you like pipe rigati? :)
/usr/local/bin/sa-learn --spam /var/mail/folders/ebalaskas/.spam/cur/
But it doesn't take much more to write a script:
#!/bin/sh
learn_spam="/usr/local/bin/sa-learn --spam"
spam_dir=/var/mail/folders/ebalaskas/.spam
corpus_dir=${spam_dir}/corpus
aux_dir=${spam_dir}/tolearn
mkdir -p "${corpus_dir}"
mkdir -p "${aux_dir}"
# Move the current spam aside first, so mail arriving during learning is untouched
mv "${spam_dir}"/cur/* "${aux_dir}"/
# Archive the learned messages in the corpus only if sa-learn succeeded
${learn_spam} "${aux_dir}" && mv "${aux_dir}"/* "${corpus_dir}"
Hello,
We let our users train spam/ham by using folders called 'learnham' or 'learnspam'. We currently run a cron job that uses find to search through all the folders for ones called learn-spam and then calls sa-learn etc. to process each message.
Is there a way that dovecot can call a script when a message is moved to a particular folder? It seems to me that this would be more efficient than using find, and would get the messages learnt as soon as they are moved into the learning folders.
Andrew.
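For reference, the cron-driven approach described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the mail root, the Maildir++ folder names (.learnspam, .learnham), and the use of a single shared Bayes database.

```shell
#!/bin/sh
# Sketch of the cron-driven approach: find every learning folder under a
# mail root and hand its cur/ directory to a learner command.
# All paths and folder names are hypothetical.
SA_LEARN=${SA_LEARN:-sa-learn}
MAIL_ROOT=${MAIL_ROOT:-/var/vmail}

# learn_all KIND FOLDER_NAME: KIND is "spam" or "ham".
learn_all() {
    find "$MAIL_ROOT" -type d -name "$2" -exec sh -c '
        learner=$1; kind=$2; shift 2
        for dir in "$@"; do
            if [ -d "$dir/cur" ]; then
                $learner "--$kind" "$dir/cur"
            fi
        done
    ' sh "$SA_LEARN" "$1" {} +
}
```

A cron job would then call learn_all spam .learnspam and learn_all ham .learnham. The cost is the full directory walk on every run, which is what a hook triggered by Dovecot would avoid.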
On Thu, 10 Apr 2008, Andrew Hearn wrote:
> Is there a way that dovecot can call a script when a message is moved to a particular folder?
Look into the pipe or antispam with "sendmail" backend plugins.
antispam's sendmail plugin can call any program.
Bye,
Steffen Kaiser
participants (8)
- Andrew Hearn
- Bill Cole
- Evaggelos Balaskas
- frank.kintrup@fkware.de
- mouss
- Odhiambo Washington
- Steffen Kaiser
- Zbigniew Szalbot