[Dovecot] Sieve mails with decoded subject
Dear all,
I accidentally posted this message to dovecot-news and want to apologize for any trouble I may have caused. Here it is again, on the correct mailing list:
Recently I've been getting a ton of porn/viagra/diet/casino spam mails, which I tried to filter with sieve:
if header :contains "Subject" [ "[SPAM]", "Bett", "Schlafzimmer", "Spielen", "Luder", "abhenmen", "poppen", "Casino", "Bonus", "abnehmen", "Gewinn", "Potenz" ] {
    addflag "\\seen";
    fileinto "Spam";
    stop;
}
Sadly, this didn't work - taking a closer look at one of the spam mails, it seems the spamming rats have encoded the subject:
Subject: =?US-ASCII?B?RW5kbGljaCBtZWhyIGVuZ2FnZW1lbnQgaW0gQmV0dGNoZW4=?= (reads: "Endlich mehr engagement im Bettchen")
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose it were UTF-8 or something: how can I filter on the actual content, not the encoded subject? Surely someone has solved that problem already?
Kind regards, Johannes
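For readers stuck on a Sieve implementation that compares against the raw header, the decoding step itself is small. The following is a minimal Python sketch, assuming only the standard library's email.header module, of what matching against the decoded subject looks like. The helper names decoded_subject and looks_like_spam are made up for this illustration, and the keyword list is copied from the Sieve rule in the post above. It is not how Dovecot's Sieve plugin does it internally.

```python
from email.header import decode_header, make_header

# Keyword list copied from the Sieve rule in the original post.
KEYWORDS = ["[SPAM]", "Bett", "Schlafzimmer", "Spielen", "Luder", "abhenmen",
            "poppen", "Casino", "Bonus", "abnehmen", "Gewinn", "Potenz"]


def decoded_subject(raw_subject):
    """Decode RFC 2047 encoded-words (=?charset?B?...?=) into plain text."""
    return str(make_header(decode_header(raw_subject)))


def looks_like_spam(raw_subject, keywords=KEYWORDS):
    """Case-insensitive substring match on the decoded subject."""
    subject = decoded_subject(raw_subject).lower()
    return any(keyword.lower() in subject for keyword in keywords)


raw = "=?US-ASCII?B?RW5kbGljaCBtZWhyIGVuZ2FnZW1lbnQgaW0gQmV0dGNoZW4=?="
print(decoded_subject(raw))   # Endlich mehr engagement im Bettchen
print(looks_like_spam(raw))   # True
```

The match is done in lower case to mirror the case-insensitive default comparator that Sieve applies to :contains.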
On Thu, 2009-12-10 at 19:34 +0100, Johannes Bauer wrote:
Dear all,
I accidentally posted this message to dovecot-news and want to apologize for any trouble I may have caused.
Don't worry. All mails there from non-trusted addresses simply get discarded (or rejected?)
Sadly, this didn't work - taking a closer look at one of the spam mails, it seems the spamming rats have encoded the subject:
Subject: =?US-ASCII?B?RW5kbGljaCBtZWhyIGVuZ2FnZW1lbnQgaW0gQmV0dGNoZW4=?= (reads: "Endlich mehr engagement im Bettchen")
They're decoded at least in v1.2 with the new Sieve plugin. I don't remember if CMU Sieve decoded them, I'm guessing not. So maybe it's time for an upgrade?
Timo Sirainen wrote:
They're decoded at least in v1.2 with the new Sieve plugin. I don't remember if CMU Sieve decoded them, I'm guessing not. So maybe it's time for an upgrade?
*sigh*, I guess it might be.
brick [~]: dovecot --version
1.0.rc15
I'm still somewhat reluctant to change to 1.2, because that would mean that I cannot rely on my distro's packages anymore (Etch, that is, for that server) but must maintain the packages myself. Well...
Kind regards, Johannes
On Thu, 2009-12-10 at 20:22 +0100, Johannes Bauer wrote:
I'm still somewhat reluctant to change to 1.2, because that would mean that I cannot rely on my distro's packages anymore (Etch, that is, for that server) but must maintain the packages myself. Well...
How about using backports.org?
Timo Sirainen tss@iki.fi writes:
On Thu, 2009-12-10 at 20:22 +0100, Johannes Bauer wrote:
I'm still somewhat reluctant to change to 1.2, because that would mean that I cannot rely on my distro's packages anymore (Etch, that is, for that server) but must maintain the packages myself. Well...
How about using backports.org?
Well, for etch the backport is not enough for 1.2.x: http://packages.debian.org/etch-backports/dovecot-imapd
BTW, according to http://wiki.debian.org/DebianEtch, Etch will be declared EOL in Feb 2010, so, imho, an Etch-to-Lenny upgrade should be planned very soon.
-- Nicolas
On Thu, 10 Dec 2009, Johannes Bauer wrote:
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose it were UTF-8 or something: how can I filter on the actual content, not the encoded subject? Surely someone has solved that problem already?
Yes, such as the guys behind SpamAssassin, or dspam, or any of the
many spam filtering programs that exist. Actually, they make much more
complicated decisions instead of only looking for bad words in the
subject field. I'd suggest you try installing one of them.
-- Eduardo M KALINOWSKI eduardo@kalinowski.com.br
Eduardo M KALINOWSKI wrote:
On Thu, 10 Dec 2009, Johannes Bauer wrote:
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose it were UTF-8 or something: how can I filter on the actual content, not the encoded subject? Surely someone has solved that problem already?
Yes, such as the guys behind SpamAssassin, or dspam, or any of the many spam filtering programs that exist. Actually, they make much more complicated decisions instead of only looking for bad words in the subject field. I'd suggest you try installing one of them.
I had SpamAssassin running once and was pretty disappointed. All those complicated rules and scoring and "smart" Bayesian filtering did not work very well, although I trained it on around 50k mails to tell right from wrong. I had lots of both false positives and false negatives, which was kind of annoying.
However, analyzing 274 spam mails I deleted in the last 5 months I can conclude that by using that extremely simple filter list I'd catch 258 of them (that's 94%). So I'd like to stick to KISS in this case.
Kind regards, Johannes
Hi,
On Thu, 10 Dec 2009 20:28:27 +0100, Johannes Bauer wrote:
Eduardo M KALINOWSKI wrote:
On Thu, 10 Dec 2009, Johannes Bauer wrote:
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose it were UTF-8 or something: how can I filter on the actual content, not the encoded subject? Surely someone has solved that problem already?
Yes, such as the guys behind SpamAssassin, or dspam, or any of the many spam filtering programs that exist. Actually, they make much more complicated decisions instead of only looking for bad words in the subject field. I'd suggest you try installing one of them.
I had SpamAssassin running once and was pretty disappointed. All those complicated rules and scoring and "smart" Bayesian filtering did not work very well, although I trained it on around 50k mails to tell right from wrong. I had lots of both false positives and false negatives, which was kind of annoying.
However, analyzing 274 spam mails I deleted in the last 5 months I can conclude that by using that extremely simple filter list I'd catch 258 of them (that's 94%). So I'd like to stick to KISS in this case.
That must have been a configuration issue - SpamAssassin works pretty well, if configured correctly - but I admit, it's a monster (both in terms of configuration and resource usage).
You could go for bogofilter (purely Bayesian). I've been using it for years on my private mail server with very good results. I like to use the tri-state filtering, where there is not just one threshold value, but two. Mail with a certainty of being spam ("bogosity") of 0.35 and below goes into my inbox, mails with a bogosity value between 0.35 and 0.65 go into Spam/Unsure, and everything above 0.65 goes directly into Spam. That way I have something like 10-20 mails per week in Spam/Unsure that are usually false negatives, rarely false positives (currently around 1000 mails per week end up in Spam). To my knowledge there has never been a false positive in Spam.
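The two-threshold decision described here is easy to state in code. This is only a sketch of the decision rule, using the 0.35 and 0.65 cutoffs quoted above; the function name classify is hypothetical, and in a real setup the decision would be made by bogofilter's own configuration plus a Sieve rule, not by a Python function.

```python
def classify(bogosity, ham_cutoff=0.35, spam_cutoff=0.65):
    """Map a bogosity value in [0, 1] to one of three mailboxes."""
    if bogosity <= ham_cutoff:
        return "INBOX"         # 0.35 and below: treated as ham
    if bogosity <= spam_cutoff:
        return "Spam/Unsure"   # between the two cutoffs: needs a human look
    return "Spam"              # above 0.65: filed as spam directly


for value in (0.10, 0.50, 0.99):
    print(value, classify(value))
```

The middle band is what keeps borderline mail out of both the inbox and the unreviewed Spam folder.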
Of course initial training is necessary. For ongoing training / feedback I have set up a Spam/Learn-Spam and Spam/Learn-Ham mailbox into which I move false negatives/positives. A cron script then runs the mails found in those (maildir) mailboxes through bogofilter again, with the command line option for classifying the mail as Spam/Ham and moves them to the correct mailbox (Spam/inbox) afterwards. This works well in all MUAs, because it only requires IMAP functionality to train the filter.
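The cron step could look roughly like the following Python sketch. It is not the script referred to above, which was not posted. It assumes that bogofilter reads a message on standard input and that -s registers it as spam and -n as ham. The function name retrain and its parameters are hypothetical, and the run parameter exists only so the external call can be swapped out for a dry run.

```python
import shutil
import subprocess
from pathlib import Path


def retrain(learn_dir, target_dir, flag, run=subprocess.run):
    """Register every message in a maildir with bogofilter, then move it.

    flag is "-s" (register as spam) or "-n" (register as ham).
    Returns the list of new message paths.
    """
    moved = []
    for sub in ("new", "cur"):
        source = Path(learn_dir, sub)
        if not source.is_dir():
            continue
        for msg in sorted(p for p in source.iterdir() if p.is_file()):
            with msg.open("rb") as handle:
                # bogofilter reads the message from standard input.
                run(["bogofilter", flag], stdin=handle, check=True)
            # Keep the message in the same maildir subdirectory (new or cur).
            dest_dir = Path(target_dir, sub)
            dest_dir.mkdir(parents=True, exist_ok=True)
            dest = dest_dir / msg.name
            shutil.move(str(msg), str(dest))
            moved.append(dest)
    return moved
```

A cron job would then call retrain once for the Learn-Spam maildir with "-s" and the Spam maildir as target, and once for the Learn-Ham maildir with "-n" and the inbox as target. The actual on-disk paths depend on the mail_location in use.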
The solution was inspired by a Gentoo Wiki article (http://www.gentoo-wiki.info/Bogofilter).
Patrick.
-- STAR Software (Shanghai) Co., Ltd. http://www.star-group.net/ Phone: +86 (21) 3462 7688 x 826 Fax: +86 (21) 3462 7779
PGP key E883A005 https://stshacom1.star-china.net/keys/patrick_nagel.asc Fingerprint: E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005
On Dec 10, 2009, at 11:51 PM, Patrick Nagel wrote:
Of course initial training is necessary. For ongoing training / feedback I have set up a Spam/Learn-Spam and Spam/Learn-Ham mailbox into which I move false negatives/positives.
Then again there are people like me who get mostly English mails, and a couple of legitimate Finnish ones per month. And all of those Finnish ones get 99% Bayesian spamness from SpamAssassin, so I can't rely on just that. But anyway, SA works pretty nicely. Just today I looked through several months of my Spam mailbox and there was just one false positive (someone wanted urgent Dovecot support a month ago, seemed like a bad idea to reply to it).
On 2009-12-11 13:14, Timo Sirainen wrote:
On Dec 10, 2009, at 11:51 PM, Patrick Nagel wrote:
Of course initial training is necessary. For ongoing training / feedback I have set up a Spam/Learn-Spam and Spam/Learn-Ham mailbox into which I move false negatives/positives.
Then again there are people like me who get mostly English mails, and a couple of legitimate Finnish ones per month. And all of those Finnish ones get 99% Bayesian spamness from SpamAssassin, so I can't rely on just that. But anyway, SA works pretty nicely. Just today I looked through several months of my Spam mailbox and there was just one false positive (someone wanted urgent Dovecot support a month ago, seemed like a bad idea to reply to it).
I think if you trained a couple hundred (non-Spam) Finnish mails as ham, the Bayesian filter would work fine for you. But yes, having that kind of imbalance between two (or more) "classes" of mails certainly makes Bayesian filtering less reliable.
Patrick.
Patrick Nagel wrote:
You could go for bogofilter (purely Bayesian). -- cut -- The solution was inspired by a Gentoo Wiki article (http://www.gentoo-wiki.info/Bogofilter).
If it's not just for personal use, but on a mailserver with quite some users I'd like to happily recommend ASSP. http://assp.sf.net/
It's *not* easy to set up, but after you've gone through all the options and learned about it, it is indeed the most deadly spam killer around.
It's very flexible, with a lot of options and active development. You can do regex filtering on subject, header, body, data etc. It has a nice web GUI, but you can put several things in different configuration files if you like.
One interesting option e.g. is to do delaying only for emails which have a certain "griplist" score. Another interesting option is the "Test mode", you can run it in front of e.g. postfix and just monitor it without doing any real filtering yet.
On 12/11/2009, aja-lists (aja-lists@tni.org) wrote:
If it's not just for personal use, but on a mailserver with quite some users I'd like to happily recommend ASSP. http://assp.sf.net/
It's *not* easy to set up, but after you've gone through all the options and learned about it, it is indeed the most deadly spam killer around.
Well - the defaults work pretty well (and be sure to install the default ham/spam collections) for most situations, so it's really not that hard to set up.
But yes, it has a LOT of options, so there is a definite learning curve if/when you decide to start tweaking.
And yes, it is *very* effective, especially after your ham/spam collections mature for your site...
On Sat, Dec 12, 2009 at 01:04:38AM -0500, Charles Marcus wrote:
And yes, it is *very* effective, especially after your ham/spam collections mature for your site...
Now an off-topic question, but anyway: is there some plugin for deliver to pipe a message through bogofilter/spamassassin/spambayes/... before running sieve?
Spam/ham collections (mostly the latter) differ between users in real life, and it's desirable for each delivery address to be able to use its own filter database.
I've used maildrop for this, but it brings some unnecessary things with it (maintaining a courier userdb).
Let's assume I have bogofilter, INBOX is really $HOME/Maildir, the bogofilter dir is $HOME/bogofilter, and I need to check for the existence of bogofilter data and, if it exists, pipe the message to bogofilter (pointing to $HOME/bogofilter as the bogofilter directory, like "-d $HOME/bogofilter"). With some options bogofilter will then add a spamicity header, and I can match against it with sieve.
Is there a way to do something like that with deliver?
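To make the request concrete, the per-user step being asked for could be sketched in Python as below. This is a sketch of the desired behaviour only, not an existing deliver plugin. The function name filter_message is hypothetical, and the bogofilter options are used as I understand the man page: -p for passthrough, -e to exit with status 0 whenever classification succeeded, -d for the wordlist directory.

```python
import subprocess
from pathlib import Path


def filter_message(home, message, run=subprocess.run):
    """Return the message, passed through bogofilter if the user has data."""
    bogodir = Path(home, "bogofilter")
    if not bogodir.is_dir():
        # No per-user bogofilter data: hand the message on unchanged.
        return message
    # -p: passthrough, emit the message with a classification header added
    # -e: exit 0 whenever classification succeeded (assumption, see lead-in)
    # -d: directory holding this user's wordlist
    result = run(["bogofilter", "-p", "-e", "-d", str(bogodir)],
                 input=message, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

The returned bytes would then be handed to the delivery agent, where a Sieve rule can match on the header that bogofilter added.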
WBR Dmitri Ivanov
On 2009-12-14 23:16:14 +0300, Dmitri V. Ivanov wrote:
On Sat, Dec 12, 2009 at 01:04:38AM -0500, Charles Marcus wrote:
And yes, it is *very* effective, especially after your ham/spam collections mature for your site...
Now an off-topic question, but anyway: is there some plugin for deliver to pipe a message through bogofilter/spamassassin/spambayes/... before running sieve?
Spam/ham collections (mostly the latter) differ between users in real life, and it's desirable for each delivery address to be able to use its own filter database.
I've used maildrop for this, but it brings some unnecessary things with it (maintaining a courier userdb).
Let's assume I have bogofilter, INBOX is really $HOME/Maildir, the bogofilter dir is $HOME/bogofilter, and I need to check for the existence of bogofilter data and, if it exists, pipe the message to bogofilter (pointing to $HOME/bogofilter as the bogofilter directory, like "-d $HOME/bogofilter"). With some options bogofilter will then add a spamicity header, and I can match against it with sieve.
Is there a way to do something like that with deliver?
why not do it on MTA level? those already have the hooks for it normally. no need to reinvent the wheel imho.
darix
-- openSUSE - SUSE Linux is my linux openSUSE is good for you www.opensuse.org
On Mon, Dec 14, 2009 at 09:30:54PM +0100, Marcus Rueckert wrote:
Is there a way to do something like that with deliver?
why not do it on MTA level? those already have the hooks for it normally. no need to reinvent the wheel imho.
Sometimes the users are virtual users from a dovecot userdb or something similar. The MTA isn't aware of them (all it uses for a virtual user is mailbox presence). But each virtual user has their own wordlist for bogofilter. deliver would set environment variables like $HOME to the right values for such a virtual user, and then we have a good place to call the filter, pointing it to the virtual user's home directory.
WBR Dmitri Ivanov
Hi Dmitri,
On 2009-12-15 04:54, Dmitri V. Ivanov wrote:
On Mon, Dec 14, 2009 at 09:30:54PM +0100, Marcus Rueckert wrote:
Is there a way to do something like that with deliver?
why not do it on MTA level? those already have the hooks for it normally. no need to reinvent the wheel imho.
Sometimes the users are virtual users from a dovecot userdb or something similar. The MTA isn't aware of them (all it uses for a virtual user is mailbox presence). But each virtual user has their own wordlist for bogofilter. deliver would set environment variables like $HOME to the right values for such a virtual user, and then we have a good place to call the filter, pointing it to the virtual user's home directory.
Indeed, that would be more straightforward than what's currently needed (from an administrator's point of view, at least).
We have postfix configured to pipe to-be-delivered mails into a shell script that takes ${recipient} (the recipient's e-mail address) as first argument (here is the relevant part from master.cf):
spamcheck unix - n n - - pipe
  flags=DRhu user=maildeliver:maildeliver argv=/usr/local/libexec/spamcheck_and_deliver ${recipient}
The script then pipes the mail through /usr/bin/spamc -u 'left part of e-mail address' (which lets spamassassin's spamd do the checking and insertion of headers, and thanks to the -u parameter a separate bayes_journal, bayes_seen and bayes_toks file is used for each user).
That output then finally gets piped into deliver -d 'e-mail address', which files the mails into the inbox/spam/unsure mailbox, according to a global sieve script that checks the headers previously inserted by spamd.
My private bogofilter setup is simpler, since it's just for me, but you could do it similarly, by specifying the user's bogofilter wordlist directory (-d ...) in the shell script (I think).
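The script itself was not posted. As a rough illustration of the two-stage pipe, here is a Python sketch built only from the two commands quoted above. The function names build_commands and check_and_deliver are hypothetical, deliver is assumed to be on the PATH, and error handling is reduced to passing deliver's exit status back to the caller.

```python
import subprocess


def build_commands(recipient):
    """Return the (spamc, deliver) argument lists for one recipient address."""
    # spamc gets the left part of the address, deliver the full address.
    local_part = recipient.split("@", 1)[0]
    spamc = ["/usr/bin/spamc", "-u", local_part]
    deliver = ["deliver", "-d", recipient]
    return spamc, deliver


def check_and_deliver(recipient, message, run=subprocess.run):
    """Pipe the message through spamc, then hand the result to deliver."""
    spamc, deliver = build_commands(recipient)
    checked = run(spamc, input=message, stdout=subprocess.PIPE, check=True)
    delivered = run(deliver, input=checked.stdout)
    # The exit status of deliver is what the MTA needs to see.
    return delivered.returncode


print(build_commands("alice@example.com"))
```

For a bogofilter variant, the spamc argument list would be replaced by a bogofilter call with the user's own -d directory.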
Patrick.
On Tue, Dec 15, 2009 at 02:20:23PM +0800, Patrick Nagel wrote:
Hi Dmitri,
On 2009-12-15 04:54, Dmitri V. Ivanov wrote:
On Mon, Dec 14, 2009 at 09:30:54PM +0100, Marcus Rueckert wrote:
Is there a way to do something like that with deliver?
why not do it on MTA level? those already have the hooks for it normally. no need to reinvent the wheel imho.
Sometimes the users are virtual users from a dovecot userdb or something similar. The MTA isn't aware of them (all it uses for a virtual user is mailbox presence). But each virtual user has their own wordlist for bogofilter. deliver would set environment variables like $HOME to the right values for such a virtual user, and then we have a good place to call the filter, pointing it to the virtual user's home directory.
Indeed, that would be more straightforward than what's currently needed (from an administrator's point of view, at least).
We have postfix configured to pipe to-be-delivered mails into a shell script that takes ${recipient} (the recipient's e-mail address) as first argument (here is the relevant part from master.cf):
spamcheck unix - n n - - pipe
  flags=DRhu user=maildeliver:maildeliver argv=/usr/local/libexec/spamcheck_and_deliver ${recipient}
The script then pipes the mail through /usr/bin/spamc -u 'left part of e-mail address' (which lets spamassassin's spamd do the checking and insertion of headers, and thanks to the -u parameter a separate bayes_journal, bayes_seen and bayes_toks file is used for each user).
That output then finally gets piped into deliver -d 'e-mail address', which files the mails into the inbox/spam/unsure mailbox, according to a global sieve script that checks the headers previously inserted by spamd.
My private bogofilter setup is simpler, since it's just for me, but you could do it similarly, by specifying the user's bogofilter wordlist directory (-d ...) in the shell script (I think).
No holy war!!! I was just trying to ask: "is there a plugin to replace maildrop with deliver for my setup?" I just think that a plugin to call bogofilter before sieve isn't too bad an idea, but there may be hidden pitfalls (I don't know it well enough).
I don't like to use shell there (or procmail), because maildrop would set all the needed environment variables from its userdb, and it's easy to write a script to fill the maildrop userdb from a dovecot passwd-like file. And there is an option to call deliver without '-d user', as far as I understand (when $HOME is set to the desired value).
Note: with Debian Etch the situation wasn't so clear, because maildrop was compiled to use the courier authdaemon, so the construction got extra unnecessary parts.
WBR Dmitri Ivanov
On Mon, 2009-12-14 at 23:54 +0300, Dmitri V. Ivanov wrote:
why not do it on MTA level? those already have the hooks for it normally. no need to reinvent the wheel imho.
Sometimes the users are virtual users from a dovecot userdb or something similar. The MTA isn't aware of them (all it uses for a virtual user is mailbox presence). But each virtual user has their own wordlist for bogofilter. deliver would set environment variables like $HOME to the right values for such a virtual user, and then we have a good place to call the filter, pointing it to the virtual user's home directory.
Having deliver fork new processes is kind of annoying. I wonder if all of this could be done some other way with v2.0. Maybe something similar to how post-login scripting is done: http://dovecot.org/list/dovecot/2009-December/045139.html
On 12/10/2009 2:28 PM, Johannes Bauer wrote:
Eduardo M KALINOWSKI wrote:
On Thu, 10 Dec 2009, Johannes Bauer wrote:
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose it were UTF-8 or something: how can I filter on the actual content, not the encoded subject? Surely someone has solved that problem already?
Yes, such as the guys behind SpamAssassin, or dspam, or any of the many spam filtering programs that exist. Actually, they make much more complicated decisions instead of only looking for bad words in the subject field. I'd suggest you try installing one of them.
I had SpamAssassin running once and was pretty disappointed. All those complicated rules and scoring and "smart" Bayesian filtering did not work very well, although I trained it on around 50k mails to tell right from wrong. I had lots of both false positives and false negatives, which was kind of annoying.
However, analyzing 274 spam mails I deleted in the last 5 months I can conclude that by using that extremely simple filter list I'd catch 258 of them (that's 94%). So I'd like to stick to KISS in this case.
From what I've seen, SA has been extremely good and accurate for us. We use amavisd-new to interface, but SA is at the end of a long chain of checks.
Between the (3) HELO checks, clamav-milter, and a SPF policy daemon, we're killing ~60% of all connections at SMTP time. (I analyzed that in November, instead of 65/day hitting my inbox I would've seen 6x that amount if it wasn't for those checks. So ~80% of all spam was getting blocked at SMTP time.) If we were to pay for the Spamhaus Zen list, we could probably boost that percentage to 90%.
All of the domains we do business with get a -2 or -4 score using amavisd-new. Specific addresses get a larger negative score. I ran a few thousand spam & ham messages at the SA bayes filter, then turned it on. We tag messages with a [spam] flag at 5.0 and quarantine at 9.0. Tagged messages go to the user's Inbox, quarantined messages get sieve'd into a sub-folder in the user's mailbox.
So far (in a month), no false positives. Or at least none that people have complained were quarantined when they should not have been. I'm considering lowering the quarantine threshold next month.
It's been nice to have my Inbox back, without 65 spams/day cluttering it up. Now I might see 2-5 per day that slip through without getting tagged as borderline spam (at 5.0 or higher). Those are mostly zero-day spam that haven't made it to the URIBLs or DNSBLs yet.
I'm still debating grey-listing, Razor, DCC or paying for the Spamhaus Zen list.
Compared to another, commercial, product that we were using a few years ago, SA is very very good. Not perfect, but really does a good job of classifying things with decent accuracy.
participants (10)
- aja-lists
- Charles Marcus
- Dmitri V. Ivanov
- Eduardo M KALINOWSKI
- Johannes Bauer
- Marcus Rueckert
- Nicolas KOWALSKI
- Patrick Nagel
- Thomas Harold
- Timo Sirainen