Eduardo M KALINOWSKI wrote:
On Thu, 10 Dec 2009, Johannes Bauer wrote:
I'm thinking about filtering all such encoded subjects (as there's no reason to encode them as US-ASCII), but suppose one were UTF-8 or something similar: how can I filter on the actual content rather than on the encoded subject? Surely someone has already solved that problem?
Yes: the people behind SpamAssassin, dspam, or any of the many other spam-filtering programs out there. In fact, they make far more sophisticated decisions than just looking for bad words in the Subject field. I'd suggest installing one of them.
I had SpamAssassin running once and was pretty disappointed. All those complicated rules, the scoring and the "smart" Bayesian filtering did not work very well, even though I trained it on around 50k mails to tell right from wrong. I got lots of false positives as well as lots of false negatives, which was rather annoying.
However, after analyzing the 274 spam mails I deleted over the last 5 months, I conclude that this extremely simple filter list would have caught 258 of them (about 94%). So I'd like to stick to KISS in this case.
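For the decoding question quoted above, a minimal sketch, assuming Python 3's standard `email.header.decode_header` (which splits a Subject into RFC 2047 encoded-words such as `=?UTF-8?Q?...?=` plus their charsets). The word list `BAD_WORDS` and the helpers `decoded_subject` / `is_spam_subject` are hypothetical; the actual filter list isn't given in the thread.

```python
from email.header import decode_header

# Hypothetical word list; the real filter list is not shown in the thread.
BAD_WORDS = ["viagra", "replica", "casino"]


def decoded_subject(raw_subject: str) -> str:
    """Turn a raw Subject header with RFC 2047 encoded-words into plain text."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, str):
            # No encoded-words at all: decode_header hands back the str unchanged.
            parts.append(chunk)
            continue
        try:
            # Unencoded pieces of a mixed header have charset None; treat them as ASCII.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to Latin-1, which never fails.
            parts.append(chunk.decode("latin-1"))
    return "".join(parts)


def is_spam_subject(raw_subject: str, words=BAD_WORDS) -> bool:
    """Case-insensitive substring match of the *decoded* subject against the list."""
    subject = decoded_subject(raw_subject).casefold()
    return any(word.casefold() in subject for word in words)
```

With this, `=?US-ASCII?Q?Cheap_Viagra?=` and a plain `Cheap Viagra` are matched the same way, because the filter only ever sees the decoded text.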
Kind regards, Johannes