-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2009-12-11 13:14, Timo Sirainen wrote:
> On Dec 10, 2009, at 11:51 PM, Patrick Nagel wrote:
>> Of course initial training is necessary. For ongoing training / feedback I have set up a Spam/Learn-Spam and Spam/Learn-Ham mailbox into which I move false negatives/positives.
>
> Then again there are people like me who get mostly English mails, and a couple of legitimate Finnish ones per month. And all of those Finnish ones get 99% Bayesian spamness from SpamAssassin, so I can't rely on just that. But anyway, SA works pretty nicely. Just today I looked through several months of my Spam mailbox and there was just one false positive (someone wanted urgent Dovecot support a month ago, seemed like a bad idea to reply to it).

I think if you trained a couple hundred (non-Spam) Finnish mails as ham, the Bayesian filter would work fine for you. But yes, having that kind of imbalance between two (or more) "classes" of mails certainly makes Bayesian filtering less reliable.
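
To illustrate the imbalance point, here is a toy sketch in Python. It is not SpamAssassin's actual Bayes implementation: the TinyBayes class, the add-one smoothing, the rule of skipping never-seen tokens and all sample messages are made up for illustration. It only shows the general effect, namely that a legitimate Finnish mail scores as spam while the filter has seen Finnish only in spam, and that the score drops once some Finnish ham has been learned.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy multinomial naive Bayes (hypothetical, not SpamAssassin's algorithm)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, label, text):
        # Count every whitespace-separated token for the given class.
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.tokens["spam"]) | set(self.tokens["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_tokens = sum(self.tokens[label].values())
            # Smoothed class prior.
            score = math.log((self.messages[label] + 1) / (total + 2))
            for tok in text.lower().split():
                if tok not in vocab:
                    continue  # tokens never seen in training carry no evidence
                # Add-one (Laplace) smoothed token likelihood.
                score += math.log(
                    (self.tokens[label][tok] + 1) / (n_tokens + len(vocab))
                )
            log_score[label] = score
        # Turn the two log scores into P(spam | text).
        return 1.0 / (1.0 + math.exp(log_score["ham"] - log_score["spam"]))


bayes = TinyBayes()

# English ham, Finnish spam: the only Finnish the filter sees is spam.
for msg in ("the dovecot patch is attached",
            "please review the imap patch",
            "thanks the fix works fine"):
    bayes.learn("ham", msg)
for msg in ("laina on halpa ja nopea",
            "voitto on sinun ja ilmainen",
            "tarjous on voimassa ja ilmainen"):
    bayes.learn("spam", msg)

legit_finnish = "kokous on huomenna ja tuon kahvia"
before = bayes.spam_probability(legit_finnish)

# Feedback step: train a few legitimate Finnish mails as ham.
for msg in ("kokous on maanantaina ja alkaa aamulla",
            "lasku on maksettu ja kuitti tulee postissa",
            "kiitos viesti on luettu ja vastaan huomenna"):
    bayes.learn("ham", msg)

after = bayes.spam_probability(legit_finnish)
print(f"before Finnish ham training: {before:.2f}")
print(f"after Finnish ham training:  {after:.2f}")
```

With only three Finnish ham mails the score drops but stays fairly close to the middle, which is why I would expect a real filter to need a larger number of them before it becomes reliable.
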
Patrick.
STAR Software (Shanghai) Co., Ltd.
http://www.star-group.net/
Phone: +86 (21) 3462 7688 x 826
Fax: +86 (21) 3462 7779
PGP key E883A005 https://stshacom1.star-china.net/keys/patrick_nagel.asc
Fingerprint: E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAksh39UACgkQ7yMg/OiDoAVSAwCgn01AXOzflATD3JMOW3lkh5H0
nskAn1In2byYL430qM9oHP7Cgyz0yj9g
=yB61
-----END PGP SIGNATURE-----