[Dovecot] dovecot dspam plugin using libdspam
Hi,
I've found the dovecot dspam plugin and looked at the code. It forks and calls the dspam binary for every mail. I didn't like this behavior, so I've migrated it to use libdspam.
The plugin still needs more love:
- Use cmake instead of a Makefile
- Make the spam folder configurable in the dovecot.conf
- Code cleanup and more comments.
Please test. Comments and patches are welcome ;)
http://www.cynapses.org/tmp/dovecot-dspam-plugin-0.1.tar.gz
Cheers,
-- andreas
-- http://www.cynapses.org/ - cybernetic synapses
Andreas,
Please, do not take this poorly. I am simply asking questions to make sure this patch/plugin is a good idea in the form you suggest.
I am a user of the other patch. I am wondering if this is worth it. Your patch, if it links against libdspam will "bloat" dovecot. What do we gain?
Not every message goes through dspam (the fork, exec, etc.). It is only those that were classified incorrectly. I agree with many of your suggested changes.
Additionally, most open source projects seem to use autoconf/automake. What do we gain by switching to cmake instead of making it work somehow with dovecot's autoconf/automake system?
Depending on your answers, I will try your patch and help you clean it up.
Trever Adams
Trever L. Adams wrote:
Andreas,
Hi Trever,
Please, do not take this poorly. I am simply asking questions to make sure this patch/plugin is a good idea in the form you suggest.
I am a user of the other patch. I am wondering if this is worth it. Your patch, if it links against libdspam will "bloat" dovecot. What do we gain?
It will not really bloat it; libdspam is really small.
Not every message goes through dspam (the fork, exec, etc.). It is only those that were classified incorrectly. I agree with many of your suggested changes.
Avoiding the fork and exec is a speed improvement, and I think you will notice it if many users use the feature at the same time.
We read the mail, check the spam header, read the dspam signature, and reclassify the mail using the signature:
-- snip --
/* Attach the signature to the context */
if (_ds_set_signature(ctx, ctx->signature, signature)) {
    syslog(LOG_ERR, "_ds_set_signature failed!");
    return -1;
}

/* Call DSPAM */
if (dspam_process(ctx, NULL) != 0) {
    syslog(LOG_ERR, "dspam_process failed");
    return -1;
}
-- snip --
With libdspam you can also simply pass the whole message:
-- snip --
/* Call DSPAM */
if (dspam_process(ctx, message) != 0) {
    syslog(LOG_ERR, "dspam_process failed");
    return -1;
}
-- snip --
Additionally, most open source projects seem to use autoconf/automake. What do we gain by switching to cmake instead of making it work somehow with dovecot's autoconf/automake system?
I hate autofools. CMake is much easier. So it is simply easier for me.
I'm fine if *you* do the autofools part ;)
Depending on your answers, I will try your patch and help you clean it up.
git clone git://git.cynapses.org/dovecot-dspam-plugin.git dovecot-dpsam-plugin
Trever Adams
-- andreas
-- http://www.cynapses.org/ - cybernetic synapses
On Thu, 2007-08-30 at 15:09 +0200, Andreas Schneider wrote:
With libdspam you can also simply pass the whole message:
-- snip --
/* Call DSPAM */
if (dspam_process(ctx, message) != 0) {
    syslog(LOG_ERR, "dspam_process failed");
    return -1;
}
-- snip --
Note that passing dspam the signature is likely more efficient. There are two possibilities:
(1) dspam extracts the signature -> dovecot is more efficient at extracting headers because of cache
(2) dspam uses the mail -> it has to re-tokenize etc., which afaik it doesn't if you give it the signature and it loads things from disk
johannes
This is correct. The signature is attached, so to speak, to an already tokenized version of the message, and tokenizing is a large part of the overhead of dspam. Also, to do a retrain, you need a pristine message. So if a signature has been attached or any headers added in any way since it was processed by dspam, it won't be a true retrain.
We want to use the signature if it is present; if not, we can use the raw message. I would suggest the code do an if on the presence of the signature. I haven't yet looked at the code. Maybe tomorrow. (I am a bit behind on my schedule.)
Trever
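A minimal sketch of the "if on the presence of the signature" idea, not taken from the plugin source. It assumes the signature sits in a header named X-DSPAM-Signature (check what your dspam setup actually writes), and the names choose_retrain_mode and retrain_mode are hypothetical. It only decides which path to take; the caller would then use the `_ds_set_signature` plus `dspam_process(ctx, NULL)` path, or the `dspam_process(ctx, message)` path quoted earlier in the thread. Folded (multi-line) headers are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Assumed header name; verify against your own dspam configuration. */
#define SIG_HEADER "X-DSPAM-Signature:"

enum retrain_mode { RETRAIN_BY_SIGNATURE, RETRAIN_BY_MESSAGE };

/*
 * Scan the header block of msg (everything before the first empty line)
 * for SIG_HEADER. If found, copy its value into out and return
 * RETRAIN_BY_SIGNATURE; otherwise return RETRAIN_BY_MESSAGE so the caller
 * can fall back to passing the raw mail.
 */
enum retrain_mode choose_retrain_mode(const char *msg, char *out, size_t outlen)
{
    const size_t hlen = strlen(SIG_HEADER);
    const char *line = msg;

    assert(msg != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    /* A line starting with CR or LF is the empty line that ends the headers. */
    while (*line != '\0' && *line != '\n' && *line != '\r') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len >= hlen && strncasecmp(line, SIG_HEADER, hlen) == 0) {
            const char *val = line + hlen;
            size_t vlen = len - hlen;

            /* Trim leading blanks and a trailing CR or blank. */
            while (vlen > 0 && (*val == ' ' || *val == '\t')) {
                val++;
                vlen--;
            }
            while (vlen > 0 && (val[vlen - 1] == '\r' || val[vlen - 1] == ' '))
                vlen--;

            if (vlen > 0 && vlen < outlen) {
                memcpy(out, val, vlen);
                out[vlen] = '\0';
                return RETRAIN_BY_SIGNATURE;
            }
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return RETRAIN_BY_MESSAGE;
}
```

Whether the raw-message fallback should be allowed at all is a policy choice; as noted in the thread, a mail that has gained headers since delivery is no longer pristine.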
On Fri, 2007-08-31 at 00:57 -0600, Trever L. Adams wrote:
This is correct. The signature is attached, so to speak, to an already tokenized version of the message, and tokenizing is a large part of the overhead of dspam. Also, to do a retrain, you need a pristine message. So if a signature has been attached or any headers added in any way since it was processed by dspam, it won't be a true retrain.
Yes, however, there's practically a guarantee that a mail has a signature; not sure whether dspam extracts it but I think it should. But since we're talking mail servers here you pretty much have control over where to put the signature, hence putting it into the header and having dovecot parse it out is easiest.
We want to use the signature if it is present; if not, we can use the raw message. I would suggest the code do an if on the presence of the signature. I haven't yet looked at the code. Maybe tomorrow. (I am a bit behind on my schedule.)
I don't think we can get a pristine message to really retrain instead of telling dspam it made an error. Hence, I originally simply disallowed retraining messages without a signature. Practically never happens unless I had to turn off the spam filters for a while for whatever reason.
johannes
We want to use the signature if it is present; if not, we can use the raw message. I would suggest the code do an if on the presence of the signature. I haven't yet looked at the code. Maybe tomorrow. (I am a bit behind on my schedule.)
OK, I've now added support for setting the spam folder and the trash folder in the dovecot configuration file.
git clone git://git.cynapses.org/dovecot-dspam-plugin.git dovecot-dpsam-plugin
Trever
-- andreas
-- http://www.cynapses.org/ - cybernetic synapses
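A hedged sketch of how a plugin might pick up such folder settings. It assumes the settings reach the plugin as environment variables, which is how Dovecot 1.0 plugins commonly receive values from the plugin section of dovecot.conf; the variable names DSPAM_SPAM_FOLDER and DSPAM_TRASH_FOLDER and the defaults are hypothetical and not taken from this plugin.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical setting names; the real plugin may use different ones. */
#define ENV_SPAM_FOLDER  "DSPAM_SPAM_FOLDER"
#define ENV_TRASH_FOLDER "DSPAM_TRASH_FOLDER"

struct folder_config {
    const char *spam;   /* folder whose contents count as spam */
    const char *trash;  /* folder whose contents are being discarded */
};

/* Return the value of the variable name, or fallback when unset or empty. */
static const char *env_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL && fallback != NULL);
    value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Read both folder names once, e.g. from the plugin's init function. */
struct folder_config load_folder_config(void)
{
    struct folder_config cfg;

    cfg.spam = env_or_default(ENV_SPAM_FOLDER, "SPAM");
    cfg.trash = env_or_default(ENV_TRASH_FOLDER, "Trash");
    return cfg;
}
```

Falling back to fixed defaults keeps the behaviour of a build without any configuration unchanged.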
On 00:34:23 2007-09-07 Andreas Schneider mail@cynapses.org wrote:
OK, I've now added support for setting the spam folder and the trash folder in the dovecot configuration file.
git clone git://git.cynapses.org/dovecot-dspam-plugin.git dovecot-dpsam-plugin
Would it be possible to specify multiple spam folders?
e.g. spam, Spam, SPAM, junk, Junk, etc.?
That would be very useful...
-- Andraž "ruskie" Levstik Source Mage GNU/Linux Games grimoire guru Geek/Hacker/Tinker
Hacker FAQ: http://www.plethora.net/%7eseebs/faqs/hacker.html Be sure brain is in gear before engaging mouth.
Key id = F4C1F89C Key fingerprint = 6FF2 8F20 4C9D DB36 B5B6 F134 884D 72CC F4C1 F89C
Yes, this would be possible, but first the current code should be tested ;)
Thanks for the suggestion.
-- andreas
-- http://www.cynapses.org/ - cybernetic synapses
On 13:10:54 2007-09-07 Andreas Schneider mail@cynapses.org wrote:
Yes, this would be possible, but first the current code should be tested ;)
Thanks for the suggestion.
I'll try to test it when I get the time. Will it work on all dspam setups, or does it need a mysql/pgsql config?
I'm using an sqlite-based setup...
-- Andraž "ruskie" Levstik Source Mage GNU/Linux Games grimoire guru Geek/Hacker/Tinker
Andreas Schneider wrote:
Yes, this would be possible, but first the current code should be tested ;)
Thanks for the suggestion.
Just out of curiosity, how would having multiple spam folders be useful? Or do you mean a kind of "mirroring" of the spam folder with different language variations? For example, Thunderbird with localization support does not use Junk; it creates a different name for the Junk folder depending on the language it is set up for.
Marcin.
On 13:56:24 2007-09-07 Marcin Michal Jessa lists@yazzy.org wrote:
Just out of curiosity, how would having multiple spam folders be useful? Or do you mean a kind of "mirroring" of the spam folder with different language variations? For example, Thunderbird with localization support does not use Junk; it creates a different name for the Junk folder depending on the language it is set up for.
For one thing users would be left to their own naming....
I.e. some would name it Junk, some spam, some bulk, others something else again... The way I see it, they would all behave like the same folder, just interchangeable... Could even give the user the ability to designate a folder for spam via a $var in a db or some such...
Same for trash...
I prefer user choice to something being predefined for all the same...
-- Andraž "ruskie" Levstik Source Mage GNU/Linux Games grimoire guru Geek/Hacker/Tinker
Andraž 'ruskie' Levstik wrote:
For one thing users would be left to their own naming....
I.e. some would name it Junk, some spam, some bulk, others something else again... The way I see it, they would all behave like the same folder, just interchangeable... Could even give the user the ability to designate a folder for spam via a $var in a db or some such...
Same for trash...
I prefer user choice to something being predefined for all the same...
I agree, that would be a neat thing. But how would you be able to cover all the folder names users can come up with? And what if a user creates two folders, one called Junk and one called Spam? Would both show the same thing? Having only a web interface lets you make certain choices for the user and disallow certain operations. With a normal MUA you can't really do that. Users will be creating all kinds of different folder names, expecting them to work.
Marcin.
On 14:33:07 2007-09-07 Marcin Michal Jessa lists@yazzy.org wrote:
I agree, that would be a neat thing. But how would you be able to cover all the folder names users can come up with? And what if a user creates two folders, one called Junk and one called Spam? Would both show the same thing? Having only a web interface lets you make certain choices for the user and disallow certain operations. With a normal MUA you can't really do that. Users will be creating all kinds of different folder names, expecting them to work.
Hence why the plugin would allow a list of names... and they would all work... Either by having aliases to the other folders or simply doing if something happens with folder $one_of_folders do this...
-- Andraž "ruskie" Levstik Source Mage GNU/Linux Games grimoire guru Geek/Hacker/Tinker
On Fri, 2007-09-07 at 15:19 +0200, "Andraž 'ruskie' Levstik" wrote:
Hence why the plugin would allow a list of names... and they would all work... Either by having aliases to the other folders or simply doing if something happens with folder $one_of_folders do this...
No, you didn't get the point. What if the user has Junk and SPAM and both are "special", what e.g. if the user moves from SPAM to Junk? Which of the two folders did the user intend for training? Surely not both because then they wouldn't have created both.
johannes
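One possible way to handle a list of folder names while respecting the objection above, sketched under stated assumptions: every name in the list is treated as the same kind of folder, so a move between two spam folders (e.g. SPAM to Junk) triggers no training at all. The function and enum names are hypothetical, and the rule that moving spam into a trash folder is not a correction is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

enum move_action { ACTION_NONE, ACTION_TRAIN_SPAM, ACTION_TRAIN_HAM };

/* True if name matches any entry of the NULL-terminated list, ignoring case. */
static bool in_list(const char *name, const char *const *list)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcasecmp(name, list[i]) == 0)
            return true;
    }
    return false;
}

/* Decide what a move from folder src to folder dst means for training. */
enum move_action classify_move(const char *src, const char *dst,
                               const char *const *spam,
                               const char *const *trash)
{
    bool src_spam, dst_spam;

    assert(src != NULL && dst != NULL && spam != NULL && trash != NULL);
    src_spam = in_list(src, spam);
    dst_spam = in_list(dst, spam);

    /* spam -> spam and non-spam -> non-spam say nothing about an error. */
    if (src_spam == dst_spam)
        return ACTION_NONE;
    if (dst_spam)
        return ACTION_TRAIN_SPAM;
    /* Leaving a spam folder: deleting spam is assumed not to be a correction. */
    return in_list(dst, trash) ? ACTION_NONE : ACTION_TRAIN_HAM;
}
```

Because the comparison ignores case, a single entry such as "Spam" already covers spam, Spam and SPAM; the list is only needed for genuinely different names like Junk.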
Marcin Michal Jessa wrote:
Just out of curiosity, how would having multiple spam folders be useful? Or do you mean a kind of "mirroring" of the spam folder with different language variations? For example, Thunderbird with localization support does not use Junk; it creates a different name for the Junk folder depending on the language it is set up for.
Nope. Thunderbird always uses Junk as its spam folder; the UI then shows it localized to the user (the same applies to the other 'special' folders: Sent, Trash, ...).
-- Angel Marin http://anmar.eu.org/
It's none of my business, but here's a suggestion for more work for somebody. :-)
The idea behind the dspam plugin -- watching things move from folder to folder -- seems very, very clever to me. I wonder if it might have wider applicability, even if we can't see what it is today? Taking into account some of the performance concerns discussed way, way back, it still seems like a good idea to separate out the watching part from the processing part.
What if there were a generic "movewatcher" plugin which:
-- observed movements from one folder to another
-- configured with some kind of list or regexp or something for the source folders which were interesting/uninteresting (e.g., "SPAM") and likewise for the target folders (e.g., "Trash")
-- configured for some list of email header values to log
-- configured for some place/way to log those header values
-- configured for some format for logging with maybe %-escapes for interesting information (e.g., "%u" means the userid, "%s" means the source folder name, "%t" means the target folder name, "%hFoo" means the value of header "Foo", "%HFoo" means the string "Foo: " followed by the value of header "Foo").
Assuming everything interesting you would want to process is available in one or more headers (in keeping with the Postmaster Code of Conduct Rule #2c: Thou shalt not look at content :-), you could log everything interesting to some flat file (or FIFO) and write a completely independent program to process that log.
For the particular case of dspam retraining, you're all good if you have the dspam signature in the headers. (I don't know how you end up without that except during transition to dspam, so I'm not worried about retraining from the raw message.)
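The %-escape idea for the proposed "movewatcher" log format could be sketched as follows. This is an illustration only, since no such plugin exists in the thread: the struct, the function names and the header lookup callback are hypothetical, and the rule that a header name after %h or %H consists of letters, digits and '-' is an assumption made so that the name has a well-defined end.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct move_event {
    const char *user;                     /* %u */
    const char *src;                      /* %s */
    const char *dst;                      /* %t */
    const char *(*header)(const char *);  /* hypothetical lookup, NULL if absent */
};

/* Append s to the NUL-terminated buffer out, truncating if it does not fit. */
static void append(char *out, size_t outlen, const char *s)
{
    size_t used = strlen(out);

    if (used + 1 < outlen)
        snprintf(out + used, outlen - used, "%s", s);
}

/* Expand %u, %s, %t, %hName and %HName from fmt into out. */
void expand_format(const char *fmt, const struct move_event *ev,
                   char *out, size_t outlen)
{
    assert(fmt != NULL && ev != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%' || p[1] == '\0') {
            char lit[2] = { *p, '\0' };
            append(out, outlen, lit);
            continue;
        }
        p++; /* skip '%' and look at the escape letter */
        switch (*p) {
        case 'u': append(out, outlen, ev->user); break;
        case 's': append(out, outlen, ev->src); break;
        case 't': append(out, outlen, ev->dst); break;
        case 'h':
        case 'H': {
            int with_name = (*p == 'H');
            char name[64];
            size_t n = 0;
            const char *val;

            /* Assumed rule: the header name is a run of [A-Za-z0-9-]. */
            while ((isalnum((unsigned char)p[1]) || p[1] == '-') &&
                   n < sizeof(name) - 1)
                name[n++] = *++p;
            name[n] = '\0';
            val = ev->header(name);
            if (with_name) {
                append(out, outlen, name);
                append(out, outlen, ": ");
            }
            append(out, outlen, val != NULL ? val : "");
            break;
        }
        default: {
            /* Unknown escape: copy it through unchanged. */
            char lit[3] = { '%', *p, '\0' };
            append(out, outlen, lit);
            break;
        }
        }
    }
}

/* Demo lookup standing in for a real header parser. */
const char *demo_header(const char *name)
{
    return strcmp(name, "Foo") == 0 ? "bar" : NULL;
}
```

The resulting line could be written to a flat file or FIFO, leaving all dspam-specific processing to an independent program as suggested above.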
bill-dovecot@carpenter.ORG (WJCarpenter) PGP 0x91865119 38 95 1B 69 C9 C6 3D 25 73 46 32 04 69 D6 ED F3
Andreas Schneider wrote:
OK, I've now added support for setting the spam folder and the trash folder in the dovecot configuration file.
git clone git://git.cynapses.org/dovecot-dspam-plugin.git dovecot-dpsam-plugin
Hi,
the git address has changed.
git clone git://git.cynapses.org/gladiac/dovecot-dspam-plugin.git
-- andreas
-- http://www.cynapses.org/ - cybernetic synapses
Hello, Andreas,
I have not had a chance to look at your plugin. I am wondering if you have updated the plugin for the API changes in 1.1 alpha series (particularly alpha6). If so, I am at a point I can test the plugin out and possibly contribute code.
Thank you for a quick response, Trever Adams
Trever L. Adams wrote:
Hello, Andreas,
I have not had a chance to look at your plugin. I am wondering if you have updated the plugin for the API changes in 1.1 alpha series (particularly alpha6). If so, I am at a point I can test the plugin out and possibly contribute code.
Hi Trever,
I haven't updated the plugin for the 1.1 API changes. I'm running 1.0.x on my server.
Feel free to send patches to get it working with 1.1 :)
Thank you for a quick response, Trever Adams
Best regards,
-- andreas
On 2007-09-20 13:42:03 +0200, Andreas Schneider wrote:
Hi Trever,
I haven't updated the plugin for the 1.1 API changes. I'm running 1.0.x on my server.
Feel free to send patches to get it working with 1.1 :)
That leads to the question of how big those changes are, and whether it would be feasible to #if on some DOVECOT_VERSION define to avoid forking the plugin for 1.1.
Timo?
darix
-- openSUSE - SUSE Linux is my linux openSUSE is good for you www.opensuse.org
On Thu, 2007-09-20 at 13:48 +0200, Marcus Rueckert wrote:
Feel free to send patches to get it working with 1.1 :)
That leads to the question of how big those changes are, and whether it would be feasible to #if on some DOVECOT_VERSION define to avoid forking the plugin for 1.1.
Hmm. Looks like there aren't any easy to #if macros available. But I've done a lot of small changes all around and it would probably be pretty dirty to make the same source work with both 1.0 and 1.1.
On Thu, 2007-09-20 at 16:45 +0300, Timo Sirainen wrote:
That leads to the question of how big those changes are, and whether it would be feasible to #if on some DOVECOT_VERSION define to avoid forking the plugin for 1.1.
Hmm. Looks like there aren't any easy to #if macros available. But I've done a lot of small changes all around and it would probably be pretty dirty to make the same source work with both 1.0 and 1.1.
It's not too bad; I've done that now. It compiles against 1.1 but I haven't tested it. You do have to set manually which version you're compiling against, though.
johannes
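A sketch of what "set manually which version you're compiling against" might look like. The macro PLUGIN_DOVECOT_MINOR is hypothetical and would be supplied by hand at build time (for example with -DPLUGIN_DOVECOT_MINOR=1); it is not something Dovecot is assumed to provide, and this is not the actual code from either plugin.

```c
/*
 * Hypothetical, hand-set macro: 0 selects the 1.0.x code paths,
 * 1 selects the 1.1 code paths. Defaults to 1.0.x when not given.
 */
#ifndef PLUGIN_DOVECOT_MINOR
#define PLUGIN_DOVECOT_MINOR 0
#endif

#if PLUGIN_DOVECOT_MINOR == 0
#  define PLUGIN_TARGET "dovecot 1.0"
/* 1.0-specific includes and wrapper functions would go here. */
#elif PLUGIN_DOVECOT_MINOR == 1
#  define PLUGIN_TARGET "dovecot 1.1"
/* 1.1-specific includes and wrapper functions would go here. */
#else
#  error "unsupported PLUGIN_DOVECOT_MINOR"
#endif

/* Report which API the plugin was built for, e.g. for a startup log line. */
const char *plugin_target(void)
{
    return PLUGIN_TARGET;
}
```

Keeping the version-specific pieces behind a small set of wrappers in one place limits how far the #if blocks spread through the rest of the source.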
participants (9)
- "Andraž 'ruskie' Levstik"
- Andreas Schneider
- Angel Marin
- bill-dovecot@carpenter.ORG
- Johannes Berg
- Marcin Michal Jessa
- Marcus Rueckert
- Timo Sirainen
- Trever L. Adams