Re: [Dovecot] dovecot-dspam-integration
Hi Trever,
Please copy the mailing list too.
On Mon, 2007-05-07 at 09:45 -0600, Trever L. Adams wrote:
- Why on line 350 of your code do you say "do (almost) everything"? What is left out? Is it still done by dovecot?
I don't remember. You can probably find out by comparing the code.
- "I decided that hardlinking into special folders was too much work (especially with removing the mails again!), so for now my plugin is calling the dspam client directly." What exactly was the problem? Can't the problem be as simple as:
a) unlink the message from inappropriate folder (if moving into spam, unlink from unlearn, if moving out unlink from learn) using the file name below
b) link the message to the appropriate folder with the name of <MESSAGE ID>-
Why do (a) without checking whether the file exists? If you use the full message ID, and people are following the rules with their MTAs and/or mail clients, then the message ID should be unique: it must be unique on their system, and it seems that all of them append the host name at the end, making it globally unique (or most likely so...). If this is the case, then the client name. If this is in question then make it <MESSAGE ID><DATE><FROM>. That should make it globally unique.
This keeps one from having to mess with keeping lists. If people keep dragging the same message to their outbox from sent, then the MESSAGE ID may not be unique, but it will fit all of the same messages, hence I think it isn't a problem. Also, unlink will cause the OS to traverse the directory anyway, so why should we do it as well? It will be a non-fatal error if the file doesn't exist.
If you take my suggestions, and I am not just being stupid, please provide some example processes/cron jobs to go with your code in the top comments.
Actually, the thing we care about is the dspam signature which is unique. So that's not a problem.
The point is that you have to
(a) when no files exist:
    move into spam: create a file <sig> with contents "spam"
    move out of spam: create a file <sig> with contents "notspam"
(b) when <sig> file exists with contents "spam":
    move into spam: shouldn't happen
    move out of spam: unlink file
(c) when <sig> file exists with contents "notspam":
    move into spam: unlink file
    move out of spam: shouldn't happen
the cron job would have to iterate through all these files and call dspam depending on the contents of the file.
And then it all has to be atomic.
It's not really all that complex, but I was too lazy to implement it.
johannes
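The (a)/(b)/(c) transitions above could be sketched roughly as follows. This is only an illustration, not code from the plugin: the helper name queue_retrain, the queue directory, and the temporary-file naming are all assumptions. It covers atomic creation of a marker only; it does not lock against a cron job that reads the same directory at the same moment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum move_dir { INTO_SPAM, OUT_OF_SPAM };

/* Hypothetical helper, not part of the plugin. Returns 0 on success and
 * -1 on an I/O error or on one of the "shouldn't happen" transitions. */
static int queue_retrain(const char *dir, const char *sig, enum move_dir d)
{
	const char *want = (d == INTO_SPAM) ? "spam" : "notspam";
	size_t len = strlen(want);
	char path[1024], tmp[1024], buf[16];
	ssize_t n;
	int fd, r, saved;

	assert(dir != NULL && sig != NULL);
	if (sig[0] == '\0' || sig[0] == '.' || strchr(sig, '/') != NULL)
		return -1;	/* keep the marker inside dir, no dotfiles */
	r = snprintf(path, sizeof path, "%s/%s", dir, sig);
	if (r < 0 || (size_t)r >= sizeof path)
		return -1;
	r = snprintf(tmp, sizeof tmp, "%s/.tmp.%ld.%s", dir, (long)getpid(), sig);
	if (r < 0 || (size_t)r >= sizeof tmp)
		return -1;

	/* Write the complete marker under a temporary name first. */
	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	n = write(fd, want, len);
	if (close(fd) != 0 || n != (ssize_t)len) {
		unlink(tmp);
		return -1;
	}

	/* Case (a): link() publishes the marker atomically and fails with
	 * EEXIST if <sig> is already queued. */
	r = link(tmp, path);
	saved = errno;
	unlink(tmp);
	if (r == 0)
		return 0;
	if (saved != EEXIST)
		return -1;

	/* Cases (b) and (c): a marker exists, so look at its contents. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof buf - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	if (strcmp(buf, want) == 0)
		return -1;	/* same move twice: shouldn't happen */
	return unlink(path);	/* opposite move: the two requests cancel out */
}
```

A cron job would then walk the directory, skip the dot-prefixed temporary files, and call dspam once per marker according to its contents.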
On Mon, 2007-05-07 at 18:43 +0200, Johannes Berg wrote:
Hi Trever,
Please copy the mailing list too.
I am sorry about that.
the cron job would have to iterate through all these files and call dspam depending on the contents of the file.
And then it all has to be atomic.
It's not really all that complex, but I was too lazy to implement it.
johannes
I understand. I have a few questions about your code. You do a case -3: on enh_error, yet there is NO such entry anywhere in the code. Additionally, you check for 0 for a good condition, yet there is no possibility of a changed enh_error value for 0 in call_dspam nor in the function that calls it. I am getting errors when I try to move things into the spam folder (default case error). DSPAM is installed, it is getting run and if I run it by hand with --user someone (without any domain junk) it runs fine. So, where does the 0 case get its value, if it doesn't show up in call_dspam?
Am I reading the code incorrectly?
Thanks, Trever
"History is nothing but a collection of fables and useless trifles, cluttered up with a mass of unnecessary figures and proper names." -- Leo Tolstoy
Hi,
I understand. I have a few questions about your code. You do a case -3: on enh_error, yet there is NO such entry anywhere in the code.
Heh. Cruft, I guess. The code has changed a lot over time.
Additionally, you check for 0 for a good condition, yet there is no possibility of a changed enh_error value for 0 in call_dspam nor in the function that calls it.
Cruft too then, or just defensive coding maybe...
I am getting errors when I try to move things into the spam folder (default case error). DSPAM is installed, it is getting run and if I run it by hand with --user someone (without any domain junk) it runs fine. So, where does the 0 case get its value, if it doesn't show up in call_dspam?
Have you tried printing out the dspam command line and doing exactly that command by hand?
johannes
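One way to follow that suggestion is to log the exact argument vector right before it is executed. The sketch below is only an illustration and not plugin code; format_cmdline is a hypothetical helper, and it applies no shell quoting, so arguments containing spaces would need quoting by hand when the line is replayed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Joins a NULL-terminated argv with single spaces into out.
 * Returns 0 on success, -1 if out is too small. */
static int format_cmdline(char *out, size_t outsz, const char *const argv[])
{
	size_t used = 0;
	size_t i;

	assert(out != NULL && argv != NULL);
	if (outsz == 0)
		return -1;
	out[0] = '\0';
	for (i = 0; argv[i] != NULL; i++) {
		/* Prefix every argument except the first with a space. */
		int n = snprintf(out + used, outsz - used, "%s%s",
				 i ? " " : "", argv[i]);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;	/* would have been truncated */
		used += (size_t)n;
	}
	return 0;
}
```

The resulting string can be passed to syslog(LOG_INFO, "%s", buf) and then pasted into a shell running as the same user as the IMAP process.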
On Tue, 2007-05-08 at 10:36 +0200, Johannes Berg wrote:
Hi,
I understand. I have a few questions about your code. You do a case -3: on enh_error, yet there is NO such entry anywhere in the code.
Heh. Cruft, I guess. The code has changed a lot over time.
Ok, good enough.
Additionally, you check for 0 for a good condition, yet there is no possibility of a changed enh_error value for 0 in call_dspam nor in the function that calls it.
Cruft too then, or just defensive coding maybe...
Actually, if you consider that cruft, then your code is broken as that is the condition necessary for it to work.
I am getting errors when I try to move things into the spam folder (default case error). DSPAM is installed, it is getting run and if I run it by hand with --user someone (without any domain junk) it runs fine. So, where does the 0 case get its value, if it doesn't show up in call_dspam?
Have you tried printing out the dspam command line and doing exactly that command by hand?
johannes
Yes, it works only if I have --user USER as mentioned, without any domain stuff (joe, not joe@whatever). Anyway, I think I have fixed the program with two lines of code being changed (depending on the preferred coding style, this may be 1-3 lines...). I will send you the patch later today if I get some spam to test it out on. Basically it amounts to making the first two arguments after the program name in the execl call "--user" and the equivalent of getenv("USER"). (Mine is two lines because I have done char *user=getenv("USER") at the beginning of call_dspam, since most people seem to prefer that style instead of having getenv directly in the call to execl.)
The point is that you have to
(a) when no files exist:
    move into spam: create a file <sig> with contents "spam"
    move out of spam: create a file <sig> with contents "notspam"
You said that in your previous email. (B and C are indeed done.) A doesn't seem to happen. Your code complains about no signatures. I will dig into the code a little later and help out with this if you would like. For now, the code should be patched to allow things in the spam folder to be moved out even if there is no signature or the signature has expired (and no longer points to any cache in dspam). Maybe your code already does this. I haven't had a chance to test it yet.
Trever Adams
"If a revolution destroys a systematic government, but the systematic patterns of thought that produced that government are left intact, then those patterns will repeat themselves in the succeeding government." -- Robert M. Pirsig
Please find the patch below. The first change is spam, sorry. The rest is what it takes to make it work on my system. I have now tested it and it works beautifully. If anyone is using domain stuff successfully, then please help make this patch work. I have patched my dspam.c (in dspam) in accordance with some suggestions I found, so I don't have to deal with that: my system creates 3 different versions of each user, so I have just avoided the domain issues for now (since they are all versions of the same host, i.e. host name, localhost and user without anything... silly).
Trever

--- dspam.c-old 2007-05-08 03:36:49.000000000 -0600
+++ dspam.c 2007-05-08 03:33:52.000000000 -0600
@@ -72,7 +72,7 @@
 #define MAXSIGLEN 100
 
 #ifndef DSPAM
-#define DSPAM "/usr/bin/dspam"
+#define DSPAM "/usr/local/bin/dspam"
 #endif /* DSPAM */
 
 static int
@@ -83,6 +83,7 @@
 	char class_arg[16+2];
 	char sign_arg[MAXSIGLEN+2];
 	int pipes[2];
+	char *user = getenv("USER");
 
 	s = snprintf(sign_arg, 101, "--signature=%s", signature);
 	if ( s > MAXSIGLEN || s <= 0) return -1;
@@ -152,9 +153,9 @@
 	close(fd);
 
 #ifdef DEBUG
-	syslog(LOG_INFO, DSPAM " --source=error --stdout %s %s", class_arg, sign_arg);
+	syslog(LOG_INFO, DSPAM " --user %s --source=error --stdout %s %s", user, class_arg, sign_arg);
 #endif
-	execl (DSPAM, DSPAM, "--source=error", "--stdout", class_arg, sign_arg, NULL);
+	execl (DSPAM, DSPAM, "--user", user, "--source=error", "--stdout", class_arg, sign_arg, NULL);
 	exit(127); /* fall through if dspam can't be found */
 	return -1; /* never executed */
 }

"Be not defeated twice, once by circumstances and once by oneself." -- Lowell L. Bennion
On Tue, 2007-05-08 at 03:39 -0600, Trever L. Adams wrote:
@@ -152,9 +153,9 @@ close(fd);
+ execl (DSPAM, DSPAM, "--user", user, "--source=error", "--stdout", class_arg, sign_arg, NULL);
So with what configuration is that actually necessary? I know that my configuration works fine as-is. Should we make this optional?
johannes
On Tue, 2007-05-08 at 11:48 +0200, Johannes Berg wrote:
On Tue, 2007-05-08 at 03:39 -0600, Trever L. Adams wrote:
@@ -152,9 +153,9 @@ close(fd);
+ execl (DSPAM, DSPAM, "--user", user, "--source=error", "--stdout", class_arg, sign_arg, NULL);
So with what configuration is that actually necessary? I know that my configuration works fine as-is. Should we make this optional?
johannes
I am attaching my dspam.conf; hopefully attachments aren't a pain. I will remember what you said about patches or, if acceptable, I will attach them in the future. If it isn't necessary in most configurations, then maybe put an ifdef around the two versions of the two lines, with a define right after the main comment body at the beginning of the source.
Trever
"Blessed is the man, who having nothing to say, abstains from giving wordy evidence of the fact." -- George Eliot
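The ifdef idea could look roughly like the sketch below. It is an illustration under assumptions, not the change that went into the plugin: the macro DSPAM_PASS_USER and the helpers build_dspam_argv and exec_dspam are hypothetical names. It also guards against getenv("USER") returning NULL, which the patch as posted does not check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef DSPAM
#define DSPAM "/usr/bin/dspam"
#endif

/* Hypothetical switch near the top of the file: build with
 * -DDSPAM_PASS_USER=0 to get the original command line back. */
#ifndef DSPAM_PASS_USER
#define DSPAM_PASS_USER 1
#endif

/* Fills argv (8 slots) and returns the number of arguments, or -1 if
 * --user is wanted but no user name is available. */
static int build_dspam_argv(const char *argv[8], const char *user,
			    const char *class_arg, const char *sign_arg)
{
	int i = 0;

	assert(class_arg != NULL && sign_arg != NULL);
	argv[i++] = DSPAM;
#if DSPAM_PASS_USER
	if (user == NULL || *user == '\0')
		return -1;	/* getenv("USER") may return NULL */
	argv[i++] = "--user";
	argv[i++] = user;
#else
	(void)user;
#endif
	argv[i++] = "--source=error";
	argv[i++] = "--stdout";
	argv[i++] = class_arg;
	argv[i++] = sign_arg;
	argv[i] = NULL;
	return i;
}

/* Meant for the forked child, where the patch calls execl(). */
static void exec_dspam(const char *class_arg, const char *sign_arg)
{
	const char *argv[8];

	if (build_dspam_argv(argv, getenv("USER"), class_arg, sign_arg) > 0)
		execv(DSPAM, (char *const *)argv);
	_exit(127);	/* only reached if dspam could not be started */
}
```

Building the vector in one place means the debug log line and the exec call cannot drift apart, which is what happened to the two edited lines in the patch.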
Hi,
Additionally, you check for 0 for a good condition, yet there is no possibility of a changed enh_error value for 0 in call_dspam nor in the function that calls it.
Cruft too then, or just defensive coding maybe...
Actually, if you consider that cruft, then your code is broken as that is the condition necessary for it to work.
Ok. Heh. I hadn't looked at the code when writing this. Let me check.
Ok so we're discussing enh_error. It's passed by pointer, so the fetch_and_copy_reclassified function first sets it to 0 assuming no error. If there's no signature, it is set to -2 and the loop is broken, this means that it'll be checked for non-zero later and we roll back the transaction. Alternatively, enh_error can be set != 0 if call_dspam returns an error which can happen when dspam returns an error code or isn't present.
Not sure I understand your question now.
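The enh_error flow described above can be sketched in isolation as follows. The types and names here (struct msg, train_fn, retrain_all, should_commit) are hypothetical stand-ins, since the real function works on Dovecot mail objects, and -1 is an assumed value for a dspam failure; only the initial 0 and the -2 for a missing signature come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real plugin works on Dovecot mail objects. */
struct msg {
	const char *signature;	/* NULL if the mail has no dspam signature */
};
typedef int (*train_fn)(const char *signature);	/* 0 on success */

/* Returns how many messages were trained before stopping. */
static size_t retrain_all(const struct msg *msgs, size_t count,
			  train_fn train, int *enh_error)
{
	size_t i;

	assert(enh_error != NULL);
	*enh_error = 0;			/* assume no error */
	for (i = 0; i < count; i++) {
		if (msgs[i].signature == NULL) {
			*enh_error = -2;	/* no signature: stop here */
			break;
		}
		if (train(msgs[i].signature) != 0) {
			*enh_error = -1;	/* assumed value for a dspam failure */
			break;
		}
	}
	return i;
}

/* The caller only needs to know whether to commit or roll back. */
static int should_commit(int enh_error)
{
	return enh_error == 0;
}

/* Stub trainers for trying the flow without dspam installed. */
static int train_ok(const char *signature) { (void)signature; return 0; }
static int train_fail(const char *signature) { (void)signature; return 1; }
```

Seen this way, the check for 0 is the success path rather than cruft: the value is written through the pointer before the loop starts, not inside call_dspam.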
Yes, it works only if I have --user USER as mentioned, without any domain stuff.
So I don't have --user given
(joe, not joe@whatever). Anyway, I think I have fixed the program with two lines of code being changed (depending on the preferred coding style, this may be 1-3 lines...). I will send you the patch later today if I get some spam to test it out on. Basically it amounts to making the first two arguments after the program name in the execl call "--user" and the equivalent of getenv("USER"). (Mine is two lines because I have done char *user=getenv("USER") at the beginning of call_dspam, since most people seem to prefer that style instead of having getenv directly in the call to execl.)
Usually dspam is able to either pick out the user from the signature (uid in signature setting for dspam) or from the user it's running under. I guess you're running some virtual user setup?
The point is that you have to
(a) when no files exist:
    move into spam: create a file <sig> with contents "spam"
    move out of spam: create a file <sig> with contents "notspam"
You said that in your previous email.
Heh, I was too lazy to dig it up.
(B and C are indeed done.) A doesn't seem to happen.
Well, no, B and C aren't really done either since we never touch any signature-database on disk.
Your code complains about no signatures. I will dig into the code a little later and help out with this if you would like. For now, the code should be patched to allow things in the spam folder to be moved out even if there is no signature or the signature has expired (and no longer points to any cache in dspam). Maybe your code already does this. I haven't had a chance to test it yet.
I think it *should* complain about signatures. If you try training on a message so old that dspam no longer knows about it, dspam will throw an error, and if you have a message without a signature then you can't retrain it. I just delete the message in that case, though I suppose if you use a trash folder you'll have to use that configuration.
johannes
Hello,
On Tue, 2007-05-08 at 11:41 +0200, Johannes Berg wrote:
Hi,
transaction. Alternatively, enh_error can be set != 0 if call_dspam returns an error which can happen when dspam returns an error code or isn't present.
Yes, I found this. Which is where my patch came from.
Not sure I understand your question now.
Yes, it works only if I have --user USER as mentioned, without any domain stuff.
So I don't have --user given
No, you don't.
(joe, not joe@whatever). Anyway, I think I have fixed the program with two lines of code being changed (depending on the preferred coding style, this may be 1-3 lines...). I will send you the patch later today if I get some spam to test it out on. Basically it amounts to making the first two arguments after the program name in the execl call "--user" and the equivalent of getenv("USER"). (Mine is two lines because I have done char *user=getenv("USER") at the beginning of call_dspam, since most people seem to prefer that style instead of having getenv directly in the call to execl.)
Usually dspam is able to either pick out the user from the signature (uid in signature setting for dspam) or from the user it's running under. I guess you're running some virtual user setup?
Hmm, with my dspam patched or unpatched (to ignore the domain in the case of patch) this doesn't work for me. I am definitely NOT using virtual users (I have always had problems with doing that on qmail, sendmail and postfix, so I avoid it if possible).
I am running dspam 3.8.0 which is the first version I have actually installed and used.
The point is that you have to
(a) when no files exist:
    move into spam: create a file <sig> with contents "spam"
    move out of spam: create a file <sig> with contents "notspam"
You said that in your previous email.
Heh, I was too lazy to dig it up.
(B and C are indeed done.) A doesn't seem to happen.
Well, no, B and C aren't really done either since we never touch any signature-database on disk.
Ok, well, what I mean is that if a message and its signature are available, your code handles everything properly right now (not for a cron job but for immediate action).
Your code complains about no signatures. I will dig into the code a little later and help out with this if you would like. For now, the code should be patched to allow things in the spam folder to be moved out even if there is no signature or the signature has expired (and no longer points to any cache in dspam). Maybe your code already does this. I haven't had a chance to test it yet.
I think it *should* complain about signatures. If you try training on a message so old that dspam no longer knows about it, dspam will throw an error, and if you have a message without a signature then you can't retrain it. I just delete the message in that case, though I suppose if you use a trash folder you'll have to use that configuration.
johannes
I believe we should fix it, if dspam would allow, to learn it as spam using --corpus or what not if that is still available and the appropriate option. However, this wasn't my complaint.
My complaint is that an email which is in SPAM should move out, even if it gives a warning (if that is possible) so that email can be saved. I am not talking about reclassifying things as spam or not spam. Does that make any sense?
Anyway, I hope the patch I sent a moment ago helps. I took the idea from the acl plugin. I am not sure if it works in a virtual environment or not. However, as I said, I haven't been able to get dspam to recognize the user from the signature. Would you mind sharing your dspam.conf file and compile options so I can see if there is something in my setup which is broken? (My compile options are a bit custom, but the dspam.conf is pretty much the suggested one in doc/ for postfix.)
Trever
"In Heaven an angel is nobody in particular." -- George Bernard Shaw (1856-1950)
Hi,
Usually dspam is able to either pick out the user from the signature (uid in signature setting for dspam) or from the user it's running under. I guess you're running some virtual user setup?
Hmm, with my dspam patched or unpatched (to ignore the domain in the case of patch) this doesn't work for me. I am definitely NOT using virtual users (I have always had problems with doing that on qmail, sendmail and postfix, so I avoid it if possible).
So your imap binaries are running as the real users? Oh. Do you use the dspam client/server setup?
I am running dspam 3.8.0 which is the first version I have actually installed and used.
I think I'm still on 3.6.something, latest debian/testing packages.
Ok, well, what I mean is that if a message and its signature are available, your code handles everything properly right now (not for a cron job but for immediate action).
Right. Ok, good to see that confirmed :)
I believe we should fix it, if dspam would allow, to learn it as spam using --corpus or what not if that is still available and the appropriate option. However, this wasn't my complaint.
Ah, it would allow that, but I'm not sure that's desirable. I'd have to look into the dspam docs more as to when you want to use that etc.
My complaint is that an email which is in SPAM should move out, even if it gives a warning (if that is possible) so that email can be saved. I am not talking about reclassifying things as spam or not spam. Does that make any sense?
Hmm. How did that mail end up in SPAM when it doesn't have a signature? I only move mail into SPAM that was classified by dspam as SPAM, so it also has a signature.
Anyway, I hope the patch I sent a moment ago helps. I took the idea from the acl plugin. I am not sure if it works in a virtual environment or not.
I don't know if it works there either :) I'll try the patch on my system and if it doesn't break anything I'll roll it in. Don't hold your breath though, it'll probably take me a week or two.
However, as I said, I haven't been able to get dspam to recognize the user from the signature. Would you mind sharing your dspam.conf file and compile options so I can see if there is something in my setup which is broken? (My compile options are a bit custom, but the dspam.conf is pretty much the suggested one in doc/ for postfix.)
I didn't compile dspam myself (debian) and my configuration is pretty straightforward: each user runs dspam and no client/server model is used.
johannes
On Tue, 2007-05-08 at 11:58 +0200, Johannes Berg wrote:
Hi,
Usually dspam is able to either pick out the user from the signature (uid in signature setting for dspam) or from the user it's running under. I guess you're running some virtual user setup?
Hmm, with my dspam patched or unpatched (to ignore the domain in the case of patch) this doesn't work for me. I am definitely NOT using virtual users (I have always had problems with doing that on qmail, sendmail and postfix, so I avoid it if possible).
So your imap binaries are running as the real users? Oh. Do you use the dspam client/server setup?
You mean dspam as --daemon? Yes, that was the recommendation in the documentation, which said it was highly recommended not to use the other method. So, yes, my postfix file calls amavisd (for clamscan), which feeds it back into postfix, which then calls dspam.
I believe we should fix it, if dspam would allow, to learn it as spam using --corpus or what not if that is still available and the appropriate option. However, this wasn't my complaint.
Ah, it would allow that, but I'm not sure that's desirable. I'd have to look into the dspam docs more as to when you want to use that etc.
If you decide it is a good idea, I wouldn't mind helping out.
Hmm. How did that mail end up in SPAM when it doesn't have a signature? I only move mail into SPAM that was classified by dspam as SPAM, so it also has a signature.
As I see it, dspam creates its .sig files. I do not believe these are kept around long term. Therefore, it is possible that the signature file disappears before one could move something out of SPAM (or into it, but I don't care about that). Am I misunderstanding something?
Anyway, I hope the patch I sent a moment ago helps. I took the idea from the acl plugin. I am not sure if it works in a virtual environment or not.
I don't know if it works there either :) I'll try the patch on my system and if it doesn't break anything I'll roll it in. Don't hold your breath though, it'll probably take me a week or two.
At least it works for non-virtual at the moment.
johannes
Sounds good. Have a great day.
Trever
"Be not defeated twice, once by circumstances and once by oneself." -- Lowell L. Bennion
On Tue, 2007-05-08 at 04:06 -0600, Trever L. Adams wrote:
You mean dspam as --daemon? Yes, that was the recommendation in the documentation, which said it was highly recommended not to use the other method. So, yes, my postfix file calls amavisd (for clamscan), which feeds it back into postfix, which then calls dspam.
I don't use --daemon right now, it was crashing too much and losing mail. YMMV.
Hmm. How did that mail end up in SPAM when it doesn't have a signature? I only move mail into SPAM that was classified by dspam as SPAM, so it also has a signature.
As I see it, dspam creates its .sig files. I do not believe these are kept around long term. Therefore, it is possible that the signature file disappears before one could move something out of SPAM (or into it, but I don't care about that). Am I misunderstanding something?
Well, you control how long the .sig files are kept around. I keep them a week longer than my spam folder contents.
johannes
On Wed, 2007-05-09 at 16:09 +0200, Johannes Berg wrote:
On Tue, 2007-05-08 at 04:06 -0600, Trever L. Adams wrote:
I don't use --daemon right now, it was crashing too much and losing mail. YMMV.
You have to limit the number of connections to 1. For some reason it locks up if there are more than 1.
Well, you control how long the .sig files are kept around. I keep them a week longer than my spam folder contents.
johannes
How are you cleaning up the folder contents?
Trever
"If destruction be our lot, we must ourselves be its author and finisher. As a nation of freemen, we must live through all time or die by suicide." -- Abraham Lincoln
On Wed, 2007-05-09 at 13:11 -0600, Trever L. Adams wrote:
I don't use --daemon right now, it was crashing too much and losing mail. YMMV.
You have to limit the number of connections to 1. For some reason it locks up if there are more than 1.
Cute.
Well, you control how long the .sig files are kept around. I keep them a week longer than my spam folder contents.
johannes
How are you cleaning up the folder contents?
Like this: http://johannes.sipsolutions.net/files/cleanspam
johannes
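For readers without access to that script, an age-based cleanup could be sketched as below. This is not the cleanspam script and makes no claim about what it does; remove_older_than is a hypothetical helper, and it assumes a folder layout with one regular file per message (as in a Maildir cur/ directory). The current time is passed in so the cutoff is easy to reason about.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Removes regular files in dir whose mtime is more than max_age_days
 * before now. Returns the number removed, or -1 if dir can't be opened. */
static int remove_older_than(const char *dir, int max_age_days, time_t now)
{
	time_t cutoff = now - (time_t)max_age_days * 24 * 60 * 60;
	struct dirent *e;
	struct stat st;
	char path[4096];
	int removed = 0;
	DIR *d;

	assert(dir != NULL);
	d = opendir(dir);
	if (d == NULL)
		return -1;
	while ((e = readdir(d)) != NULL) {
		int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
		if (n < 0 || (size_t)n >= sizeof path)
			continue;	/* path too long: skip */
		if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
			continue;	/* skips ".", "..", subdirectories, symlinks */
		if (st.st_mtime < cutoff && unlink(path) == 0)
			removed++;
	}
	closedir(d);
	return removed;
}
```

With a spam folder pruned at, say, 30 days (an arbitrary example value), the signature retention in dspam would be set to at least 37 days to keep the one-week margin mentioned above.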
Participants (2): Johannes Berg, Trever L. Adams