Timothy,
a) Integrate it into libdspam rather than calling the dspam binary?
No, not right now. When I wrote this, libdspam segfaulted a lot, so I didn't want to pull that into the plugin. It also never reported proper errors and is generally unpleasant to work with. As far as I can tell, libdspam parses the config file itself and does various other strange things like that.
b) Any ideas for scaling it better, and implementing it scaled better? I notice a slight discussion about scaling, and you have chosen not to pursue it. Is that because it's too hard, or just because you don't need to?
Because I don't need to. But if you really need it to scale better, I recommend not bothering with libdspam and doing it this way instead:
(a) When a user moves mail, record the signature in some SQL database together with a 'count'. You can write an SQL statement that increases/decreases count, or inserts a record with count==X if the record is not present. I'd start at count==1 for spam and count==-1 for non-spam, then increase/decrease count each time the user moves that mail again.
(b) Nightly, you go through and
- remove all entries with count==0; that means the user decided not to train after all
- call dspam with all the signatures that have count != 0; depending on the sign of count they are either spam or not spam.
This gets rid of the dspam dependency in the plugin completely, and you don't need libdspam any more.
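To make the bookkeeping concrete, here is a rough sketch in Python with sqlite3. It is only an illustration under assumptions: the table name `pending`, the column `cnt`, and the function names are made up, and the actual dspam invocation is left out because it depends on your setup. The nightly step just returns which signatures should be retrained and as what.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the (hypothetical) queue database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        " signature TEXT PRIMARY KEY,"
        " cnt INTEGER NOT NULL)"
    )
    return conn


def record_move(conn, signature, to_spam):
    """Called when a user moves a mail: +1 towards spam, -1 towards non-spam."""
    delta = 1 if to_spam else -1
    with conn:  # one transaction: commit on success, roll back on error
        # Make sure a row exists, starting from a neutral count of 0.
        conn.execute(
            "INSERT OR IGNORE INTO pending (signature, cnt) VALUES (?, 0)",
            (signature,),
        )
        conn.execute(
            "UPDATE pending SET cnt = cnt + ? WHERE signature = ?",
            (delta, signature),
        )


def nightly_batch(conn):
    """Drop entries with cnt == 0 and return (signature, class) pairs to retrain.

    The caller is expected to hand each pair to dspam; that part is omitted.
    """
    with conn:
        conn.execute("DELETE FROM pending WHERE cnt = 0")
        rows = conn.execute(
            "SELECT signature, cnt FROM pending ORDER BY signature"
        ).fetchall()
        # Everything selected is about to be processed, so empty the queue.
        conn.execute("DELETE FROM pending")
    return [(sig, "spam" if cnt > 0 else "innocent") for sig, cnt in rows]
```

A mail that is moved to the spam folder and then back ends up at 0 and is dropped without ever being passed to dspam, which is the point of counting instead of retraining on every move.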
johannes