Processing incoming mail efficiently

Marc Roos Marc at f1-outsourcing.eu
Sat Jan 30 22:33:13 EET 2021



> -----Original Message-----
> From: dovecot <dovecot-bounces at dovecot.org> On Behalf Of Ron Garret
> Sent: 30 January 2021 17:49
> To: Dovecot <dovecot at dovecot.org>
> Subject: Processing incoming mail efficiently
> 
> I’ve asked a related question on this list before but I now have a much
> better handle on what I’m doing and I realize that I still don’t know
> the answer, so I’m going to ask this again in a slightly different form.
> 
> I’m writing a spam filter, so obviously I need to feed incoming mail to
> it somehow.  The “obvious” way to do this is with a sieve script using
> the pipe extension.  There are two problems with this:

No, that is not obvious; it would add a dependency on sieve.

> 1.  This will always pipe the entire file no matter how big it is.  The
> filter will often not need to process the body of the message, 

Yes, because your starting point is wrong. With mailfromd you can process a message at a specific milter state; see the envfrom, envrcpt, etc. handlers:

https://puszcza.gnu.org.ua/software/mailfromd/manual/mailfromd.html#handler-names

> only the
> headers, or only the first part of a multipart MIME message.  Is there
> any way to allow my filter to open the file in which the message is
> stored rather than piping it a copy of the message?
> 
> 2.  Once the filter has processed the message and decided if it’s spam
> it still needs to move the message to the appropriate folder (INBOX or
> Junk).  To do this it needs to somehow correlate the *content* of the
> message that was piped to it with the UID of the message that needs to
> be moved.  One way to do this is to pull out the message-id header and
> then use doveadm

No. In whatever milter state you are processing, you can add a message header such as 'This is spam'. Then you need just one sieve rule that moves messages based on the existence of that specific header.
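
A minimal sketch of such a sieve rule, assuming the milter adds a header named X-Spam-Flag (a hypothetical name, use whatever header your filter actually sets) and that the spam folder is called Junk:

```sieve
# Assumption: the milter tags spam by adding an "X-Spam-Flag" header.
# The header name and the folder name are placeholders.
require ["fileinto"];

# "exists" is true when the header is present, whatever its value.
if exists "X-Spam-Flag" {
    fileinto "Junk";
    stop;
}
# Anything else falls through to the implicit keep (INBOX).
```

This way the delivery step never has to look a message up by its message-id afterwards; the decision travels with the message as a header.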

> to find the file containing the message with that
> message-id, but there are two problems with this.  First, not all
> messages have message-ids.  I can work around this by adding my own

First you have to crawl before you can walk, so learn how to crawl. It does not make sense to try to build something if you do not know the specifics.

> message-id to messages that don’t already have them, but this just feels
> wrong.  And second, unless dovecot keeps an index of message-ids (does
> it?) then this will be horribly inefficient because it will have to
> essentially grep for the message id every time I want to move a message.
> So it seems like there has to be a better way, but I can’t think of what
> that would be.

Start playing with mailfromd. It has a scripting language for configuring it, and all the tools (functions) are available to do whatever you can think of.

https://puszcza.gnu.org.ua/software/mailfromd/manual/mailfromd.html#Filter-Script-Example

> I figure this has to be a solved problem because I am obviously not the
> first person to write a spam filter for dovecot.  What is the Right Way
> to do this?
> 


As written above.


