Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

Zenaan Harkness zenaan at freedbms.net
Thu Sep 12 09:57:35 EEST 2019


I am wondering why sieve-filter is so slow compared to gnu sieve.

I run mpop (like getmail) to download from a pop3 server to a local
mbox file: ~/mail/email-incoming-unsorted

This step is very fast.

The next step, I throw the email-incoming-unsorted mbox file at a
sieve processor, to sort the emails from that mbox, into other
mboxes, according to the sieve rules file.
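
To make the step concrete, here is a minimal sketch in Python of what I mean by batch sorting one mbox into others. It is not how Gnu sieve or sieve-filter work internally. It only uses the standard library "mailbox" module, and the rule format (a substring of the List-Id header mapped to a destination path) is a made-up stand-in for a real sieve script. Each destination is opened once per run and flushed once at the end.

```python
import mailbox


def sort_mbox(source_path, rules, default_path):
    """Move every message from source_path into an mbox chosen by rules.

    rules maps a substring of the List-Id header to a destination mbox
    path (a hypothetical stand-in for a sieve script). Messages matching
    no rule go to default_path. Returns a dict of path -> moved count.
    """
    source = mailbox.mbox(source_path)
    destinations = {}  # path -> mbox object, opened once per run
    counts = {}
    source.lock()
    try:
        for key in list(source.keys()):
            message = source[key]
            list_id = str(message.get("List-Id", ""))
            target = default_path
            for needle, path in rules.items():
                if needle in list_id:
                    target = path
                    break
            if target not in destinations:
                destinations[target] = mailbox.mbox(target)
            destinations[target].add(message)
            source.remove(key)  # applied to the file when source is flushed
            counts[target] = counts.get(target, 0) + 1
        # Write the destinations first, then rewrite the source once.
        for box in destinations.values():
            box.flush()
        source.flush()
    finally:
        for box in destinations.values():
            box.close()
        source.close()  # also releases the lock
    return counts
```

Calling sort_mbox("email-incoming-unsorted", {"zfs-devel": "zdev"}, "Inbox") would move the zfs-devel list mail to "zdev" and everything else to "Inbox"; those file names are only examples.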

Up until a couple days ago I was using Gnu sieve.

Gnu sieve balks on emails which have no x-message-id (?? something
like this) header field, so after a few years, I finally decided to
switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.

Using Gnu sieve, this mbox sorting step was even faster than mpop (/
getmail) - and mpop and getmail are really fast (compared with
fetchmail), since they pipeline the email downloads.

Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
at most. Super fast.

Using sieve-filter, all emails are being processed - including those
without "message id header". This is good.

But sieve-filter is also much slower - slower than the download step
by an order of magnitude or two.

See below for details, any ideas appreciated.

To add to the below, I added:

mbox_very_dirty_syncs = yes

to the sieve-filter config, which slightly improves performance, but
not by much (in comparison with Gnu sieve).
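
To put a number on "slightly", a small timing wrapper like the following could be used. This is only a sketch: time_command is a name I made up, and the command in the example is a placeholder that would be swapped for the real sieve-filter command line.

```python
import subprocess
import sys
import time


def time_command(argv, message_count):
    """Run argv once and return (total seconds, seconds per message)."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / message_count


if __name__ == "__main__":
    # Placeholder command; substitute the actual filter invocation and
    # the number of messages in the mbox being filtered.
    total, per_message = time_command([sys.executable, "-c", "pass"], 11)
    print(f"total {total:.2f}s, {per_message:.3f}s per message")
```

Since the filter run moves messages out of the source mbox, each measurement needs a fresh copy of the same unsorted mbox for the with/without comparison to be fair.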

TIA,



----- Forwarded message from Zenaan Harkness <zenaan at freedbms.net> -----

From: Zenaan Harkness <zenaan at freedbms.net>
To: debian-user at lists.debian.org
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast at batch processing an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
> 
> Sequence:
> 
>  - mpop or getmail to pipeline download emails into temp mbox file
>  - filter that file
> 
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
> 
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
> 
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
> 
> Here's my filter command (one line):
> 
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
> 
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
> 
> File ~/etc/email/sieve-dovecot-config.conf:
> 
>   protocols = pop
>   lda_mailbox_autocreate = yes
>   lda_mailbox_autosubscribe = yes
>   mail_fsync = never
> 
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
> 
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process - what I've attempted to achieve
> above with sieve-filter), and that sieve-filter is instead passing
> each email through some (dovecot) lda?
> 
> Here's the output for a sieve-filter batch processing of 11 emails:
> 
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912 at 07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=OYDeDX4FMcoRdGdQ at mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2E.12322 at composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513749 at sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=iGdajTSzkfQ5PCZsUfyg at mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<20190903133420.GS6166 at eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c2e8 at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa17d at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<CAB5c7xpHCdFx1w3yA9FyRL-KQ8BUiCr4JbiDQRuFJj9nOgKxTg at mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<20190903151022.354xpe6ds2vglher at red.localdomain>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e24d at gmx.com>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffcc07 at googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ▶︎️ zen at eye 20190912 at 07:02:30 ~ $
> 
> 
> So roughly two-thirds of a second (about 7 seconds for 11 emails,
> going by the prompt timestamps above) is spent by dovecot's
> sieve-filter on each email that it processes - watching it is
> painful given how fast Gnu sieve has been for the last few years -
> it's almost (but not quite) as slow as my previous fetchmail email
> download per-email time.
> 
> Attached is a -D debug run of sieve-filter on 20 emails - slightly
> longer than the above, and took roughly 15 seconds to run.
> 
> Any help appreciated...


On another test run of ~600 emails, sieve-filter consistently used
~100% of one CPU for about 4 minutes. This suggests that, although
this looks like it should be a batch process, sieve-filter is perhaps
reloading the rules for every single email that it processes, even
though I gave it a whole mbox, and not a single email, to process.

Can sieve-filter work the way it should / the way I want it / batch
process a whole mbox - without reloading the sieve rules for every
email?
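
For reference, the per-message cost implied by the figures above can be worked out as follows. The timestamps and counts are the ones quoted in this mail; the Gnu sieve line is only a loose upper bound, assuming "100s of emails" means at least 100 and taking the 20 second worst case.

```python
from datetime import datetime


def seconds_per_message(start, end, count, fmt="%H:%M:%S"):
    """Elapsed wall-clock time between two timestamps, divided by count."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / count


# 11-message run: prompt timestamps 07:02:23 and 07:02:30.
small_run = seconds_per_message("07:02:23", "07:02:30", 11)

# ~600-message run: about 4 minutes at ~100% of one CPU.
large_run = (4 * 60) / 600

# Gnu sieve: assumed at least 100 messages in at most 20 seconds.
gnu_sieve_upper_bound = 20 / 100

print(f"sieve-filter, 11 messages:  {small_run:.2f} s/message")
print(f"sieve-filter, 600 messages: {large_run:.2f} s/message")
print(f"Gnu sieve, upper bound:     {gnu_sieve_upper_bound:.2f} s/message")
```

This gives roughly 0.64 and 0.40 seconds per message for sieve-filter, against at most 0.20 for Gnu sieve under the stated assumption.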

----- End forwarded message -----

