Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)
I am wondering why sieve-filter is so slow compared to gnu sieve.
I run mpop (like getmail) to download from a pop3 server to a local mbox file: ~/mail/email-incoming-unsorted
This step is very fast.
The next step, I throw the email-incoming-unsorted mbox file at a sieve processor, to sort the emails from that mbox, into other mboxes, according to the sieve rules file.
Up until a couple days ago I was using Gnu sieve.
Gnu sieve balks on emails which have no x-message-id (?? something like this) header field, so after a few years, I finally decided to switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
Using Gnu sieve, this mbox sorting step was even faster than mpop (/ getmail) - and mpop and getmail are really fast (compared with fetchmail), since they pipeline the email downloads.
Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds at most. Super fast.
Using sieve-filter, all emails are being processed - including those without "message id header". This is good.
But sieve-filter is also much slower: slower than the download step by an order of magnitude or two.
See below for details, any ideas appreciated.
To add to the below, I added:
mbox_very_dirty_syncs = yes
to the sieve-filter config, which slightly improves performance, but not by much (in comparison with Gnu sieve).
TIA,
----- Forwarded message from Zenaan Harkness <zenaan@freedbms.net> -----
From: Zenaan Harkness <zenaan@freedbms.net>
To: debian-user@lists.debian.org
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)
On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
Why is Gnu sieve so extremely fast at batch processing an mbox file, while Dovecot's sieve-filter is an order of magnitude slower?
Sequence:
- mpop or getmail to pipeline download emails into temp mbox file
- filter that file
Gnu sieve just flies through a local mbox file, saving emails to other local mbox files.
Gnu sieve rejects too many emails with "malformed" errors, so after a few years I bit the bullet and upgraded to Dovecot's sieve-filter.
Dovecot's sieve-filter, at present, is an order of magnitude slower.
Here's my filter command (one line):
/usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
The sieve script is fine now that I have the correct "require" clauses (hint: "capability strings").
File ~/etc/email/sieve-dovecot-config.conf:
protocols = pop
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
mail_fsync = never
There's no re-sending of emails into my local Postfix SMTP server - I checked the system logs and confirmed this (journalctl -f).
I suspect that Gnu sieve was directly writing each email to the appropriate sieve-determined mbox file (perhaps with only a sync at the end of a single batch process - what I've attempted to achieve above with sieve-filter), and that sieve-filter is instead passing each email through some (dovecot) lda?
Here's the output for a sieve-filter batch processing of 11 emails:
$ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
# PS0 Timestamp: 20190912@07:02:23
info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=OYDeDX4FMcoRdGdQ@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
info: msgid=<15675101930.d5ba2E.12322@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
info: msgid=<23955051567513749@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=iGdajTSzkfQ5PCZsUfyg@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
info: msgid=<20190903133420.GS6166@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
info: msgid=<0715adb7-540f-4cff-9282-e1252c53c2e8@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa17d@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
info: msgid=<CAB5c7xpHCdFx1w3yA9FyRL-KQ8BUiCr4JbiDQRuFJj9nOgKxTg@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
info: msgid=<160901d8-b903-9e9a-91ac-267571b0e24d@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffcc07@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
2 ▶ zen@eye 20190912@07:02:30 ~ $
So about 3/4 of a second is spent by dovecot's sieve-filter, on each email that it processes - watching it is painful given how fast Gnu sieve has been for the last few years - it's almost (but not quite) as slow as my previous fetchmail email download per-email time.
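For reference, the per-email figure can be checked from the two prompt timestamps in the output above (07:02:23 before the run, 07:02:30 after) and the 11 messages processed. This is only a rough sketch: the stamps have one-second resolution, the function name is made up for illustration, and projecting to 600 messages assumes the per-message cost stays constant, which it may not.

```python
from datetime import datetime

# Matches the prompt stamps in the output above, e.g. 20190912@07:02:23
STAMP_FORMAT = "%Y%m%d@%H:%M:%S"


def seconds_per_message(start_stamp, end_stamp, message_count):
    """Average wall-clock seconds per message between two prompt stamps."""
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    return (end - start).total_seconds() / message_count


# Values taken from the 11-email run shown above.
per_message = seconds_per_message("20190912@07:02:23", "20190912@07:02:30", 11)
print(f"{per_message:.2f} s per message")

# Naive linear projection to a 600-message batch (an assumption, not a measurement).
print(f"{per_message * 600 / 60:.1f} min projected for 600 messages")
```

That works out to roughly 0.64 s per message, in line with the "about 3/4 of a second" estimate.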
Attached is a -D debug run of sieve-filter on 20 emails - slightly longer than the above, and took roughly 15 seconds to run.
Any help appreciated...
On another test run of ~600 emails, sieve-filter consistently ran at ~100% of one CPU (for about 4 minutes) to process them. This suggests that, despite what looks like it should be a batch process, sieve-filter is perhaps reloading the rules for every single email that it processes, even though I gave it a whole mbox, and not a single email, to process.
Can sieve-filter work the way it should / the way I want it / batch process a whole mbox - without reloading the sieve rules for every email?
----- End forwarded message -----
Don't use mbox.
It is a very slow format when mails need to be deleted from the middle: each deletion basically rewrites the whole mbox file.
Use sdbox instead.
Sami
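To get a feel for why per-message deletion from an mbox hurts, here is a back-of-the-envelope model. It assumes, as the reply above suggests, that expunging a message forces everything stored after it in the file to be rewritten; it does not model what Dovecot actually does internally (lazy writes, dirty syncs and so on), and the function names are made up for illustration. The sizes are the 11 message sizes from the sieve-filter output earlier in the thread.

```python
def bytes_rewritten_expunging_each(sizes):
    """Bytes rewritten if each message is expunged from the front in turn
    and every expunge rewrites all the bytes stored after it (model assumption)."""
    total = 0
    remaining = sum(sizes)
    for size in sizes:
        remaining -= size   # this message is removed from the file...
        total += remaining  # ...and everything after it is written again
    return total


def bytes_written_single_pass(sizes):
    """Bytes written if every message is simply appended once to its target."""
    return sum(sizes)


# Message sizes (bytes) from the 11-email run shown earlier in the thread.
sizes = [10240, 12968, 20461, 18065, 13342, 12390,
         12220, 25313, 7567, 14858, 13337]
average = sum(sizes) // len(sizes)

# Hypothetical 600-message batch at the same average size.
batch = [average] * 600
print("expunge per message:", bytes_rewritten_expunging_each(batch), "bytes")
print("single pass:        ", bytes_written_single_pass(batch), "bytes")
```

Under this model the rewritten volume grows with the square of the number of messages, while a format with one file per message (sdbox, Maildir) only ever touches the message being moved.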
On Sep 12, 2019, at 12:57 AM, Zenaan Harkness <zenaan@freedbms.net> wrote:
> The next step, I throw the email-incoming-unsorted mbox file at a sieve processor, to sort the emails from that mbox, into other mboxes, according to the sieve rules file.
I would expect mbox is the worst possible format choice for this.
> Gnu sieve balks on emails which have no x-message-id (?? something like this) header field, so after a few years, I finally decided to switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/ getmail) - and mpop and getmail are really fast (compared with fetchmail), since they pipeline the email downloads.
Perhaps because of its reliance on the header allowing it to index?
> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds at most. Super fast.
That doesn’t sound fast. I processed a few thousand messages through sieve in less than 10 seconds, if I recall correctly.
> See below for details, any ideas appreciated.
The first thing I would do is download to Maildir and see what the difference is.
-- What we have here is a failure to communicate.
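For a quick comparison without changing the download step, an existing mbox can be copied into a Maildir with Python's standard `mailbox` module. This is a minimal sketch, not something proposed in the thread: the function name and the paths in the demo are placeholders, it copies rather than moves, and it does no locking, so it should only be run on a file nothing else is writing to.

```python
import mailbox
import os
import tempfile


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in src:
            # Wrapping in MaildirMessage carries over mbox status flags.
            dst.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied


if __name__ == "__main__":
    # Self-contained demo on throwaway files (placeholder names and content).
    with tempfile.TemporaryDirectory() as tmp:
        demo_mbox = os.path.join(tmp, "email-incoming-unsorted")
        box = mailbox.mbox(demo_mbox)
        box.add("From: a@example.com\nSubject: demo\n\nhello\n")
        box.flush()
        box.close()
        print(mbox_to_maildir(demo_mbox, os.path.join(tmp, "Maildir")), "message(s) copied")
```

The resulting directory could then be used as the source for the same sieve-filter run by pointing `mail_location` at it with the `maildir:` driver in place of `mbox:`.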
participants (3)
- @lbutlr
- Sami Ketola
- Zenaan Harkness