[Dovecot] dovecot/lmtp munmap()-ing a lot
I observed several long running dovecot/lmtp processes hogging the CPU. I then strace'd them (strace -c -p 6375) and found them munmap()-ing a lot:
% time     seconds  usecs/call     calls    errors syscall
 97.18   19.592537        1878     10430           munmap
  2.28    0.458984          36     12696           epoll_ctl
  0.26    0.052926          10      5288           fdatasync
  0.21    0.042472           3     13679           epoll_wait
... snip ...
Why would that happen? (dovecot 2.1.17)
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Registered office: München, commercial register: Amtsgericht München, HRB 199263; Executive Board: Patrick Ben Koetter, Marc Schiffbauer; Chairman of the Supervisory Board: Florian Kirstein
On 8.6.2014, at 11.59, Ralf Hildebrandt <r@sys4.de> wrote:
> I observed several long running dovecot/lmtp processes hogging the CPU. I then strace'd them (strace -c -p 6375) and found them munmap()-ing a lot:
> % time     seconds  usecs/call     calls    errors syscall
>  97.18   19.592537        1878     10430           munmap
>   2.28    0.458984          36     12696           epoll_ctl
>   0.26    0.052926          10      5288           fdatasync
>   0.21    0.042472           3     13679           epoll_wait
> ... snip ...
> Why would that happen? (dovecot 2.1.17)
Difficult to say. It could be munmap()ing memory allocations or it could be munmap()ing Dovecot index files. Wasn't there an equivalent number of mmap() calls?
BTW. In v2.2 the index file handling is faster for LDA/LMTP because it doesn't even try to mmap() the full indexes into memory.
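One way to check the mmap()/munmap() balance asked about above is to parse the `strace -c` summary and compare the call counts. The following is only a minimal sketch: the function name `parse_strace_summary` is hypothetical, and it assumes the usual column order of the summary (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
def parse_strace_summary(text):
    """Parse `strace -c` summary rows into {syscall: {"calls": int, "seconds": float}}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors column value) or 6 fields (with one).
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])            # "% time" must be numeric
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue                    # header, separator, or "... snip ..." line
        stats[fields[-1]] = {"calls": calls, "seconds": seconds}
    return stats


# Sample taken from the output posted in this thread.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
 97.18   19.592537        1878     10430           munmap
  2.28    0.458984          36     12696           epoll_ctl
  0.26    0.052926          10      5288           fdatasync
  0.21    0.042472           3     13679           epoll_wait
"""

stats = parse_strace_summary(SAMPLE)
munmap_calls = stats["munmap"]["calls"]
# mmap is not in the posted excerpt, so the fallback of 0 means "not shown".
mmap_calls = stats.get("mmap", {"calls": 0})["calls"]
print(f"munmap: {munmap_calls} calls, mmap: {mmap_calls} calls")
```

Because the posted listing was snipped, a count of 0 for mmap here only means the row is missing from the excerpt, not that mmap() was never called. Note that a full summary also ends with a `total` row, which this sketch would store under the key `total`.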
- Timo Sirainen <tss@iki.fi>:
> > Why would that happen? (dovecot 2.1.17)
> Difficult to say. It could be munmap()ing memory allocations or it could be munmap()ing Dovecot index files. Wasn't there an equivalent number of mmap() calls?
> BTW. In v2.2 the index file handling is faster for LDA/LMTP because it doesn't even try to mmap() the full indexes into memory.
That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
Anyone ever found a reliable way to do this?
It sure would be nice if dovecot could do this per account and/or per maildir/mailbox with a simple doveadm command...
Best regards,
Charles
On Tue, 10 Jun 2014, Charles Marcus wrote:
> On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> > That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
> Anyone ever found a reliable way to do this?
> It sure would be nice if dovecot could perform this on a per account and/or per maildir/mailbox case with a simple doveadm command...
The basic question is: what is a duplicate?
I spot 100% duplicates within the same Maildir mailbox with a script similar to "fdupes" http://linux.die.net/man/1/fdupes . Because a user may copy messages around, I scan one mailbox at a time.
For some rare cases, where I merge two accounts, I use a script that looks for the message ID in one account and removes all messages with the same ID in the other account. Then I merge the Maildirs.
However, I would not call either script general enough for automatic processing.
Steffen Kaiser
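The fdupes-like scan of a single Maildir mailbox described above could look roughly like the sketch below. It is not the script from the message; the function name `find_exact_duplicates` and the choice of SHA-256 are assumptions. It only reports byte-identical files and deletes nothing.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(maildir):
    """Group message files of one Maildir folder by the SHA-256 of their content."""
    by_digest = defaultdict(list)
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_digest[digest].append(path)
    # Only groups with more than one file are byte-identical copies.
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Optional command-line use: python script.py /path/to/Maildir
if __name__ == "__main__" and len(sys.argv) > 1:
    for group in find_exact_duplicates(sys.argv[1]):
        print("identical:", *group)
```

Messages that were delivered twice usually differ in headers such as Received, so a byte-for-byte comparison like this one will not catch them. That is one reason the question "what is a duplicate?" matters.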
On 10.06.2014 at 15:17, Steffen Kaiser wrote:
> The basic question is: what is a duplicate?
> However, I would not call either script general enough for automatic processing
dbmail just has "suppress_duplicates = yes" as a global setting and silently ignores *newly received* messages with the same message-id to the same user
that's fine for people who are not able to handle a mailing list and hit reply-all every time
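The delivery-time behaviour described above (ignore a newly received message if the same user already got that message-id) can be pictured with a small sketch. This is not dbmail's or Pigeonhole's implementation. The class name `DuplicateTracker`, the in-memory dictionary, and the retention period are all assumptions made for illustration.

```python
import time


class DuplicateTracker:
    """Remember (user, message_id) pairs for `period` seconds."""

    def __init__(self, period=24 * 3600, clock=time.time):
        self.period = period      # hypothetical retention period in seconds
        self.clock = clock        # injectable clock, handy for testing
        self.seen = {}            # (user, message_id) -> time first seen

    def is_duplicate(self, user, message_id):
        now = self.clock()
        # Forget entries that are older than the retention period.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.period}
        key = (user, message_id)
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


tracker = DuplicateTracker(period=3600)
first = tracker.is_duplicate("jane", "<1@example.org>")
second = tracker.is_duplicate("jane", "<1@example.org>")
print(first, second)
```

The key includes the user, so the same message-id delivered to two different recipients is not treated as a duplicate. Once the retention period has passed, a repeated message-id is accepted again.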
On Tue, 10 Jun 2014, Reindl Harald wrote:
> On 10.06.2014 at 15:17, Steffen Kaiser wrote:
> > The basic question is: what is a duplicate?
> > However, I would not call either script general enough for automatic processing
> dbmail just has "suppress_duplicates = yes" as a global setting and silently ignores *newly received* messages with the same message-id to the same user
Wasn't there a thread some days/weeks ago saying that Pigeonhole behaves the same by default, in which the poster asked for how long Pigeonhole remembers the IDs?
Actually, I still wonder whether the same message-id is sufficient to decide to "silently drop" a message, as I interpret "to ignore a message" as "to drop". They might have come via different paths, some MUAs might not generate IDs that are unique world-wide or time-dependent, ... . It's a matter of taste, IMHO.
Steffen Kaiser
On 10.06.2014 at 15:39, Steffen Kaiser wrote:
> On Tue, 10 Jun 2014, Reindl Harald wrote:
> > dbmail just has "suppress_duplicates = yes" as a global setting and silently ignores *newly received* messages with the same message-id to the same user
> Wasn't there a thread some days/weeks ago saying that Pigeonhole behaves the same by default, in which the poster asked for how long Pigeonhole remembers the IDs?
> Actually, I still wonder whether the same message-id is sufficient to decide to "silently drop" a message, as I interpret "to ignore a message" as "to drop". They might have come via different paths, some MUAs might not generate IDs that are unique world-wide or time-dependent, ... . It's a matter of taste, IMHO
if the MUA generates no message-id at all, the MTA usually does, because otherwise you would risk getting messages rejected, which is what we did many many years ago for any incoming mail without a message-id
if it generates one it's unlikely to have the same message-id for the same RCPT - usually the current timestamp is part of it
On Tue, 10 Jun 2014, Reindl Harald wrote:
> if it generates one it's unlikely to have the same message-id for the same RCPT
yes, but then some recipients forward (automatically or manually). Or you have a fetchmail-like grabber that re-transmits the message, ... .
> - usually the current timestamp is part of it
that is what I mean by "time-dependent", but you also used "unlikely" and "usually". So you still see a small chance that the message-id is not world-wide unique. ;-)
I know, nowadays all MUAs should be capable of generating sensible message IDs, and some claims about bandwidth and such are outdated, too. You have to rely on information you do not control -> you have to decide how far to trust it.
Steffen Kaiser
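How far the message-id can be trusted in a given mailbox can be measured before anything is dropped. The sketch below, under the assumption that messages are available as raw bytes, lists the message-ids that are shared by messages with different bodies. The function name `conflicting_message_ids` is hypothetical, and comparing only the body (everything after the first blank line) is a design choice, since headers such as Received legitimately differ between copies.

```python
import hashlib
import re
from collections import defaultdict
from email import message_from_bytes


def conflicting_message_ids(raw_messages):
    """Return the Message-IDs that are shared by messages whose bodies differ."""
    body_digests = defaultdict(set)
    for raw in raw_messages:
        msg_id = str(message_from_bytes(raw).get("Message-ID") or "").strip()
        if not msg_id:
            continue  # nothing to compare on
        # The body is everything after the first blank line (LF or CRLF).
        parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
        body = parts[1] if len(parts) > 1 else b""
        body_digests[msg_id].add(hashlib.sha256(body).hexdigest())
    return sorted(mid for mid, digests in body_digests.items() if len(digests) > 1)
```

An empty result suggests that, for this set of messages, dropping by message-id would not have discarded distinct content. It says nothing about mail that arrives later.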
> Wasn't there a thread some days/weeks ago saying that Pigeonhole behaves the same by default, in which the poster asked for how long Pigeonhole remembers the IDs?
How would I go about enabling this?
> Actually, I still wonder whether the same message-id is sufficient to decide to "silently drop" a message, as I interpret "to ignore a message" as "to drop". They might have come via different paths, some MUAs might not generate IDs that are unique world-wide or time-dependent, ... . It's a matter of taste, IMHO.
You're probably right, but in the case of a runaway fetchmail it would be sufficient.
mutt's "~=" tagging does the same (IMHO)
> The basic question is: what is a duplicate?
> I spot 100% duplicates within the same Maildir mailbox with a script similar to "fdupes" http://linux.die.net/man/1/fdupes . Because a user may copy messages around, I scan one mailbox at a time.
But with mdbox? Or mailboxes != Maildir format in general?
- Charles Marcus <dovecot@dovecot.org>:
> On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> > That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
> Anyone ever found a reliable way to do this?
To duplicate the mails? Yeah: Just let fetchmail run unobserved for weeks, will fuck up things nicely. No manual intervention needed.
On 6/10/2014 12:32 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> - Charles Marcus <dovecot@dovecot.org>:
> > On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> > > That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
> > Anyone ever found a reliable way to do this?
> To duplicate the mails?
'This' referred obviously to my altered SUBJECT... ;)
Best regards,
Charles
On 10.6.2014, at 14.05, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
> On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> > That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
> Anyone ever found a reliable way to do this?
> It sure would be nice if dovecot could perform this on a per account and/or per maildir/mailbox case with a simple doveadm command...
doveadm deduplicate
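For reference, an invocation of the command named above could be assembled as in the sketch below. The options used (`-u` to select the user, `-m` to compare by Message-Id header instead of GUID, followed by a search query such as `mailbox INBOX`) are as I recall them from the doveadm-deduplicate man page; they should be checked against the man page of the installed version. The helper names and the example user are hypothetical.

```python
import subprocess


def deduplicate_command(user, mailbox="INBOX", by_message_id=False):
    """Build the argument list for a `doveadm deduplicate` call (nothing is run here)."""
    cmd = ["doveadm", "deduplicate", "-u", user]
    if by_message_id:
        cmd.append("-m")           # assumed: compare by Message-Id instead of GUID
    cmd += ["mailbox", mailbox]    # search query restricting the run to one mailbox
    return cmd


def run_deduplicate(user, mailbox="INBOX", by_message_id=False):
    """Run the command; raises CalledProcessError if doveadm exits non-zero."""
    return subprocess.run(deduplicate_command(user, mailbox, by_message_id), check=True)


# Example user name is hypothetical.
print(" ".join(deduplicate_command("jane@example.org", by_message_id=True)))
```

Since the command expunges messages, trying it first on a copy of the mailbox is the safer route.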
On 3.7.2014 21:35, Timo Sirainen wrote:
> On 10.6.2014, at 14.05, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
> > Anyone ever found a reliable way to do this?
> > It sure would be nice if dovecot could perform this on a per account and/or per maildir/mailbox case with a simple doveadm command...
> doveadm deduplicate
When I last tried, doveadm deduplicate was quite unreliable (IMO to the point of not being worth using at all.)
http://www.dovecot.org/list/dovecot/2014-March/095447.html
(Tried again, the behaviour is the same in 2.2.13 from Debian testing)
On 6/9/2014 5:44 PM, Ralf Hildebrandt <r@sys4.de> wrote:
> That's probably the problem here. The user had LOTS of (duplicate!) mails in his inbox.
I had the same problem with a corrupted IBM Domino mail file, which gave me lots of duplicates. However, imapsync with the options "--useheader Date --useheader Subject --skipheader X.*", so that duplicates are identified by date+subject, did the trick.
Alternatively, you could have a look at the Thunderbird add-on "Remove Duplicate Message (alternate) 0.39". I haven't used it on large IMAP folders, but it also works nicely.
hth, infoomatic
participants (7)
- Charles Marcus
- Infoomatic
- Jiri Bourek
- Ralf Hildebrandt
- Reindl Harald
- Steffen Kaiser
- Timo Sirainen