LDA randomly failing to write email to disk
Andy Dills
andy at xecu.net
Wed Sep 24 21:11:28 UTC 2014
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment
(maildir on netapp, dual dovecot instances where each server is both a
proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a blank
email in their inbox. Investigating further, and we will see that the only
information recorded in the maildir file for the message is the
Return-Path, the Delivered-To, and the first Received line (the one
generated by the local LDA via LMTP).
For example, here is what I found in one such email today:
Return-Path: <xxx at xecu.net>
Delivered-To: <yyy at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
by mail2.xecu.net (Dovecot) with LMTP idX86eBjgaI1RdyQAA3SxDBg
for <yyy at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400
Everything past that is lost, as if the either the LDA on mail2 exited
early or was never sent the information from the dovecot instance on
mail5.
Here is a little more detail from the header of the same email, different
recipient (all other recipients received the email properly, only one
failed to receive properly):
Return-Path: <xxx at xecu.net>
Delivered-To: <yyz at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
by mail1.xecu.net (Dovecot) with LMTP id KPh8ItMXI1StBAAA/c3zFg
for <yyz at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net
by mail5.xecu.net (Dovecot) with LMTP id QsUBFoQZI1RjhgAAXyr1JQ
; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net (localhost [127.0.0.1])
by mail5.xecu.net (Postfix) with ESMTP id 30AAE323BB18;
Wed, 24 Sep 2014 15:24:20 -0400 (EDT)
...
This is what I see in the logs of mail2, which is where the failure
happened:
Sep 24 15:24:20 mail2 dovecot: lmtp(51549, yyy at xecu.net):
X86eBjgaI1RdyQAA3SxDBg: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
When I look at the logs of mail1, which is where the successful delivery
happened:
Sep 24 15:24:20 mail1 dovecot: lmtp(1197, yyz at xecu.net):
KPh8ItMXI1StBAAA/c3zFg: sieve: msgid=<20140924192412.1435.qmail at xxx.xecu.net>: stored mail into
mailbox 'INBOX'
Nothing of note seems to be logged on mail5 (only a message from postfix,
nothing from the proxy side of the lmtp).
I do notice, when I check for the PID of 51549 in the logs, all of its
other transactions seem to register with proper msgids and were delivered
fine.
Also, I notice plenty of other messages that have the msgid=unspecified
error, but which were delivered with no problems and not truncated, so I'm
suspecting what may be happening is that somehow the backend instance is
not receiving the actual data portion, and only getting the envelope from
proxy instance.
How do I approach debugging this? It's very infrequent, but yet quite
annoying. Seems to have started since we upgraded to 2.2.13 (from an older
2.1 build) earlier this year.
Thanks,
Andy
---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---
More information about the dovecot
mailing list