LDA randomly failing to write email to disk

Andy Dills andy at xecu.net
Wed Sep 24 21:11:28 UTC 2014


We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment 
(maildir on netapp, dual dovecot instances where each server is both a 
proxy and a backend).

Every now and then (once a month per user, maybe?), users will see a blank 
email in their inbox. Investigating further, and we will see that the only 
information recorded in the maildir file for the message is the 
Return-Path, the Delivered-To, and the first Received line (the one 
generated by the local LDA via LMTP). 

For example, here is what I found in one such email today:

Return-Path: <xxx at xecu.net>
Delivered-To: <yyy at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
                by mail2.xecu.net (Dovecot) with LMTP idX86eBjgaI1RdyQAA3SxDBg
                for <yyy at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400

Everything past that is lost, as if the either the LDA on mail2 exited 
early or was never sent the information from the dovecot instance on 
mail5.

Here is a little more detail from the header of the same email, different 
recipient (all other recipients received the email properly, only one 
failed to receive properly):

Return-Path: <xxx at xecu.net>
Delivered-To: <yyz at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
    by mail1.xecu.net (Dovecot) with LMTP id KPh8ItMXI1StBAAA/c3zFg
    for <yyz at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net
    by mail5.xecu.net (Dovecot) with LMTP id QsUBFoQZI1RjhgAAXyr1JQ
    ; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net (localhost [127.0.0.1])
    by mail5.xecu.net (Postfix) with ESMTP id 30AAE323BB18;
    Wed, 24 Sep 2014 15:24:20 -0400 (EDT)
...


This is what I see in the logs of mail2, which is where the failure 
happened:

Sep 24 15:24:20 mail2 dovecot: lmtp(51549, yyy at xecu.net): 
X86eBjgaI1RdyQAA3SxDBg: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'

When I look at the logs of mail1, which is where the successful delivery 
happened:

Sep 24 15:24:20 mail1 dovecot: lmtp(1197, yyz at xecu.net): 
KPh8ItMXI1StBAAA/c3zFg: sieve: msgid=<20140924192412.1435.qmail at xxx.xecu.net>: stored mail into 
mailbox 'INBOX'

Nothing of note seems to be logged on mail5 (only a message from postfix, 
nothing from the proxy side of the lmtp).

I do notice, when I check for the PID of 51549 in the logs, all of its 
other transactions seem to register with proper msgids and were delivered 
fine.

Also, I notice plenty of other messages that have the msgid=unspecified 
error, but which were delivered with no problems and not truncated, so I'm 
suspecting what may be happening is that somehow the backend instance is 
not receiving the actual data portion, and only getting the envelope from 
proxy instance.

How do I approach debugging this? It's very infrequent, but yet quite 
annoying. Seems to have started since we upgraded to 2.2.13 (from an older 
2.1 build) earlier this year. 

Thanks,
Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---




More information about the dovecot mailing list