LDA randomly failing to write email to disk
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment (maildir on netapp, dual dovecot instances where each server is both a proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a blank email in their inbox. Investigating further, and we will see that the only information recorded in the maildir file for the message is the Return-Path, the Delivered-To, and the first Received line (the one generated by the local LDA via LMTP).
For example, here is what I found in one such email today:
Return-Path: xxx@xecu.net Delivered-To: yyy@xecu.net Received: from mail5.xecu.net ([10.0.1.85]) by mail2.xecu.net (Dovecot) with LMTP idX86eBjgaI1RdyQAA3SxDBg for yyy@xecu.net; Wed, 24 Sep 2014 15:24:20 -0400
Everything past that is lost, as if the either the LDA on mail2 exited early or was never sent the information from the dovecot instance on mail5.
Here is a little more detail from the header of the same email, different recipient (all other recipients received the email properly, only one failed to receive properly):
Return-Path: xxx@xecu.net Delivered-To: yyz@xecu.net Received: from mail5.xecu.net ([10.0.1.85]) by mail1.xecu.net (Dovecot) with LMTP id KPh8ItMXI1StBAAA/c3zFg for yyz@xecu.net; Wed, 24 Sep 2014 15:24:20 -0400 Received: from mail5.xecu.net by mail5.xecu.net (Dovecot) with LMTP id QsUBFoQZI1RjhgAAXyr1JQ ; Wed, 24 Sep 2014 15:24:20 -0400 Received: from mail5.xecu.net (localhost [127.0.0.1]) by mail5.xecu.net (Postfix) with ESMTP id 30AAE323BB18; Wed, 24 Sep 2014 15:24:20 -0400 (EDT) ...
This is what I see in the logs of mail2, which is where the failure happened:
Sep 24 15:24:20 mail2 dovecot: lmtp(51549, yyy@xecu.net): X86eBjgaI1RdyQAA3SxDBg: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
When I look at the logs of mail1, which is where the successful delivery happened:
Sep 24 15:24:20 mail1 dovecot: lmtp(1197, yyz@xecu.net): KPh8ItMXI1StBAAA/c3zFg: sieve: msgid=20140924192412.1435.qmail@xxx.xecu.net: stored mail into mailbox 'INBOX'
Nothing of note seems to be logged on mail5 (only a message from postfix, nothing from the proxy side of the lmtp).
I do notice, when I check for the PID of 51549 in the logs, all of its other transactions seem to register with proper msgids and were delivered fine.
Also, I notice plenty of other messages that have the msgid=unspecified error, but which were delivered with no problems and not truncated, so I'm suspecting what may be happening is that somehow the backend instance is not receiving the actual data portion, and only getting the envelope from proxy instance.
How do I approach debugging this? It's very infrequent, but yet quite annoying. Seems to have started since we upgraded to 2.2.13 (from an older 2.1 build) earlier this year.
Thanks, Andy
Andy Dills Xecunet, Inc. www.xecu.net 301-682-9972
On 25 Sep 2014, at 00:11, Andy Dills andy@xecu.net wrote:
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment (maildir on netapp, dual dovecot instances where each server is both a proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a blank email in their inbox. Investigating further, and we will see that the only information recorded in the maildir file for the message is the Return-Path, the Delivered-To, and the first Received line (the one generated by the local LDA via LMTP).
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
which HG Rev fixes this?
On Fri, Oct 3, 2014 at 9:14 AM, Timo Sirainen tss@iki.fi wrote:
On 25 Sep 2014, at 00:11, Andy Dills andy@xecu.net wrote:
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment (maildir on netapp, dual dovecot instances where each server is both a proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a blank email in their inbox. Investigating further, and we will see that the only information recorded in the maildir file for the message is the Return-Path, the Delivered-To, and the first Received line (the one generated by the local LDA via LMTP).
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: larryrtx@gmail.com US Mail: 108 Turvey Cove, Hutto, TX 78634-5688
On Fri, 3 Oct 2014, Timo Sirainen wrote:
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
Thanks Timo, I had given up hope, and was starting to question if maybe I was having hardware issues.
Should I feel hesitant about rolling out a fresh build from hg into production? Would I be better off waiting for an official 2.2.14?
Thanks, Andy
Andy Dills Xecunet, Inc. www.xecu.net 301-682-9972
participants (3)
-
Andy Dills
-
Larry Rosenman
-
Timo Sirainen