[Dovecot] Mailbox locking issue?
Here's my configuration.
dovecot -n # 1.0.7: /usr/local/etc/dovecot.conf protocols: pop3 ssl_disable: yes disable_plaintext_auth: no login_dir: /usr/local/var/run/dovecot/login login_executable: /usr/local/libexec/dovecot/pop3-login mail_extra_groups: mail mail_location: /var/spool/mail/%u mbox_write_locks: fcntl dotlock
The above setting was just added in an attempt to resolve the issue. It was default before.
mail_executable: /usr/local/libexec/dovecot/pop3 mail_plugin_dir: /usr/local/lib/dovecot/pop3 pop3_uidl_format: %08Xu%08Xv auth default: mechanisms: plain apop login worker_max_count: 5 passdb: driver: passwd-file args: /usr/local/etc/dovecot.passwd passdb: driver: pam userdb: driver: passwd socket: type: listen client: path: /var/spool/postfix/private/auth mode: 432 user: postfix group: postfix
Postfix 2.3.3 mailbox_delivery_lock = fcntl, dotlock
The problem is the following. A user uses fetchmail via an ssh tunnel to download mail via pop3 every 5 minutes. This normally works fine. perhaps once a week or so, maillog shows that Postfix delivers a message while fetchmail is retrieving. Here is one such.
Apr 8 19:07:20 sbh16 dovecot: pop3-login: Login: user=<xxx>, method=PLAIN, rip=72.52.113.36, lip=72.52.113.36, secured Apr 8 19:07:22 sbh16 postfix/local[13745]: D959169046B: to=xxx@example.com>, orig_to=xxx@example.org, relay=local, delay=16, delays=15/0/0/1.1, dsn=2.0.0, status=sent (delivered to mailbox) Apr 8 19:07:22 sbh16 dovecot: POP3(xxx): Disconnected: Logged out top=2/15995, retr=0/0, del=2/2, size=15985
Five minutes later when fetchmail comes back, we see
Apr 8 19:12:42 sbh16 dovecot: POP3(xxx): mbox sync: Expunged message reappeared in mailbox /var/spool/mail/xxx (UID 4683 < 4684, seq=2, idx_msgs=0) Apr 8 19:12:42 sbh16 dovecot: pop3-login: Login: user=<xxx>, method=PLAIN, rip=72.52.113.36, lip=72.52.113.36, secured Apr 8 19:12:46 sbh16 dovecot: POP3(xxx): mbox sync: UID inserted in the middle of mailbox /var/spool/mail/xxx (4684 > 4683, seq=2, idx_msgs=1) Apr 8 19:12:46 sbh16 dovecot: POP3(xxx): Disconnected: Logged out top=1/46375, retr=0/0, del=1/1, size=46365
From now on, as messages are added to the mailbox by Postfix, fetchmail keeps retrieving both old and new messages. i.e. even though dovecot's log says del 1/1 (or del n/n) no messages are actually deleted from the mailbox.
This situation continues until someone accesses the mailbox locally (e.g. via mutt through a local shell) and deletes the old messages.
As noted above, this has been observed a few times with dovecot's default mbox_write_locks. I have since changed that to
mbox_write_locks: fcntl dotlock
to be consistent with Postfix's
mailbox_delivery_lock = fcntl, dotlock
I don't know if that will fix the problem or not, but it seems the issue with default dovecot mbox_write_locks should have been deadlock and time out rather than what we see.
Also it seems that dovecot should recover from the 'corrupt' mailbox and fix it rather than just continuing to encounter the same problem.
Any ideas or advice?
-- Mark Sapiro mark@msapiro.net The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Fri, 2008-04-11 at 22:52 -0700, Mark Sapiro wrote:
Apr 8 19:07:20 sbh16 dovecot: pop3-login: Login: user=<xxx>, method=PLAIN, rip=72.52.113.36, lip=72.52.113.36, secured
Dovecot read-locks the mbox file here.
Apr 8 19:07:22 sbh16 postfix/local[13745]: D959169046B: to=xxx@example.com>, orig_to=xxx@example.org, relay=local, delay=16, delays=15/0/0/1.1, dsn=2.0.0, status=sent (delivered to mailbox) Apr 8 19:07:22 sbh16 dovecot: POP3(xxx): Disconnected: Logged out top=2/15995, retr=0/0, del=2/2, size=15985
These both happen within the same second, so most likely Postfix correctly waits that Doveocot has finished modifying the mbox file and only then delivers its changes.
Five minutes later when fetchmail comes back, we see
Apr 8 19:12:42 sbh16 dovecot: POP3(xxx): mbox sync: Expunged message reappeared in mailbox /var/spool/mail/xxx (UID 4683 < 4684, seq=2, idx_msgs=0)
seq=2 would indicate that there are now two mails, but there should have been only one now.. I tried this a couple of times myself, but it worked in my tests. I'm using v1.0.13 though, so there's also a chance this has been fixed already.
From now on, as messages are added to the mailbox by Postfix, fetchmail keeps retrieving both old and new messages. i.e. even though dovecot's log says del 1/1 (or del n/n) no messages are actually deleted from the mailbox.
This situation continues until someone accesses the mailbox locally (e.g. via mutt through a local shell) and deletes the old messages.
When this happens, could you put the mbox file through http://dovecot.org/tools/mbox-anonymize.pl and send the result to me, along with dovecot.index and dovecot.index.log files.
Also it seems that dovecot should recover from the 'corrupt' mailbox and fix it rather than just continuing to encounter the same problem.
Yes, it shold.
Timo Sirainen wrote:
On Fri, 2008-04-11 at 22:52 -0700, Mark Sapiro wrote:
From now on, as messages are added to the mailbox by Postfix, fetchmail keeps retrieving both old and new messages. i.e. even though dovecot's log says del 1/1 (or del n/n) no messages are actually deleted from the mailbox.
This situation continues until someone accesses the mailbox locally (e.g. via mutt through a local shell) and deletes the old messages.
When this happens, could you put the mbox file through http://dovecot.org/tools/mbox-anonymize.pl and send the result to me, along with dovecot.index and dovecot.index.log files.
Yes, I will do that.
Here's one additional piece of information which may or may not be relevant. There are actually three users (and three separate mailboxes) whose mail is being retrieved via POP3/fetchmail, but all three users have the same UID so there is only one set of .imap/INBOX/dovecot.* files for all three mailboxes.
This may have been at least part of the problem in the case where the error appeared to be precipitated by simultaneous access from two different remote machines.
-- Mark Sapiro mark@msapiro.net The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Sun, 2008-05-04 at 10:45 -0700, Mark Sapiro wrote:
Here's one additional piece of information which may or may not be relevant. There are actually three users (and three separate mailboxes) whose mail is being retrieved via POP3/fetchmail, but all three users have the same UID so there is only one set of .imap/INBOX/dovecot.* files for all three mailboxes.
So you mean all 3 mailboxes share the same index files? That's a sure way to cause all kinds of problems.
Timo Sirainen wrote:
On Sun, 2008-05-04 at 10:45 -0700, Mark Sapiro wrote:
Here's one additional piece of information which may or may not be relevant. There are actually three users (and three separate mailboxes) whose mail is being retrieved via POP3/fetchmail, but all three users have the same UID so there is only one set of .imap/INBOX/dovecot.* files for all three mailboxes.
So you mean all 3 mailboxes share the same index files? That's a sure way to cause all kinds of problems.
Yes, that's what I mean. I didn't realize this was the case until just now when I looked at exactly which log files I would send if I saw a recurrence. I think they have the same index files because they all have the same home directory which is an artifact from an old set up. This could easily be changed.
-- Mark Sapiro mark@msapiro.net The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (2)
-
Mark Sapiro
-
Timo Sirainen