[Dovecot] Problem in mbox-sync.c
We've just cut over to a new mail server running Dovecot 1.0.5.
The underlying OS is RHEL 5. User mailboxes are stored in Unix mbox
format on a local ext3 file system. The MTA on the system is the
default RedHat version of Sendmail 8.13.8 with procmail for local
delivery. We're using a combination of dotlock and fcntl style locking.
The output of "dovecot -n" for this system appears below my signature.
Periodically we see errors like the following in our logs:
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox-sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed: (file_size >= sync_ctx->expunged_space + trailer_size) Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): Raw backtrace: pop3 [0x809eda1] -> pop3 [0x809ecbc] -> pop3(mbox_sync+0x1088) [0x806c4e8] -> pop3(mbox_storage_sync_init+0x3e) [0x806cb6e] -> pop3(client_create+0x1a4) [0x80552a4] -> pop3(main+0x556) [0x8057236] -> /lib/libc.so.6(__libc_start_main+0xdc) [0x304dec] -> pop3 [0x8054a11] Oct 21 19:36:01 postoffice1 dovecot: child 24409 (pop3) killed with signal 6
Note that while the above log is for a POP3 user, we also see similar behavior when some users access mail folders via IMAP as well.
The real problem with this behavior is that sometimes, but not in all cases, the process will leave behind a dotlock when it exits. If the abandoned dotlock is on the user's inbox, then procmail won't deliver new mail to the user until the lock is cleared. This is a big problem.
Anybody seen this behavior before and/or know what's causing it? Can anybody suggest ideas for further debugging this issue? We'd appreciate any help that you all can provide. Thanks!
-- Hal Pomeranz, Founder/CEO Deer Run Associates hal@deer-run.com Network Connectivity and Security, Systems Management, Training
# /hpsw/mail/dovecot/sbin/dovecot -n # 1.0.5: /hpsw/mail/dovecot/etc/dovecot.conf syslog_facility: local3 protocols: imap imaps pop3 pop3s ssl_cert_file: /hpsw/mail/configs/dovecot/pki/certs/dovecot.pem ssl_key_file: /hpsw/mail/configs/dovecot/pki/private/dovecot.pem disable_plaintext_auth: no login_dir: /hpsw/mail/dovecot/var/run/dovecot/login login_executable(default): /hpsw/mail/dovecot/libexec/dovecot/imap-login login_executable(imap): /hpsw/mail/dovecot/libexec/dovecot/imap-login login_executable(pop3): /hpsw/mail/dovecot/libexec/dovecot/pop3-login mail_location: mbox:/hpsw/mail/users/%u/dovecot:INBOX=/hpsw/mail/users/%u/MBOX mbox_read_locks: dotlock fcntl mail_executable(default): /hpsw/mail/dovecot/libexec/dovecot/imap mail_executable(imap): /hpsw/mail/dovecot/libexec/dovecot/imap mail_executable(pop3): /hpsw/mail/dovecot/libexec/dovecot/pop3 mail_plugin_dir(default): /hpsw/mail/dovecot/lib/dovecot/imap mail_plugin_dir(imap): /hpsw/mail/dovecot/lib/dovecot/imap mail_plugin_dir(pop3): /hpsw/mail/dovecot/lib/dovecot/pop3 pop3_uidl_format(default): pop3_uidl_format(imap): pop3_uidl_format(pop3): %08Xu%08Xv auth default: passdb: driver: pam userdb: driver: passwd
On 22.10.2007, at 5.46, Hal Pomeranz wrote:
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox- sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed:
(file_size >= sync_ctx->expunged_space + trailer_size)
Does the file have CRLFs as linefeeds instead of plain LFs? CRLF
handling is probably still a bit buggy.
If that's not the problem then I'm not really sure.. It's difficult
to do anything about this unless I can reproduce the problem. I've
been using mboxes for my own mails fine for years. So if you can
figure out a way to easily reproduce this, I'd like to know.
If this happens every time for a specific mbox, could you put it
through http://dovecot.org/tools/mbox-anonymize.pl and then send the
output to me and also the mailbox's dovecot.index and
dovecot.index.log files?
The real problem with this behavior is that sometimes, but not in all cases, the process will leave behind a dotlock when it exits. If the abandoned dotlock is on the user's inbox, then procmail won't deliver new mail to the user until the lock is cleared. This is a big
problem.
Dovecot writes the process's PID to the dotlock file. Procmail would
be able to check if the PID still exists and override the dotlock
immediately if it does. But since it doesn't do this, I guess you'd
have to modify its sources or create some wrapper script to delete
stale dotlocks.
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox- sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed:
(file_size >= sync_ctx->expunged_space + trailer_size)Does the file have CRLFs as linefeeds instead of plain LFs? CRLF
handling is probably still a bit buggy.
The mbox files are standard Unix text, so LF only, right? We haven't turned on the CRLF options in dovecot.conf, that's for sure.
If that's not the problem then I'm not really sure.. It's difficult
to do anything about this unless I can reproduce the problem. I've
been using mboxes for my own mails fine for years. So if you can
figure out a way to easily reproduce this, I'd like to know.
We've done some more debugging since my original posting. The way to make the problem happen reliably seems to be to attempt to delete all messages from the mbox. Thus, users who POP all their mail off the server are affected, as are users who are using an IMAP client to purge all their email from a given folder. This problem does not seem to be related to a specific email client, since we can make the problem happen with different client programs.
But there must be some underlying problem with our setup somehow. I can't believe that Dovecot's code for mailbox truncation is somehow inherently flawed or some other site surely would have noticed this problem before now. A file system buffering issue? I'm not sure.
I'm open to suggestions on this problem. If anybody else can reproduce the issue, I'd be interested to hear about it, but in the meantime I'm going to assume that something's hosed with our specific configuration.
The real problem with this behavior is that sometimes, but not in all cases, the process will leave behind a dotlock when it exits. If the abandoned dotlock is on the user's inbox, then procmail won't deliver new mail to the user until the lock is cleared. This is a big
problem.Dovecot writes the process's PID to the dotlock file. Procmail would
be able to check if the PID still exists and override the dotlock
immediately if it does. But since it doesn't do this, I guess you'd
have to modify its sources or create some wrapper script to delete
stale dotlocks.
Right. We've been using the PIDs to manually decide which dotlocks are stranded, but as you say getting procmail to do the right thing would require source code hacks. And the reality is that this wouldn't fix the underlying issue, just sort of "paper over" the problem. I'd prefer to fix the root cause.
-- Hal Pomeranz, Founder/CEO Deer Run Associates hal@deer-run.com Network Connectivity and Security, Systems Management, Training
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox- sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed:
(file_size >= sync_ctx->expunged_space + trailer_size)Does the file have CRLFs as linefeeds instead of plain LFs? CRLF
handling is probably still a bit buggy.The mbox files are standard Unix text, so LF only, right? We haven't turned on the CRLF options in dovecot.conf, that's for sure.
Well, surprise, surprise. The file does have CR-LF line endings, and stripping out the CRs seems to make the problem go away.
We imported the mailbox data from our old mail servers which were FreeBSD boxes running a really old version of Cyrus. For some reason the messages in users' Cyrus mailboxes had CR-LF instead of the standard Unix LF line endings. Wacky.
Anyway, thanks for the pointers on this. I'm happy that it turned out to be something mundane and easily fixable, even though we've burned a bunch of time beating our heads against the issue.
-- Hal Pomeranz, Founder/CEO Deer Run Associates hal@deer-run.com Network Connectivity and Security, Systems Management, Training
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox- sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed:
(file_size >= sync_ctx->expunged_space + trailer_size)Does the file have CRLFs as linefeeds instead of plain LFs? CRLF
handling is probably still a bit buggy.The mbox files are standard Unix text, so LF only, right? We haven't turned on the CRLF options in dovecot.conf, that's for sure.
Well, surprise, surprise. The file does have CR-LF line endings, and stripping out the CRs seems to make the problem go away.
I've been thinking about this a little more since yesterday. Is there a reason why Dovecot shouldn't simply convert CR-LF line endings to LF on the fly, or at least have a configuration option you can turn on that will do this? This seems preferrable to blowing up with the error cited above...
-- Hal Pomeranz, Founder/CEO Deer Run Associates hal@deer-run.com Network Connectivity and Security, Systems Management, Training
On 23.10.2007, at 18.25, Hal Pomeranz wrote:
I've been thinking about this a little more since yesterday. Is there a reason why Dovecot shouldn't simply convert CR-LF line endings to LF on the fly, or at least have a configuration option you can turn on that will do this? This seems preferrable to blowing up with the
error cited above...
I think removing CRs wouldn't be very nice, and it would require more
code that could be just as buggy. The correct way to handle this
would be to just fix it to work with CRs. Most of the code already is
there, there are probably just a few places where the handling is
wrong. But since the workaround is easy and I've a lot other more
important things in my TODO list, things are as they are..
On Tuesday October 23, 2007 at 11:25:03 (AM) Hal Pomeranz wrote:
Oct 21 19:36:01 postoffice1 dovecot: POP3(someuser): file mbox- sync.c: line 1433 (mbox_sync_handle_eof_updates): assertion failed:
(file_size >= sync_ctx->expunged_space + trailer_size)Does the file have CRLFs as linefeeds instead of plain LFs? CRLF
handling is probably still a bit buggy.The mbox files are standard Unix text, so LF only, right? We haven't turned on the CRLF options in dovecot.conf, that's for sure.
Well, surprise, surprise. The file does have CR-LF line endings, and stripping out the CRs seems to make the problem go away.
I've been thinking about this a little more since yesterday. Is there a reason why Dovecot shouldn't simply convert CR-LF line endings to LF on the fly, or at least have a configuration option you can turn on that will do this? This seems preferrable to blowing up with the error cited above...
Maybe I am misunderstanding this thread, but aren't CRLF specifically mentioned in RFC822 and other RFCs?
-- Gerard
On Tue, 2007-10-23 at 14:18 -0400, Gerard wrote:
I've been thinking about this a little more since yesterday. Is there a reason why Dovecot shouldn't simply convert CR-LF line endings to LF on the fly, or at least have a configuration option you can turn on that will do this? This seems preferrable to blowing up with the error cited above...
Maybe I am misunderstanding this thread, but aren't CRLF specifically mentioned in RFC822 and other RFCs?
It's mentioned in SMTP, IMAP, etc. RFCs, but nothing requires newlines to be stored as CRLF in disk. They just need to be sent as CRLFs to IMAP/POP3 clients (which Dovecot does).
participants (3)
-
Gerard
-
Hal Pomeranz
-
Timo Sirainen