[Dovecot] mbox sync/lock issue, rc28 and later
Help!
I just noticed this error this morning, we have been getting a lot of these in rc28 and rc29, not in rc27 or before:
Apr 6 07:22:52 karst dovecot: [ID 107833 mail.error] IMAP(jaearick): mbox file /var/mail/j/jaearick was modified while we were syncing, check your locking settings
Pine coughed up an internal error, no core dump, and dovecot basically doubled the size of my mailbox by duplicating all of my messages again. What changed with file locking between rc27 and rc28 to cause this? How to fix?
My setup: dovecot server is Solaris 10, with ZFS for home directories (where index files go). The mail server is also Solaris 10, /var/mail NFS mounted to the dovecot server (NFSv4, yes lockd is running). The /var/mail filesystem on the mail server is also ZFS.
Here is the output of "dovecot -n":
# ./dovecot -n # /opt/dovecot.1.0.rc29/etc/dovecot.conf base_dir: /var/run/dovecot/ ssl_listen: * ssl_cert_file: /opt/openssl/ssl/certs/colby0.cert ssl_key_file: /opt/openssl/ssl/private/colby0.key.nopasswd login_dir: /var/run/dovecot//login login_executable: /opt/dovecot.1.0.rc29/libexec/dovecot/imap-login login_process_per_connection: no login_processes_count: 8 first_valid_uid: 100 first_valid_gid: 100 mail_location: mbox:%h/mail:INBOX=/var/mail/%1u/%u:INDEX=%h mail_drop_priv_before_exec: yes auth default: passdb: driver: passwd-file args: /etc/dovecot.deny deny: yes passdb: driver: pam userdb: driver: passwd
Jeff Earickson Colby College
On 6.4.2007, at 15.59, Jeff A. Earickson wrote:
Apr 6 07:22:52 karst dovecot: [ID 107833 mail.error] IMAP (jaearick): mbox file /var/mail/j/jaearick was modified while we
were syncing, check your locking settingsPine coughed up an internal error, no core dump, and dovecot basically doubled the size of my mailbox by duplicating all of my messages again. What changed with file locking between rc27 and
rc28 to cause this? How to fix?
I added this check to rc28. Hmm. I guess it's possible that the check
doesn't work that great with NFS.
But what do you mean duplicated the messages? Same messages exist
multiple times in the mbox file? The only reason I can think of how
that could happen is if you expunged a lot of data from the middle of
the mbox, and then in the middle of the move that external
modification check cancelled the sync.
If that's not the case, maybe it really was something else than
Dovecot that went and duplicated the messages?
On Fri, 6 Apr 2007, Timo Sirainen wrote:
Date: Fri, 6 Apr 2007 19:50:07 +0300 From: Timo Sirainen tss@iki.fi Reply-To: Dovecot Mailing List dovecot@dovecot.org To: Jeff A. Earickson jaearick@colby.edu Cc: dovecot@dovecot.org Subject: Re: [Dovecot] mbox sync/lock issue, rc28 and later
On 6.4.2007, at 15.59, Jeff A. Earickson wrote:
Apr 6 07:22:52 karst dovecot: [ID 107833 mail.error] IMAP(jaearick): mbox file /var/mail/j/jaearick was modified while we were syncing, check your locking settings
Pine coughed up an internal error, no core dump, and dovecot basically doubled the size of my mailbox by duplicating all of my messages again. What changed with file locking between rc27 and rc28 to cause this? How to fix?
I added this check to rc28. Hmm. I guess it's possible that the check doesn't work that great with NFS.
I went back and read the wiki on mbox and locking to see what I might have done wrong, eg: http://wiki.dovecot.org/MailboxFormat/mbox. Procmail on my mail server (the LDA for sendmail there) uses dotlocking, fcntl, lockf for its lock setting -- this hasn't been changed in a looooong time.
But what do you mean duplicated the messages? Same messages exist multiple times in the mbox file? The only reason I can think of how that could happen is if you expunged a lot of data from the middle of the mbox, and then in the middle of the move that external modification check cancelled the sync.
That seems to be what happened. I had marked a bunch of messages as deleted, and had done "expunge" in Pine to trash them. Then pine gave its internal error. Afterwords, most of the deleted messages were gone, some at the end were not (not marked as deleted), and about 20 messages in the middle of the mailbox were duplicated again at the end. I think a new message came in too. Except for the duplication and the undeletes, no messages were mangled. I just had to re-delete some messages.
After this odd behavior, I went looking for a core file (none), and I checked my syslogs. That is when I found the pile of messages about "modified while syncing". I found this message 1161 times in my syslogs since March 28, 50 times today. Nobody has complained about anything though.
If that's not the case, maybe it really was something else than Dovecot that went and duplicated the messages?
I only read my email via an IMAP connection with Pine. Thats it.
Jeff Earickson Colby College
On 6.4.2007, at 20.08, Jeff A. Earickson wrote:
But what do you mean duplicated the messages? Same messages exist
multiple times in the mbox file? The only reason I can think of
how that could happen is if you expunged a lot of data from the
middle of the mbox, and then in the middle of the move that
external modification check cancelled the sync.That seems to be what happened. I had marked a bunch of messages
as deleted, and had done "expunge" in Pine to trash them. Then pine gave its
internal error. Afterwords, most of the deleted messages were gone, some at
the end were not (not marked as deleted), and about 20 messages in the
middle of the mailbox were duplicated again at the end. I think a new
message came in too. Except for the duplication and the undeletes, no messages
were mangled. I just had to re-delete some messages.After this odd behavior, I went looking for a core file (none), and
I checked my syslogs. That is when I found the pile of messages about
"modified while syncing". I found this message 1161 times in my syslogs since
March 28, 50 times today. Nobody has complained about anything though.
I guess this has something to do with NFS attribute caching then. For
now you could disable the check by changing
mbox_sync_file_is_ext_modified() to just contain "return FALSE;" in
mbox-sync.c.
Hmm. Wonder if I should add a new setting for this or just disable
the check completely for v1.0.. The reason why I added it in the
first place was because I wanted to know if some mbox problems were
caused by locking problems or real bugs in Dovecot. If the settings
are correct it's a bit useless to do this check.
On 6.4.2007, at 20.36, Timo Sirainen wrote:
I guess this has something to do with NFS attribute caching then.
For now you could disable the check by changing
mbox_sync_file_is_ext_modified() to just contain "return FALSE;" in
mbox-sync.c.Hmm. Wonder if I should add a new setting for this or just disable
the check completely for v1.0.. The reason why I added it in the
first place was because I wanted to know if some mbox problems were
caused by locking problems or real bugs in Dovecot. If the settings
are correct it's a bit useless to do this check.
This makes it give the error only if it notices a problem in the mbox
file: http://dovecot.org/list/dovecot-cvs/2007-April/008615.html
So I guess I'll do rc31 still this weekend. :)
participants (2)
-
Jeff A. Earickson
-
Timo Sirainen