[Dovecot] Maildir + NFS + multiple machines = spectacular failure

Apps Lists appslists at neonova.net
Tue May 2 18:06:05 EEST 2006


Hate to top-post... but... ah well.

I'm trying the latest CVS snapshot with GFS and am only seeing a couple of
(hopefully) harmless errors...

fcntl() failed with file
/var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or
directory

fcntl() failed with file
/var/mailstore/72/af/375887/Maildir/dovecot.index.cache: No such file or
directory

Timo - Is it safe to ignore these fcntl() errors, or is there something
else going on?

Rich - it might be a good idea to pull down the latest CVS and see if that
resolves some or all of your NFS issues.

Steve



> Hi Rich.
>
> Glad to see that helped out.  I can still introduce errors with IMAP
> even with that setting.
>
> We just finished setting up GFS and the results are the same:
>
> May  1 11:33:11 dev4 dovecot: IMAP(jtest): fcntl() failed with file
> /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or
> directory
> May  1 11:33:13 dev4 dovecot: IMAP(jtest): Corrupted transaction log
> file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with
> UID 2703, but next_uid = 2704
> May  1 11:33:15 dev4 dovecot: IMAP(jtest): Corrupted transaction log
> file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with
> UID 2704, but next_uid = 2705
> May  1 11:35:16 dev4 dovecot: IMAP(jtest): fcntl() failed with file
> /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or
> directory
> May  1 11:35:16 dev4 dovecot: IMAP(jtest): file mail-index.c: line 865
> (mail_index_sync_from_transactions): assertion failed: (prev_seq <=
> max_seq && (prev_seq != max_seq || prev_offset <= max_offset))
> ....
> May  1 11:40:19 dev4 dovecot: IMAP(jtest): file_dotlock_open() failed
> with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log:
> Resource temporarily unavailable
> May  1 11:40:19 dev4 dovecot: child 25918 (imap) killed with signal 11
>
> I'm not entirely sure why I'm seeing file_dotlock_open() calls.  We're
> using fcntl.
>
> For the record:  Three members of the cluster. Two machines are running
> Dovecot, mounting the filesystem with GFS.  I have another random
> machine that connects to both machines with IMAP.  The program connects,
> gets a list of folders and APPENDs a message.  It then disconnects and
> repeats.  I run one instance of this test against dovecot#1 and another
> against dovecot#2, and can get the above failure messages within
> seconds.
>
> This is pretty much the same behaviour we've seen with NFS.
>
> Steve
>
>
>> Adding "noac" to our /etc/fstab NFS mounts seems to have helped our
>> index errors so far:
>>
>> nfsserver:/var/mail      /var/mail     nfs
>> rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
>>
>> Over the weekend we had 1,465,031 POP3 sessions across 3 servers, and
>> only logged 15 "Corrupted" errors. They were of these types:
>>
>> Corrupted transaction log file /var/mail/(user)/Maildir/
>> dovecot.index.log: unexpected end of file while reading header
>> Corrupted transaction log file /var/mail/(user)/Maildir/
>> dovecot.index.log: end_offset (236) > current sync_offset (200)
>> Corrupted transaction log file /var/mail/(user)/Maildir/
>> dovecot.index.log: record size too small (type=0x8080, offset=7260,
>> size=0)
>> Corrupted transaction log file /var/mail/(user)/Maildir/
>> dovecot.index.log: hdr.size too large (0)
>>
>> We also had 1,929 IMAP sessions from our testing support staff, and
>> only 2 errors:
>>
>> Corrupted transaction log file /var/mail/(user)/Maildir/
>> dovecot.index.log: record size too small (type=0x5ef9, offset=200,
>> size=0)
>> Corrupted transaction log file /var/mail/(user)/Maildir/.Junk E-mail/
>> dovecot.index.log: end_offset (15292) > current sync_offset (15236)
>>
>> Rich
>>
>> On Apr 28, 2006, at 11:27 AM, Apps Lists wrote:
>>
>>> Rich,
>>>
>>>   We're also in the process of setting up a GFS/Cluster with
>>> CentOS.  I'm
>>> hopeful this will eliminate some of the issues.
>>>
>>>   I'll keep everyone posted.
>>>
>>> Steve
>>>
>>>> Hi Steve,
>>>>
>>>> We'll give that a try.  I'm sure users would prefer a slightly
>>>> slower mail experience if it ensured accurate mail lists and reduced
>>>> the chance for lost mail.
>>>>
>>>> Rich Sandberg
>>>> richs at whidbey.net
>>>> Systems & Network Administrator
>>>> Whidbey Telecom Network Operations
>>>>
>>>>
>>>>
>>>> On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
>>>>
>>>>> Hi Rich.
>>>>>
>>>>> I've seen a lot of the same IMAP errors... haven't played with POP
>>>>> yet.
>>>>>
>>>>> You might want to try adding "noac" to the mount options... your
>>>>> line..
>>>>>
>>>>> nfshost:/var/mail      /var/mail     nfs
>>>>> rw,hard,intr,rsize=8192,wsize=8192 0 0
>>>>>
>>>>> .... would become:
>>>>>
>>>>>
>>>>> nfshost:/var/mail      /var/mail     nfs
>>>>> rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
>>>>>
>>>>>
>>>>> .... and see if that helps.  I haven't seen much if any difference.
>>>>>
>>>>> What I don't understand is how some people are running what appears
>>>>> to be
>>>>> the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot
>>>>> machines with
>>>>> IMAP and have no problems.
>>>>>
>>>>> I'd love to get to the bottom of this.
>>>>>
>>>>> Steve
>>>>>
>>>>>> We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3
>>>>>> Dovecot servers. Our most concerning episode was a failed APPEND
>>>>>> operation (with a set of folders containing 200 messages):
>>>>>>
>>>>>> IMAP(username): Corrupted transaction log file /var/mail/username/
>>>>>> Maildir/dovecot.index.log: unexpected end of file while reading
>>>>>> header
>>>>>>
>>>>>> That resulted in one of the new APPENDed mail folders and content
>>>>>> being lost.  We see those errors at other times too, although they
>>>>>> don't always end in a noticeable failure.
>>>>>>
>>>>>> Continuous other errors we see that might be NFS related:
>>>>>>
>>>>>> IMAP(username): close() failed with index cache file /var/mail/
>>>>>> username/Maildir/.Deleted Messages/dovecot.index.cache: Stale
>>>>>> NFS file
>>>>>> handle
>>>>>> IMAP(username): Corrupted index cache file /var/mail/username/
>>>>>> Maildir/.OldMail/dovecot.index.cache: invalid record size
>>>>>> IMAP(username): close() failed with index file /var/mail/username/
>>>>>> Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
>>>>>>
>>>>>> We also see errors with POP3, which would rarely benefit from
>>>>>> indexes
>>>>>> (since 92% of our users RETR-and-DELE messages):
>>>>>>
>>>>>> POP3(username): file_dotlock_create() failed with file /var/mail/
>>>>>> username/Maildir/dovecot.index.log: Stale NFS file handle
>>>>>> POP3(username): Corrupted transaction log file /var/mail/username/
>>>>>> Maildir/dovecot.index.log: record size too small (type=0x8080,
>>>>>> offset=9056, size=0)
>>>>>> POP3(username): Our dotlock file /var/mail/username/Maildir/
>>>>>> dovecot.index.log.lock was deleted (kept it -118 secs)
>>>>>> POP3(username): Timeout while waiting for release of dotlock for
>>>>>> transaction log file /var/mail/username/Maildir/dovecot.index.log
>>>>>>
>>>>>> We've tried manually removing "dovecot*" from all folders,
>>>>>> allowing the indexes to be re-built, but the errors return.
>>>>>>
>>>>>> Excerpt from our config:
>>>>>>
>>>>>> mail_cache_fields mail_never_cache_fields
>>>>>> mail_cache_min_mail_count
>>>>>>>> 1000 mailbox_idle_check_interval = 10
>>>>>>>> mail_full_filesystem_access
>>>>>> = no
>>>>>> mail_read_mmaped = no
>>>>>> mmap_disable = yes
>>>>>> mmap_no_write = no
>>>>>> lock_method = dotlock
>>>>>> maildir_stat_dirs = yes
>>>>>> maildir_copy_with_hardlinks = no
>>>>>>
>>>>>> Our NFS mount:
>>>>>>
>>>>>> nfshost:/var/mail      /var/mail     nfs
>>>>>> rw,hard,intr,rsize=8192,wsize=8192 0 0
>>>>>>
>>>>>> I've read some of the list archives about disabling the index
>>>>>> files.
>>>>>> We realize that indexes should be beneficial, but we'd like to at
>>>>>> least experiment with a "no index" install.  This would solve some
>>>>>> of
>>>>>> the NFS locking issues, eliminate the need to modify our MTA
>>>>>> behavior, and remove (potentially) millions of files from the
>>>>>> system.
>>>>>>
>>>>>> Did anyone discover a method to cleanly disable the indexes? At
>>>>>> least
>>>>>> for POP3?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Rich Sandberg
>>>>>> richs at whidbey.net
>>>>>>
>>>>>>
>>>>>> On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
>>>>>>
>>>>>>> Hi all.
>>>>>>> We also had problems with  client NFS locking using RHEL4, kernel
>>>>>>> 2.6.9
>>>>>>> Some of them were solved updating the kelnel from RedHat.
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> Apps Lists wrote:
>>>>>>>
>>>>>>>> I'm running beta7 on two machines, with maildir on NFS.  I have
>>>>>>>> lockd
>>>>>>>> running on all machines.  I've found that Dovecot is highly
>>>>>>>> unstable with
>>>>>>>> NFS when accessing a mailbox on more than one machine at the
>>>>>>>> same time.
>>>>>>>>
>>>>>>>> Both dovecot machines have:
>>>>>>>>
>>>>>>>> mmap_disable = yes
>>>>>>>> lock_method = fcntl
>>>>>>>>
>>>>>>>> NFS is version 3, exported from a third linux machine.  All
>>>>>>>> machines are
>>>>>>>> running 2.6.9 kernel.
>>>>>>>>
>>>>>>>> Any ideas what's going wrong here?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>> *** dev4 ***
>>>>>>>>
>>>>>>>> Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in
>>>>>>>> uidlist file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>>>> 1146230258.P2889Q0M407851.dev4.neonova.net:2,
>>>>>>>>
>>>>>>>> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction
>>>>>>>> log file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension
>>>>>>>> introduction for unknown id 0
>>>>>>>>
>>>>>>>> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c:
>>>>>>>> line
>>>>>>>> 883
>>>>>>>> (mail_index_sync_from_transactions): assertion failed:
>>>>>>>> (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset)
>>>>>>>>
>>>>>>>>
>>>>>>>> Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with
>>>>>>>> signal 6
>>>>>>>>
>>>>>>>>
>>>>>>>> *** dev3 ***
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>>>> uidlist file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>>>> 1146230254.P2888Q0M651598.dev4.neonova.net
>>>>>>>>
>>>>>>>> Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>>>> first_recent_uid_lowwater 4036 -> 1772
>>>>>>>>
>>>>>>>> Apr 28 09:17:44 dev3 dovecot: IMAP(jtest):
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid
>>>>>>>> was lowered
>>>>>>>> (4036 -> 1774)
>>>>>>>>
>>>>>>>> Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>>>> uidlist file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>>>> 1146145480.P5976Q0M245663.dev3.neonova.net:2,
>>>>>>>>
>>>>>>>> Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>>>> first_recent_uid_lowwater 4038 -> 1774
>>>>>>>>
>>>>>>>> Apr 28 09:17:47 dev3 dovecot: IMAP(jtest):
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid
>>>>>>>> was lowered
>>>>>>>> (4038 -> 1775)
>>>>>>>>
>>>>>>>> Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>>>> uidlist file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>>>> 1146145480.P5976Q0M245663.dev3.neonova.net:2,
>>>>>>>>
>>>>>>>> Apr 28 09:17:49 dev3 dovecot: IMAP(jtest):
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid
>>>>>>>> was lowered
>>>>>>>> (4039 -> 1775)
>>>>>>>>
>>>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>>>> log file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension
>>>>>>>> introduction for unknown id 0
>>>>>>>>
>>>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c:
>>>>>>>> line
>>>>>>>> 881
>>>>>>>> (mail_index_sync_from_transactions): assertion failed:
>>>>>>>> (hdr.messages_count
>>>>>>>> == (*map)->hdr.messages_count)
>>>>>>>>
>>>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest):
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid
>>>>>>>> was lowered
>>>>>>>> (4039 -> 1775)
>>>>>>>>
>>>>>>>> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>>>> log file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append
>>>>>>>> with
>>>>>>>> UID
>>>>>>>> 1775, but next_uid = 4039
>>>>>>>>
>>>>>>>> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>>>> log file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append
>>>>>>>> with
>>>>>>>> UID
>>>>>>>> 1775, but next_uid = 4039
>>>>>>>>
>>>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected
>>>>>>>> transaction log
>>>>>>>> desync with index /var/mailstore/72/af/375887/Maildir/
>>>>>>>> dovecot.index
>>>>>>>>
>>>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected:
>>>>>>>> Mailbox   is
>>>>>>>> in
>>>>>>>> inconsistent state, please relogin.
>>>>>>>>
>>>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>>>> first_recent_uid_lowwater 4039 -> 1776
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> +----------------------------------------------
>>>>>>> ^-----------------------+
>>>>>>> | Luis Meléndez Aganzo                         ^  Email:
>>>>>>> luism at uco.es  |
>>>>>>> | Servicio de Informática                      ^  Tlf: 34-(9)
>>>>>>> 57-211022 |
>>>>>>> | Analista - Área de Sistemas                  ^  Fax: 34-(9)
>>>>>>> 57-218116 |
>>>>>>> | Universidad de Córdoba (SPAIN)               ^  http://
>>>>>>> www.uco.es    |
>>>>>>> +----------------------------------------------
>>>>>>> ^-----------------------+





More information about the dovecot mailing list