[Dovecot] Maildir + NFS + multiple machines = spectacular failure

richs at whidbey.net richs at whidbey.net
Mon May 1 18:05:51 EEST 2006


Adding "noac" to our /etc/fstab NFS mounts seems to have helped our  
index errors so far:

nfsserver:/var/mail      /var/mail     nfs      
rw,hard,intr,rsize=8192,wsize=8192,noac 0 0

Over the weekend we had 1,465,031 POP3 sessions across 3 servers, and  
only logged 15 "Corrupted" errors. They were of these types:

Corrupted transaction log file /var/mail/(user)/Maildir/ 
dovecot.index.log: unexpected end of file while reading header
Corrupted transaction log file /var/mail/(user)/Maildir/ 
dovecot.index.log: end_offset (236) > current sync_offset (200)
Corrupted transaction log file /var/mail/(user)/Maildir/ 
dovecot.index.log: record size too small (type=0x8080, offset=7260,  
size=0)
Corrupted transaction log file /var/mail/(user)/Maildir/ 
dovecot.index.log: hdr.size too large (0)

We also had 1,929 IMAP sessions from our testing support staff, and  
only 2 errors:

Corrupted transaction log file /var/mail/(user)/Maildir/ 
dovecot.index.log: record size too small (type=0x5ef9, offset=200,  
size=0)
Corrupted transaction log file /var/mail/(user)/Maildir/.Junk E-mail/ 
dovecot.index.log: end_offset (15292) > current sync_offset (15236)

Rich

On Apr 28, 2006, at 11:27 AM, Apps Lists wrote:

> Rich,
>
>   We're also in the process of setting up a GFS/Cluster with  
> CentOS.  I'm
> hopeful this will eliminate some of the issues.
>
>   I'll keep everyone posted.
>
> Steve
>
>> Hi Steve,
>>
>> We'll give that a try.  I'm sure users would prefer a slightly slower
>> mail experience if it ensured accurate mail lists and reduced the
>> chance for lost mail.
>>
>> Rich Sandberg
>> richs at whidbey.net
>> Systems & Network Administrator
>> Whidbey Telecom Network Operations
>>
>>
>>
>> On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
>>
>>> Hi Rich.
>>>
>>> I've seen a lot of the same IMAP errors... haven't played with POP
>>> yet.
>>>
>>> You might want to try adding "noac" to the mount options... your
>>> line..
>>>
>>> nfshost:/var/mail      /var/mail     nfs
>>> rw,hard,intr,rsize=8192,wsize=8192 0 0
>>>
>>> .... would become:
>>>
>>>
>>> nfshost:/var/mail      /var/mail     nfs
>>> rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
>>>
>>>
>>> .... and see if that helps.  I haven't seen much if any difference.
>>>
>>> What I don't understand is how some people are running what appears
>>> to be
>>> the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot
>>> machines with
>>> IMAP and have no problems.
>>>
>>> I'd love to get to the bottom of this.
>>>
>>> Steve
>>>
>>>> We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3
>>>> Dovecot servers. Our most concerning episode was a failed APPEND
>>>> operation (with a set of folders containing 200 messages):
>>>>
>>>> IMAP(username): Corrupted transaction log file /var/mail/username/
>>>> Maildir/dovecot.index.log: unexpected end of file while reading
>>>> header
>>>>
>>>> That resulted in one of the new APPENDed mail folders and content
>>>> being lost.  We see those errors at other times too, although they
>>>> don't always end in a noticeable failure.
>>>>
>>>> Continuous other errors we see that might be NFS related:
>>>>
>>>> IMAP(username): close() failed with index cache file /var/mail/
>>>> username/Maildir/.Deleted Messages/dovecot.index.cache: Stale   NFS
>>>> file
>>>> handle
>>>> IMAP(username): Corrupted index cache file /var/mail/username/
>>>> Maildir/.OldMail/dovecot.index.cache: invalid record size
>>>> IMAP(username): close() failed with index file /var/mail/username/
>>>> Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
>>>>
>>>> We also see errors with POP3, which would rarely benefit from  
>>>> indexes
>>>> (since 92% of our users RETR-and-DELE messages):
>>>>
>>>> POP3(username): file_dotlock_create() failed with file /var/mail/
>>>> username/Maildir/dovecot.index.log: Stale NFS file handle
>>>> POP3(username): Corrupted transaction log file /var/mail/username/
>>>> Maildir/dovecot.index.log: record size too small (type=0x8080,
>>>> offset=9056, size=0)
>>>> POP3(username): Our dotlock file /var/mail/username/Maildir/
>>>> dovecot.index.log.lock was deleted (kept it -118 secs)
>>>> POP3(username): Timeout while waiting for release of dotlock for
>>>> transaction log file /var/mail/username/Maildir/dovecot.index.log
>>>>
>>>> We've tried manually removing "dovecot*" from all folders, allowing
>>>> the indexes to be re-built, but the errors return.
>>>>
>>>> Excerpt from our config:
>>>>
>>>> mail_cache_fields mail_never_cache_fields mail_cache_min_mail_count
>>>>>> 1000 mailbox_idle_check_interval = 10 mail_full_filesystem_access
>>>> = no
>>>> mail_read_mmaped = no
>>>> mmap_disable = yes
>>>> mmap_no_write = no
>>>> lock_method = dotlock
>>>> maildir_stat_dirs = yes
>>>> maildir_copy_with_hardlinks = no
>>>>
>>>> Our NFS mount:
>>>>
>>>> nfshost:/var/mail      /var/mail     nfs
>>>> rw,hard,intr,rsize=8192,wsize=8192 0 0
>>>>
>>>> I've read some of the list archives about disabling the index  
>>>> files.
>>>> We realize that indexes should be beneficial, but we'd like to at
>>>> least experiment with a "no index" install.  This would solve  
>>>> some of
>>>> the NFS locking issues, eliminate the need to modify our MTA
>>>> behavior, and remove (potentially) millions of files from the  
>>>> system.
>>>>
>>>> Did anyone discover a method to cleanly disable the indexes? At  
>>>> least
>>>> for POP3?
>>>>
>>>> Thanks!
>>>>
>>>> Rich Sandberg
>>>> richs at whidbey.net
>>>>
>>>>
>>>> On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
>>>>
>>>>> Hi all.
>>>>> We also had problems with  client NFS locking using RHEL4, kernel
>>>>> 2.6.9
>>>>> Some of them were solved updating the kelnel from RedHat.
>>>>> Hope this helps.
>>>>>
>>>>> Regards.
>>>>>
>>>>> Apps Lists wrote:
>>>>>
>>>>>> I'm running beta7 on two machines, with maildir on NFS.  I have
>>>>>> lockd
>>>>>> running on all machines.  I've found that Dovecot is highly
>>>>>> unstable with
>>>>>> NFS when accessing a mailbox on more than one machine at the same
>>>>>> time.
>>>>>>
>>>>>> Both dovecot machines have:
>>>>>>
>>>>>> mmap_disable = yes
>>>>>> lock_method = fcntl
>>>>>>
>>>>>> NFS is version 3, exported from a third linux machine.  All
>>>>>> machines are
>>>>>> running 2.6.9 kernel.
>>>>>>
>>>>>> Any ideas what's going wrong here?
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>> *** dev4 ***
>>>>>>
>>>>>> Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in
>>>>>> uidlist file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>> 1146230258.P2889Q0M407851.dev4.neonova.net:2,
>>>>>>
>>>>>> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction
>>>>>> log file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension
>>>>>> introduction for unknown id 0
>>>>>>
>>>>>> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c:  
>>>>>> line
>>>>>> 883
>>>>>> (mail_index_sync_from_transactions): assertion failed:
>>>>>> (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset)
>>>>>>
>>>>>>
>>>>>> Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with
>>>>>> signal 6
>>>>>>
>>>>>>
>>>>>> *** dev3 ***
>>>>>>
>>>>>>
>>>>>>
>>>>>> Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>> uidlist file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>> 1146230254.P2888Q0M651598.dev4.neonova.net
>>>>>>
>>>>>> Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>> first_recent_uid_lowwater 4036 -> 1772
>>>>>>
>>>>>> Apr 28 09:17:44 dev3 dovecot: IMAP(jtest):
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was
>>>>>> lowered
>>>>>> (4036 -> 1774)
>>>>>>
>>>>>> Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>> uidlist file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>> 1146145480.P5976Q0M245663.dev3.neonova.net:2,
>>>>>>
>>>>>> Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>> first_recent_uid_lowwater 4038 -> 1774
>>>>>>
>>>>>> Apr 28 09:17:47 dev3 dovecot: IMAP(jtest):
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was
>>>>>> lowered
>>>>>> (4038 -> 1775)
>>>>>>
>>>>>> Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in
>>>>>> uidlist file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist:
>>>>>> 1146145480.P5976Q0M245663.dev3.neonova.net:2,
>>>>>>
>>>>>> Apr 28 09:17:49 dev3 dovecot: IMAP(jtest):
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was
>>>>>> lowered
>>>>>> (4039 -> 1775)
>>>>>>
>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>> log file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension
>>>>>> introduction for unknown id 0
>>>>>>
>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c:  
>>>>>> line
>>>>>> 881
>>>>>> (mail_index_sync_from_transactions): assertion failed:
>>>>>> (hdr.messages_count
>>>>>> == (*map)->hdr.messages_count)
>>>>>>
>>>>>> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest):
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was
>>>>>> lowered
>>>>>> (4039 -> 1775)
>>>>>>
>>>>>> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>> log file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append  
>>>>>> with
>>>>>> UID
>>>>>> 1775, but next_uid = 4039
>>>>>>
>>>>>> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction
>>>>>> log file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append  
>>>>>> with
>>>>>> UID
>>>>>> 1775, but next_uid = 4039
>>>>>>
>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected
>>>>>> transaction log
>>>>>> desync with index /var/mailstore/72/af/375887/Maildir/ 
>>>>>> dovecot.index
>>>>>>
>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected:
>>>>>> Mailbox   is
>>>>>> in
>>>>>> inconsistent state, please relogin.
>>>>>>
>>>>>> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file
>>>>>> /var/mailstore/72/af/375887/Maildir/dovecot.index:
>>>>>> first_recent_uid_lowwater 4039 -> 1776
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> +----------------------------------------------
>>>>> ^-----------------------+
>>>>> | Luis Meléndez Aganzo                         ^  Email:
>>>>> luism at uco.es  |
>>>>> | Servicio de Informática                      ^  Tlf: 34-(9)
>>>>> 57-211022 |
>>>>> | Analista - Área de Sistemas                  ^  Fax: 34-(9)
>>>>> 57-218116 |
>>>>> | Universidad de Córdoba (SPAIN)               ^  http://
>>>>> www.uco.es    |
>>>>> +----------------------------------------------
>>>>> ^-----------------------+
>
>
>



More information about the dovecot mailing list