Re: [Dovecot] Maildir + NFS + multiple machines = spectacular failure
Rich,
We're also in the process of setting up a GFS/Cluster with CentOS. I'm hopeful this will eliminate some of the issues.
I'll keep everyone posted.
Steve
Hi Steve,
We'll give that a try. I'm sure users would prefer a slightly slower mail experience if it ensured accurate mail lists and reduced the chance for lost mail.
Rich Sandberg richs@whidbey.net Systems & Network Administrator Whidbey Telecom Network Operations
On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
Hi Rich.
I've seen a lot of the same IMAP errors... haven't played with POP yet.
You might want to try adding "noac" to the mount options... your line..
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
.... would become:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
.... and see if that helps. I haven't seen much if any difference.
What I don't understand is how some people are running what appears to be the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot machines with IMAP and have no problems.
I'd love to get to the bottom of this.
Steve
We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3 Dovecot servers. Our most concerning episode was a failed APPEND operation (with a set of folders containing 200 messages):
IMAP(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: unexpected end of file while reading header
That resulted in one of the new APPENDed mail folders and content being lost. We see those errors at other times too, although they don't always end in a noticeable failure.
Continuous other errors we see that might be NFS related:
IMAP(username): close() failed with index cache file /var/mail/ username/Maildir/.Deleted Messages/dovecot.index.cache: Stale NFS file handle IMAP(username): Corrupted index cache file /var/mail/username/ Maildir/.OldMail/dovecot.index.cache: invalid record size IMAP(username): close() failed with index file /var/mail/username/ Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
We also see errors with POP3, which would rarely benefit from indexes (since 92% of our users RETR-and-DELE messages):
POP3(username): file_dotlock_create() failed with file /var/mail/ username/Maildir/dovecot.index.log: Stale NFS file handle POP3(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: record size too small (type=0x8080, offset=9056, size=0) POP3(username): Our dotlock file /var/mail/username/Maildir/ dovecot.index.log.lock was deleted (kept it -118 secs) POP3(username): Timeout while waiting for release of dotlock for transaction log file /var/mail/username/Maildir/dovecot.index.log
We've tried manually removing "dovecot*" from all folders, allowing the indexes to be re-built, but the errors return.
Excerpt from our config:
mail_cache_fields mail_never_cache_fields mail_cache_min_mail_count
1000 mailbox_idle_check_interval = 10 mail_full_filesystem_access = no mail_read_mmaped = no mmap_disable = yes mmap_no_write = no lock_method = dotlock maildir_stat_dirs = yes maildir_copy_with_hardlinks = no
Our NFS mount:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
I've read some of the list archives about disabling the index files. We realize that indexes should be beneficial, but we'd like to at least experiment with a "no index" install. This would solve some of the NFS locking issues, eliminate the need to modify our MTA behavior, and remove (potentially) millions of files from the system.
Did anyone discover a method to cleanly disable the indexes? At least for POP3?
Thanks!
Rich Sandberg richs@whidbey.net
On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
Hi all. We also had problems with client NFS locking using RHEL4, kernel 2.6.9 Some of them were solved updating the kelnel from RedHat. Hope this helps.
Regards.
Apps Lists wrote:
I'm running beta7 on two machines, with maildir on NFS. I have lockd running on all machines. I've found that Dovecot is highly unstable with NFS when accessing a mailbox on more than one machine at the same time.
Both dovecot machines have:
mmap_disable = yes lock_method = fcntl
NFS is version 3, exported from a third linux machine. All machines are running 2.6.9 kernel.
Any ideas what's going wrong here?
Steve
*** dev4 ***
Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146230258.P2889Q0M407851.dev4.neonova.net:2,
Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension introduction for unknown id 0
Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c: line 883 (mail_index_sync_from_transactions): assertion failed: (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset)
Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with signal 6
*** dev3 ***
Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146230254.P2888Q0M651598.dev4.neonova.net
Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4036 -> 1772
Apr 28 09:17:44 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4036 -> 1774)
Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146145480.P5976Q0M245663.dev3.neonova.net:2,
Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4038 -> 1774
Apr 28 09:17:47 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4038 -> 1775)
Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146145480.P5976Q0M245663.dev3.neonova.net:2,
Apr 28 09:17:49 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4039 -> 1775)
Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension introduction for unknown id 0
Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c: line 881 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)
Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4039 -> 1775)
Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 1775, but next_uid = 4039
Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 1775, but next_uid = 4039
Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected transaction log desync with index /var/mailstore/72/af/375887/Maildir/dovecot.index
Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected: Mailbox is in inconsistent state, please relogin.
Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4039 -> 1776
-- +---------------------------------------------- ^-----------------------+ | Luis Meléndez Aganzo ^ Email: luism@uco.es | | Servicio de Informática ^ Tlf: 34-(9) 57-211022 | | Analista - Área de Sistemas ^ Fax: 34-(9) 57-218116 | | Universidad de Córdoba (SPAIN) ^ http:// www.uco.es | +---------------------------------------------- ^-----------------------+
Adding "noac" to our /etc/fstab NFS mounts seems to have helped our
index errors so far:
nfsserver:/var/mail /var/mail nfs
rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
Over the weekend we had 1,465,031 POP3 sessions across 3 servers, and
only logged 15 "Corrupted" errors. They were of these types:
Corrupted transaction log file /var/mail/(user)/Maildir/
dovecot.index.log: unexpected end of file while reading header
Corrupted transaction log file /var/mail/(user)/Maildir/
dovecot.index.log: end_offset (236) > current sync_offset (200)
Corrupted transaction log file /var/mail/(user)/Maildir/
dovecot.index.log: record size too small (type=0x8080, offset=7260,
size=0)
Corrupted transaction log file /var/mail/(user)/Maildir/
dovecot.index.log: hdr.size too large (0)
We also had 1,929 IMAP sessions from our testing support staff, and
only 2 errors:
Corrupted transaction log file /var/mail/(user)/Maildir/
dovecot.index.log: record size too small (type=0x5ef9, offset=200,
size=0)
Corrupted transaction log file /var/mail/(user)/Maildir/.Junk E-mail/
dovecot.index.log: end_offset (15292) > current sync_offset (15236)
Rich
On Apr 28, 2006, at 11:27 AM, Apps Lists wrote:
Rich,
We're also in the process of setting up a GFS/Cluster with
CentOS. I'm hopeful this will eliminate some of the issues.I'll keep everyone posted.
Steve
Hi Steve,
We'll give that a try. I'm sure users would prefer a slightly slower mail experience if it ensured accurate mail lists and reduced the chance for lost mail.
Rich Sandberg richs@whidbey.net Systems & Network Administrator Whidbey Telecom Network Operations
On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
Hi Rich.
I've seen a lot of the same IMAP errors... haven't played with POP yet.
You might want to try adding "noac" to the mount options... your line..
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
.... would become:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
.... and see if that helps. I haven't seen much if any difference.
What I don't understand is how some people are running what appears to be the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot machines with IMAP and have no problems.
I'd love to get to the bottom of this.
Steve
We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3 Dovecot servers. Our most concerning episode was a failed APPEND operation (with a set of folders containing 200 messages):
IMAP(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: unexpected end of file while reading header
That resulted in one of the new APPENDed mail folders and content being lost. We see those errors at other times too, although they don't always end in a noticeable failure.
Continuous other errors we see that might be NFS related:
IMAP(username): close() failed with index cache file /var/mail/ username/Maildir/.Deleted Messages/dovecot.index.cache: Stale NFS file handle IMAP(username): Corrupted index cache file /var/mail/username/ Maildir/.OldMail/dovecot.index.cache: invalid record size IMAP(username): close() failed with index file /var/mail/username/ Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
We also see errors with POP3, which would rarely benefit from
indexes (since 92% of our users RETR-and-DELE messages):POP3(username): file_dotlock_create() failed with file /var/mail/ username/Maildir/dovecot.index.log: Stale NFS file handle POP3(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: record size too small (type=0x8080, offset=9056, size=0) POP3(username): Our dotlock file /var/mail/username/Maildir/ dovecot.index.log.lock was deleted (kept it -118 secs) POP3(username): Timeout while waiting for release of dotlock for transaction log file /var/mail/username/Maildir/dovecot.index.log
We've tried manually removing "dovecot*" from all folders, allowing the indexes to be re-built, but the errors return.
Excerpt from our config:
mail_cache_fields mail_never_cache_fields mail_cache_min_mail_count
1000 mailbox_idle_check_interval = 10 mail_full_filesystem_access = no mail_read_mmaped = no mmap_disable = yes mmap_no_write = no lock_method = dotlock maildir_stat_dirs = yes maildir_copy_with_hardlinks = no
Our NFS mount:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
I've read some of the list archives about disabling the index
files. We realize that indexes should be beneficial, but we'd like to at least experiment with a "no index" install. This would solve
some of the NFS locking issues, eliminate the need to modify our MTA behavior, and remove (potentially) millions of files from the
system.Did anyone discover a method to cleanly disable the indexes? At
least for POP3?Thanks!
Rich Sandberg richs@whidbey.net
On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
Hi all. We also had problems with client NFS locking using RHEL4, kernel 2.6.9 Some of them were solved updating the kelnel from RedHat. Hope this helps.
Regards.
Apps Lists wrote:
I'm running beta7 on two machines, with maildir on NFS. I have lockd running on all machines. I've found that Dovecot is highly unstable with NFS when accessing a mailbox on more than one machine at the same time.
Both dovecot machines have:
mmap_disable = yes lock_method = fcntl
NFS is version 3, exported from a third linux machine. All machines are running 2.6.9 kernel.
Any ideas what's going wrong here?
Steve
*** dev4 ***
Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146230258.P2889Q0M407851.dev4.neonova.net:2,
Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension introduction for unknown id 0
Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c:
line 883 (mail_index_sync_from_transactions): assertion failed: (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset)Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with signal 6
*** dev3 ***
Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146230254.P2888Q0M651598.dev4.neonova.net
Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4036 -> 1772
Apr 28 09:17:44 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4036 -> 1774)
Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146145480.P5976Q0M245663.dev3.neonova.net:2,
Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4038 -> 1774
Apr 28 09:17:47 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4038 -> 1775)
Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in uidlist file /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: 1146145480.P5976Q0M245663.dev3.neonova.net:2,
Apr 28 09:17:49 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4039 -> 1775)
Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension introduction for unknown id 0
Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c:
line 881 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was lowered (4039 -> 1775)
Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append
with UID 1775, but next_uid = 4039Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append
with UID 1775, but next_uid = 4039Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected transaction log desync with index /var/mailstore/72/af/375887/Maildir/ dovecot.index
Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected: Mailbox is in inconsistent state, please relogin.
Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file /var/mailstore/72/af/375887/Maildir/dovecot.index: first_recent_uid_lowwater 4039 -> 1776
-- +---------------------------------------------- ^-----------------------+ | Luis Meléndez Aganzo ^ Email: luism@uco.es | | Servicio de Informática ^ Tlf: 34-(9) 57-211022 | | Analista - Área de Sistemas ^ Fax: 34-(9) 57-218116 | | Universidad de Córdoba (SPAIN) ^ http:// www.uco.es | +---------------------------------------------- ^-----------------------+
Hi Rich.
Glad to see that helped out. I can still introduce errors with IMAP even with that setting.
We just finished setting up GFS and the results are the same:
May 1 11:33:11 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 1 11:33:13 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 2703, but next_uid = 2704 May 1 11:33:15 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 2704, but next_uid = 2705 May 1 11:35:16 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 1 11:35:16 dev4 dovecot: IMAP(jtest): file mail-index.c: line 865 (mail_index_sync_from_transactions): assertion failed: (prev_seq <= max_seq && (prev_seq != max_seq || prev_offset <= max_offset)) .... May 1 11:40:19 dev4 dovecot: IMAP(jtest): file_dotlock_open() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Resource temporarily unavailable May 1 11:40:19 dev4 dovecot: child 25918 (imap) killed with signal 11
I'm not entirely sure why I'm seeing file_dotlock_open() calls. We're using fcntl.
For the record: Three members of the cluster. Two machines are running Dovecot, mounting the filesystem with GFS. I have another random machine that connects to both machines with IMAP. The program connects, gets a list of folders and APPENDs a message. It then disconnects and repeats. I run one instance of this test against dovecot#1 and another against dovecot#2, and can get the above failure messages within seconds.
This is pretty much the same behaviour we've seen with NFS.
Steve
Adding "noac" to our /etc/fstab NFS mounts seems to have helped our index errors so far:
nfsserver:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
Over the weekend we had 1,465,031 POP3 sessions across 3 servers, and only logged 15 "Corrupted" errors. They were of these types:
Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: unexpected end of file while reading header Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: end_offset (236) > current sync_offset (200) Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: record size too small (type=0x8080, offset=7260, size=0) Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: hdr.size too large (0)
We also had 1,929 IMAP sessions from our testing support staff, and only 2 errors:
Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: record size too small (type=0x5ef9, offset=200, size=0) Corrupted transaction log file /var/mail/(user)/Maildir/.Junk E-mail/ dovecot.index.log: end_offset (15292) > current sync_offset (15236)
Rich
On Apr 28, 2006, at 11:27 AM, Apps Lists wrote:
Rich,
We're also in the process of setting up a GFS/Cluster with CentOS. I'm hopeful this will eliminate some of the issues.
I'll keep everyone posted.
Steve
Hi Steve,
We'll give that a try. I'm sure users would prefer a slightly slower mail experience if it ensured accurate mail lists and reduced the chance for lost mail.
Rich Sandberg richs@whidbey.net Systems & Network Administrator Whidbey Telecom Network Operations
On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
Hi Rich.
I've seen a lot of the same IMAP errors... haven't played with POP yet.
You might want to try adding "noac" to the mount options... your line..
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
.... would become:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
.... and see if that helps. I haven't seen much if any difference.
What I don't understand is how some people are running what appears to be the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot machines with IMAP and have no problems.
I'd love to get to the bottom of this.
Steve
We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3 Dovecot servers. Our most concerning episode was a failed APPEND operation (with a set of folders containing 200 messages):
IMAP(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: unexpected end of file while reading header
That resulted in one of the new APPENDed mail folders and content being lost. We see those errors at other times too, although they don't always end in a noticeable failure.
Continuous other errors we see that might be NFS related:
IMAP(username): close() failed with index cache file /var/mail/ username/Maildir/.Deleted Messages/dovecot.index.cache: Stale NFS file handle IMAP(username): Corrupted index cache file /var/mail/username/ Maildir/.OldMail/dovecot.index.cache: invalid record size IMAP(username): close() failed with index file /var/mail/username/ Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
We also see errors with POP3, which would rarely benefit from indexes (since 92% of our users RETR-and-DELE messages):
POP3(username): file_dotlock_create() failed with file /var/mail/ username/Maildir/dovecot.index.log: Stale NFS file handle POP3(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: record size too small (type=0x8080, offset=9056, size=0) POP3(username): Our dotlock file /var/mail/username/Maildir/ dovecot.index.log.lock was deleted (kept it -118 secs) POP3(username): Timeout while waiting for release of dotlock for transaction log file /var/mail/username/Maildir/dovecot.index.log
We've tried manually removing "dovecot*" from all folders, allowing the indexes to be re-built, but the errors return.
Excerpt from our config:
mail_cache_fields mail_never_cache_fields mail_cache_min_mail_count
> 1000 mailbox_idle_check_interval = 10 mail_full_filesystem_access = no mail_read_mmaped = no mmap_disable = yes mmap_no_write = no lock_method = dotlock maildir_stat_dirs = yes maildir_copy_with_hardlinks = no
Our NFS mount:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
I've read some of the list archives about disabling the index files. We realize that indexes should be beneficial, but we'd like to at least experiment with a "no index" install. This would solve some of the NFS locking issues, eliminate the need to modify our MTA behavior, and remove (potentially) millions of files from the system.
Did anyone discover a method to cleanly disable the indexes? At least for POP3?
Thanks!
Rich Sandberg richs@whidbey.net
On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
Hi all. We also had problems with client NFS locking using RHEL4, kernel 2.6.9 Some of them were solved updating the kelnel from RedHat. Hope this helps.
Regards.
Apps Lists wrote:
> I'm running beta7 on two machines, with maildir on NFS. I have > lockd > running on all machines. I've found that Dovecot is highly > unstable with > NFS when accessing a mailbox on more than one machine at the same > time. > > Both dovecot machines have: > > mmap_disable = yes > lock_method = fcntl > > NFS is version 3, exported from a third linux machine. All > machines are > running 2.6.9 kernel. > > Any ideas what's going wrong here? > > Steve > > > *** dev4 *** > > Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in > uidlist file > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: > 1146230258.P2889Q0M407851.dev4.neonova.net:2, > > Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction > log file > /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension > introduction for unknown id 0 > > Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c: > line > 883 > (mail_index_sync_from_transactions): assertion failed: > (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset) > > > Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with > signal 6 > > > *** dev3 *** > > > > Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in > uidlist file > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: > 1146230254.P2888Q0M651598.dev4.neonova.net > > Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file > /var/mailstore/72/af/375887/Maildir/dovecot.index: > first_recent_uid_lowwater 4036 -> 1772 > > Apr 28 09:17:44 dev3 dovecot: IMAP(jtest): > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was > lowered > (4036 -> 1774) > > Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in > uidlist file > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: > 1146145480.P5976Q0M245663.dev3.neonova.net:2, > > Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file > /var/mailstore/72/af/375887/Maildir/dovecot.index: > first_recent_uid_lowwater 4038 -> 1774 > > Apr 28 09:17:47 dev3 dovecot: IMAP(jtest): > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was > lowered > (4038 -> 1775) > > Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in > uidlist file > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: > 1146145480.P5976Q0M245663.dev3.neonova.net:2, > > Apr 28 09:17:49 dev3 dovecot: IMAP(jtest): > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was > lowered > (4039 -> 1775) > > Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction > log file > /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension > introduction for unknown id 0 > > Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c: > line > 881 > (mail_index_sync_from_transactions): assertion failed: > (hdr.messages_count > == (*map)->hdr.messages_count) > > Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): > /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid was > lowered > (4039 -> 1775) > > Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction > log file > /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append > with > UID > 1775, but next_uid = 4039 > > Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction > log file > /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append > with > UID > 1775, but next_uid = 4039 > > Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected > transaction log > desync with index /var/mailstore/72/af/375887/Maildir/ > dovecot.index > > Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected: > Mailbox is > in > inconsistent state, please relogin. > > Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file > /var/mailstore/72/af/375887/Maildir/dovecot.index: > first_recent_uid_lowwater 4039 -> 1776 > > > > > > >
-- +---------------------------------------------- ^-----------------------+ | Luis Meléndez Aganzo ^ Email: luism@uco.es | | Servicio de Informática ^ Tlf: 34-(9) 57-211022 | | Analista - Área de Sistemas ^ Fax: 34-(9) 57-218116 | | Universidad de Córdoba (SPAIN) ^ http:// www.uco.es | +---------------------------------------------- ^-----------------------+
Hate to top-post... but... ah well.
I'm trying the latest CVS snapshot with GFS and am only seeing a couple of (hopefully) harmless errors...
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.cache: No such file or directory
Timo - Is it safe to ignore these fcntl() errors, or is there something else going on?
Rich - it might be a good idea to pull down the latest CVS and see if that resolves some or all of your NFS issues.
Steve
Hi Rich.
Glad to see that helped out. I can still introduce errors with IMAP even with that setting.
We just finished setting up GFS and the results are the same:
May 1 11:33:11 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 1 11:33:13 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 2703, but next_uid = 2704 May 1 11:33:15 dev4 dovecot: IMAP(jtest): Corrupted transaction log file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append with UID 2704, but next_uid = 2705 May 1 11:35:16 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 1 11:35:16 dev4 dovecot: IMAP(jtest): file mail-index.c: line 865 (mail_index_sync_from_transactions): assertion failed: (prev_seq <= max_seq && (prev_seq != max_seq || prev_offset <= max_offset)) .... May 1 11:40:19 dev4 dovecot: IMAP(jtest): file_dotlock_open() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Resource temporarily unavailable May 1 11:40:19 dev4 dovecot: child 25918 (imap) killed with signal 11
I'm not entirely sure why I'm seeing file_dotlock_open() calls. We're using fcntl.
For the record: Three members of the cluster. Two machines are running Dovecot, mounting the filesystem with GFS. I have another random machine that connects to both machines with IMAP. The program connects, gets a list of folders and APPENDs a message. It then disconnects and repeats. I run one instance of this test against dovecot#1 and another against dovecot#2, and can get the above failure messages within seconds.
This is pretty much the same behaviour we've seen with NFS.
Steve
Adding "noac" to our /etc/fstab NFS mounts seems to have helped our index errors so far:
nfsserver:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
Over the weekend we had 1,465,031 POP3 sessions across 3 servers, and only logged 15 "Corrupted" errors. They were of these types:
Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: unexpected end of file while reading header Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: end_offset (236) > current sync_offset (200) Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: record size too small (type=0x8080, offset=7260, size=0) Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: hdr.size too large (0)
We also had 1,929 IMAP sessions from our testing support staff, and only 2 errors:
Corrupted transaction log file /var/mail/(user)/Maildir/ dovecot.index.log: record size too small (type=0x5ef9, offset=200, size=0) Corrupted transaction log file /var/mail/(user)/Maildir/.Junk E-mail/ dovecot.index.log: end_offset (15292) > current sync_offset (15236)
Rich
On Apr 28, 2006, at 11:27 AM, Apps Lists wrote:
Rich,
We're also in the process of setting up a GFS/Cluster with CentOS. I'm hopeful this will eliminate some of the issues.
I'll keep everyone posted.
Steve
Hi Steve,
We'll give that a try. I'm sure users would prefer a slightly slower mail experience if it ensured accurate mail lists and reduced the chance for lost mail.
Rich Sandberg richs@whidbey.net Systems & Network Administrator Whidbey Telecom Network Operations
On Apr 28, 2006, at 9:37 AM, Apps Lists wrote:
Hi Rich.
I've seen a lot of the same IMAP errors... haven't played with POP yet.
You might want to try adding "noac" to the mount options... your line..
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
.... would become:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192,noac 0 0
.... and see if that helps. I haven't seen much if any difference.
What I don't understand is how some people are running what appears to be the same configuration ... RHEL/CentOS ... NFS3, 2+ Dovecot machines with IMAP and have no problems.
I'd love to get to the bottom of this.
Steve
We're using 2.6.9-34, the latest RHEL4 kernel, over NFSv3 with 3 Dovecot servers. Our most concerning episode was a failed APPEND operation (with a set of folders containing 200 messages):
IMAP(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: unexpected end of file while reading header
That resulted in one of the new APPENDed mail folders and content being lost. We see those errors at other times too, although they don't always end in a noticeable failure.
Continuous other errors we see that might be NFS related:
IMAP(username): close() failed with index cache file /var/mail/ username/Maildir/.Deleted Messages/dovecot.index.cache: Stale NFS file handle IMAP(username): Corrupted index cache file /var/mail/username/ Maildir/.OldMail/dovecot.index.cache: invalid record size IMAP(username): close() failed with index file /var/mail/username/ Maildir/.INBOX.Security/dovecot.index: Bad file descriptor
We also see errors with POP3, which would rarely benefit from indexes (since 92% of our users RETR-and-DELE messages):
POP3(username): file_dotlock_create() failed with file /var/mail/ username/Maildir/dovecot.index.log: Stale NFS file handle POP3(username): Corrupted transaction log file /var/mail/username/ Maildir/dovecot.index.log: record size too small (type=0x8080, offset=9056, size=0) POP3(username): Our dotlock file /var/mail/username/Maildir/ dovecot.index.log.lock was deleted (kept it -118 secs) POP3(username): Timeout while waiting for release of dotlock for transaction log file /var/mail/username/Maildir/dovecot.index.log
We've tried manually removing "dovecot*" from all folders, allowing the indexes to be re-built, but the errors return.
Excerpt from our config:
mail_cache_fields mail_never_cache_fields mail_cache_min_mail_count >> 1000 mailbox_idle_check_interval = 10 >> mail_full_filesystem_access = no mail_read_mmaped = no mmap_disable = yes mmap_no_write = no lock_method = dotlock maildir_stat_dirs = yes maildir_copy_with_hardlinks = no
Our NFS mount:
nfshost:/var/mail /var/mail nfs rw,hard,intr,rsize=8192,wsize=8192 0 0
I've read some of the list archives about disabling the index files. We realize that indexes should be beneficial, but we'd like to at least experiment with a "no index" install. This would solve some of the NFS locking issues, eliminate the need to modify our MTA behavior, and remove (potentially) millions of files from the system.
Did anyone discover a method to cleanly disable the indexes? At least for POP3?
Thanks!
Rich Sandberg richs@whidbey.net
On Apr 28, 2006, at 6:59 AM, Luis Melendez wrote:
> Hi all. > We also had problems with client NFS locking using RHEL4, kernel > 2.6.9 > Some of them were solved updating the kelnel from RedHat. > Hope this helps. > > Regards. > > Apps Lists wrote: > >> I'm running beta7 on two machines, with maildir on NFS. I have >> lockd >> running on all machines. I've found that Dovecot is highly >> unstable with >> NFS when accessing a mailbox on more than one machine at the >> same time. >> >> Both dovecot machines have: >> >> mmap_disable = yes >> lock_method = fcntl >> >> NFS is version 3, exported from a third linux machine. All >> machines are >> running 2.6.9 kernel. >> >> Any ideas what's going wrong here? >> >> Steve >> >> >> *** dev4 *** >> >> Apr 28 09:17:40 dev4 dovecot: IMAP(jtest): Duplicate file in >> uidlist file >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: >> 1146230258.P2889Q0M407851.dev4.neonova.net:2, >> >> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): Corrupted transaction >> log file >> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension >> introduction for unknown id 0 >> >> Apr 28 09:17:41 dev4 dovecot: IMAP(jtest): file mail-index.c: >> line >> 883 >> (mail_index_sync_from_transactions): assertion failed: >> (hdr.log_file_int_offset == (*map)->hdr.log_file_int_offset) >> >> >> Apr 28 09:17:41 dev4 dovecot: child 2889 (imap) killed with >> signal 6 >> >> >> *** dev3 *** >> >> >> >> Apr 28 09:17:39 dev3 dovecot: IMAP(jtest): Duplicate file in >> uidlist file >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: >> 1146230254.P2888Q0M651598.dev4.neonova.net >> >> Apr 28 09:17:41 dev3 dovecot: IMAP(jtest): Fixed index file >> /var/mailstore/72/af/375887/Maildir/dovecot.index: >> first_recent_uid_lowwater 4036 -> 1772 >> >> Apr 28 09:17:44 dev3 dovecot: IMAP(jtest): >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid >> was lowered >> (4036 -> 1774) >> >> Apr 28 09:17:45 dev3 dovecot: IMAP(jtest): Duplicate file in >> uidlist file >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: >> 1146145480.P5976Q0M245663.dev3.neonova.net:2, >> >> Apr 28 09:17:46 dev3 dovecot: IMAP(jtest): Fixed index file >> /var/mailstore/72/af/375887/Maildir/dovecot.index: >> first_recent_uid_lowwater 4038 -> 1774 >> >> Apr 28 09:17:47 dev3 dovecot: IMAP(jtest): >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid >> was lowered >> (4038 -> 1775) >> >> Apr 28 09:17:48 dev3 dovecot: IMAP(jtest): Duplicate file in >> uidlist file >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: >> 1146145480.P5976Q0M245663.dev3.neonova.net:2, >> >> Apr 28 09:17:49 dev3 dovecot: IMAP(jtest): >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid >> was lowered >> (4039 -> 1775) >> >> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): Corrupted transaction >> log file >> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Extension >> introduction for unknown id 0 >> >> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): file mail-index.c: >> line >> 881 >> (mail_index_sync_from_transactions): assertion failed: >> (hdr.messages_count >> == (*map)->hdr.messages_count) >> >> Apr 28 09:17:50 dev3 dovecot: IMAP(jtest): >> /var/mailstore/72/af/375887/Maildir/dovecot-uidlist: next_uid >> was lowered >> (4039 -> 1775) >> >> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction >> log file >> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append >> with >> UID >> 1775, but next_uid = 4039 >> >> Apr 28 09:17:51 dev3 dovecot: IMAP(jtest): Corrupted transaction >> log file >> /var/mailstore/72/af/375887/Maildir/dovecot.index.log: Append >> with >> UID >> 1775, but next_uid = 4039 >> >> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Unexpected >> transaction log >> desync with index /var/mailstore/72/af/375887/Maildir/ >> dovecot.index >> >> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Disconnected: >> Mailbox is >> in >> inconsistent state, please relogin. >> >> Apr 28 09:17:52 dev3 dovecot: IMAP(jtest): Fixed index file >> /var/mailstore/72/af/375887/Maildir/dovecot.index: >> first_recent_uid_lowwater 4039 -> 1776 >> >> >> >> >> >> >> > > > -- > +---------------------------------------------- > ^-----------------------+ > | Luis Meléndez Aganzo ^ Email: > luism@uco.es | > | Servicio de Informática ^ Tlf: 34-(9) > 57-211022 | > | Analista - Área de Sistemas ^ Fax: 34-(9) > 57-218116 | > | Universidad de Córdoba (SPAIN) ^ http:// > www.uco.es | > +---------------------------------------------- > ^-----------------------+
On Tue, 2006-05-02 at 11:06 -0400, Apps Lists wrote:
Hate to top-post... but... ah well.
I'm trying the latest CVS snapshot with GFS and am only seeing a couple of (hopefully) harmless errors...
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.cache: No such file or directory
Does it do this all the time or only sometimes? From those error messages it looks like fcntl() locks don't work with GFS, or maybe GFS sometimes breaks with them. I wouldn't say they're harmless until I knew why exactly it's giving those errors.
On Tue, 2006-05-02 at 11:06 -0400, Apps Lists wrote:
Hate to top-post... but... ah well.
I'm trying the latest CVS snapshot with GFS and am only seeing a couple of (hopefully) harmless errors...
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory
fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.cache: No such file or directory
Does it do this all the time or only sometimes? From those error messages it looks like fcntl() locks don't work with GFS, or maybe GFS sometimes breaks with them. I wouldn't say they're harmless until I knew why exactly it's giving those errors.
With concurrent operations on the same GFS filesystem across multiple machines, it happens all day long.
I've yet to see any evidence of corruption, so I'm not particularly concerned.
I'm not sure what changed between beta7 release and the last snapshot, but it's SO much happier... at least on GFS.
Steve (logs from one machine below)
May 2 11:52:25 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 2 11:53:50 dev4 last message repeated 4 times May 2 11:55:42 dev4 last message repeated 2 times May 2 11:56:48 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 2 11:58:46 dev4 last message repeated 3 times May 2 11:59:10 dev4 dovecot: IMAP(jtest): stat() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: No such file or directory May 2 11:59:10 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: No such file or directory May 2 11:59:10 dev4 dovecot: IMAP(jtest): mail_index_wait_lock_fd() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index.log: No such file or directory May 2 11:59:54 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 2 12:00:56 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory May 2 12:02:29 dev4 last message repeated 2 times May 2 12:05:40 dev4 last message repeated 2 times May 2 12:07:05 dev4 dovecot: IMAP(jtest): fcntl() failed with file /var/mailstore/72/af/375887/Maildir/dovecot.index: No such file or directory
participants (3)
-
Apps Lists
-
richs@whidbey.net
-
Timo Sirainen