[Dovecot] Server 1.0.1 migration: Maildir : UID inserted in the middle of mailbox
Hello all,
Since migrating from Dovecot 1.rc16 to Dovecot 1.0.1 on a new server, I get a lot of errors like this every day:
Jun 21 17:42:34 dovecot1 deliver(damien.chambe@egs-gestion.fr): msgid=467A9B3E.5070605@egs-gestion.fr: saved mail to INBOX
Jun 21 17:42:34 dovecot1 postfix/pipe[10242]: 92DD11FA8B: to=damien.chambe@egs-gestion.fr, relay=dovecot, delay=0, status=sent (egs-gestion.fr)
Jun 21 17:42:34 dovecot1 postfix/qmgr[2724]: 92DD11FA8B: removed
Jun 21 17:42:37 dovecot1 dovecot: IMAP(damien.chambe@egs-gestion.fr): Maildir /mnt/baie/hula/dovecot/maildir/damien.chambe sync: UID inserted in the middle of mailbox (374 > 1, file = 1182330228.P30307Q0M224874.dovecot1:2,ST)
Jun 21 17:42:37 dovecot1 dovecot: IMAP(damien.chambe@egs-gestion.fr): Disconnected: Mailbox is in inconsistent state, please relogin.
Jun 21 17:51:41 dovecot1 dovecot: imap-login: Login: user=damien.chambe@egs-gestion.fr, method=PLAIN, rip=10.6.1.104, lip=10.5.3.38
Jun 21 17:51:41 dovecot1 dovecot: IMAP(damien.chambe@egs-gestion.fr): Corrupted index cache file /opt/dovecot/indexes/damien.chambe/.INBOX/dovecot.index.cache: indexid changed
After this error, all UIDs are regenerated, and when the user logs in again, all headers are re-sent to the client (IMAP, Thunderbird 1.5.x or 2.0.0.4). No emails are lost or duplicated, but it causes heavy traffic on our DSL lines, and users are complaining...
There are also other errors related to UIDs:
Jun 21 16:11:01 dovecot1 deliver(pr@egs-gestion.fr): rename(/mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist.lock, /mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist) failed: No such file or directory
Jun 21 16:11:01 dovecot1 deliver(pr@egs-gestion.fr): file_dotlock_replace(/mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist) failed: No such file or directory
Jun 21 16:11:02 dovecot1 deliver(pr@egs-gestion.fr): /mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist: next_uid was lowered (374 -> 373)
Jun 21 16:11:02 dovecot1 deliver(pr@egs-gestion.fr): msgid=467A85CB.50009@fica.fr: saved mail to INBOX
Jun 21 16:11:02 dovecot1 dovecot: IMAP(pr@egs-gestion.fr): rename(/mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist.lock, /mnt/baie/hula/dovecot/maildir/pr/dovecot-uidlist) failed: No such file or directory
Here's the context: our Dovecot server had to be replaced due to a hardware problem. I was forced to use SUSE SLES 10 for the new one, instead of SLES 9 on the old server (x86-32). I've kept the same configuration files for Postfix and Dovecot. We use Dovecot deliver. The old Postfix version was 2.1.1; now I have 2.2.9.
During the migration, I changed from Dovecot 1.rc16 to Dovecot 1.0.0. When I saw this error, I upgraded to Dovecot 1.0.1, but the problem is the same (I use SUSE RPM packages).
I have 400 mailboxes, all Maildir. The problem happens 50 times a day, so not on every mail received. I store mail on NFS and indexes on local disk. There's only one Dovecot server, so there is no access from multiple servers.
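To see how often the error occurs and which mailboxes are affected, the syslog lines can be tallied per user. The following is a minimal Python sketch, assuming the message format shown in the log excerpts above; the function name `count_uid_errors` is hypothetical and is not part of Dovecot.

```python
import re
from collections import Counter

# Matches the IMAP sync error as it appears in the log excerpts above.
UID_ERROR = re.compile(
    r"IMAP\((?P<user>[^)]+)\): Maildir \S+ sync: "
    r"UID inserted in the middle of mailbox"
)


def count_uid_errors(lines):
    """Return a Counter mapping user -> number of UID-insert errors."""
    counts = Counter()
    for line in lines:
        match = UID_ERROR.search(line)
        if match:
            counts[match.group("user")] += 1
    return counts


# Two lines copied from the excerpt above; only the first is a UID error.
sample = [
    "Jun 21 17:42:37 dovecot1 dovecot: IMAP(damien.chambe@egs-gestion.fr): "
    "Maildir /mnt/baie/hula/dovecot/maildir/damien.chambe sync: UID inserted "
    "in the middle of mailbox (374 > 1, file = "
    "1182330228.P30307Q0M224874.dovecot1:2,ST)",
    "Jun 21 17:42:37 dovecot1 dovecot: IMAP(damien.chambe@egs-gestion.fr): "
    "Disconnected: Mailbox is in inconsistent state, please relogin.",
]

for user, n in count_uid_errors(sample).most_common():
    print(user, n)
```

On a live system the same function could be fed an open syslog file instead of the sample list; where that file lives depends on the syslog configuration.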
I've tried removing all index files and dovecot-uidlist; everything is re-created correctly, but the problem remains.
Could it be an NFS locking problem? Since only the mail is stored on Maildir + NFS, with indexes on local disk, I didn't set lock_method or mmap_disable.
Could it be related to the email from Doug Concil, "[Dovecot] NFS lock contention for dovecot-uidlist", of May 17, 2007?
dovecot -n
# 1.0.1: /etc/dovecot/dovecot.conf
protocols: imap pop3
ssl_disable: yes
disable_plaintext_auth: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/lib/dovecot/imap-login
login_executable(imap): /usr/lib/dovecot/imap-login
login_executable(pop3): /usr/lib/dovecot/pop3-login
login_user: dovecotlogin
login_process_per_connection: no
login_processes_count: 5
max_mail_processes: 2048
first_valid_uid: 120
first_valid_gid: 12
default_mail_env: maildir:/mnt/baie/hula/dovecot/maildir/%n:INDEX=/opt/dovecot/indexes/%n
mail_location: maildir:/mnt/baie/hula/dovecot/maildir/%n:INDEX=/opt/dovecot/indexes/%n
mail_executable(default): /usr/lib/dovecot/imap
mail_executable(imap): /usr/lib/dovecot/imap
mail_executable(pop3): /usr/lib/dovecot/pop3
mail_plugins(default): quota imap_quota
mail_plugins(imap): quota imap_quota
mail_plugins(pop3):
mail_plugin_dir(default): /usr/lib/dovecot/modules/imap
mail_plugin_dir(imap): /usr/lib/dovecot/modules/imap
mail_plugin_dir(pop3): /usr/lib/dovecot/modules/pop3
pop3_uidl_format(default):
pop3_uidl_format(imap):
pop3_uidl_format(pop3): %08Xu%08Xv
auth default:
  verbose: yes
  passdb:
    driver: ldap
    args: /etc/dovecot/dovecot-ldap.conf
  userdb:
    driver: ldap
    args: /etc/dovecot/dovecot-ldap.conf
  socket:
    type: listen
    master:
      path: /var/run/dovecot/auth-master
      mode: 384
      user: dovecot
      group: mail
plugin:
  quota: maildir:storage=1500000:messages=40000
Thanks,
--
Regards,
Damien Chambe
EGS
42 bld jules janin - BP 240 - 42006 Saint Etienne Cedex 1
Tel: 04 77 49 48 16
Fax: 04 77 49 48 45
Corporate site: http://www.groupe-laurent.com
Catalogue site: http://ecat.groupe-laurent.com
On Fri, 2007-06-22 at 00:28 +0200, damien chambe - EGS wrote:
Our Dovecot server had to be changed due to a hardware problem I was forced to use SUSE SLES 10 for the new one, instead of SLES 9 on the old server.
The kernel matters a lot with NFS. Some kernels are more broken than others. Attribute cache also matters. http://wiki.dovecot.org/NFS
I store mails on NFS, and index on local disk. There's only one dovecot server, so no multiple access.
So deliver is also run on the same server? If all of it is done on the same server, then pretty much the only thing you can change is the kernel or somehow try to work around its bugs. I can think of only this fix on Dovecot's side:
http://dovecot.org/list/dovecot/2006-December/018145.html
Hmm. Actually I just realized another reason that could cause these: Are the clocks on the NFS server and on your Dovecot machine synchronized? They must be less than 1 second apart at all times or you'll begin to see problems.
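One rough way to check the clock requirement above is to create a file on the NFS mount and compare the modification time stamped on it with the local clock. The following is a minimal Python sketch, not part of Dovecot: the function name `estimate_clock_skew` is hypothetical, and the reading is only an estimate, since client-side attribute caching and timestamp granularity can affect it.

```python
import os
import sys
import tempfile
import time


def estimate_clock_skew(directory):
    """Estimate (file server clock - local clock) in seconds.

    Creates a temporary file in `directory` and compares the mtime
    assigned to it with the local time around the write. On an NFS
    mount the mtime normally comes from the server's clock.
    """
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory, prefix="skewtest.")
    try:
        os.write(fd, b"x")
        os.fsync(fd)  # push the write to the server before stat()
        after = time.time()
        mtime = os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.unlink(path)
    # Compare against the midpoint of the local interval.
    return mtime - (before + after) / 2.0


if __name__ == "__main__":
    # Pass the NFS-mounted directory as the first argument; the local
    # temp directory is only a fallback so the script runs anywhere.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    skew = estimate_clock_skew(target)
    print("estimated skew for %s: %+.3f s" % (target, skew))
    if abs(skew) >= 1.0:
        print("warning: clocks differ by 1 second or more")
```

Run against a directory on the NFS mount, a result well under one second suggests the two clocks are close enough; a larger value points to missing or broken time synchronization.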
Timo Sirainen wrote:
Hmm. Actually I just realized another reason that could cause these: Are the clocks on the NFS server and on your Dovecot machine synchronized? They must be less than 1 second apart at all times or you'll begin to see problems.
I tried updating the kernel (SUSE SLES 10 uses 2.6.16), but nothing changed. Synchronizing the NFS server and the Dovecot server with NTP, however, did the trick.
SUSE SLES 9 NFS was more tolerant of poor time sync than SLES 10...
No more UID messages yesterday!
Thank you for your quick answer.
--
Regards,
Damien Chambe
EGS - Groupe Laurent
42 bld jules janin - BP 240 - 42006 Saint Etienne Cedex 1
Tel: 04 77 49 48 16
Fax: 04 77 49 48 45
Corporate site: http://www.groupe-laurent.com
Catalogue site: http://ecat.groupe-laurent.com
On 6/27/07, damien chambe - EGS damien.chambe@groupe-laurent.com wrote:
I've tried to update kernel (SUSE SLES 10 is 2.6.16) but no change. But synchronizing NFS server and dovecot server with NTP did the trick.
SUSE SLES9 NFS was more tolerant than SLES 10 with time sync...
No more uid messages yesterday !
Did you use the "no_subtree_check" option in /etc/exports on the NFS server? I know that it solved some bugs when accessing files through mmap() on an NFS filesystem.
-- DINH Viêt Hoà
DINH Viêt Hoà wrote:
Did you use the "no_subtree_check" option in /etc/exports on the NFS server? I know that it solved some bugs when accessing files through mmap() on an NFS filesystem.
I can't easily modify the export options on the NFS server: it's a LifeKeeper DRBD cluster. Instead, I've used noac for the mount on the Dovecot side. It slows things down a little, but it is very reliable with my configuration.
Damien
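To confirm that a mount really carries the noac option mentioned above, the mount table can be inspected. The following is a minimal Python sketch that parses text in /proc/mounts format. The function name `mount_options` is hypothetical, and the sample entry (server name, export path, mount point and the other options) is made up for illustration, since the thread does not give the actual mount line.

```python
def mount_options(mount_table_text, mount_point):
    """Return the set of options for `mount_point`, or None if absent.

    `mount_table_text` is text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mount_table_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # last matching entry wins
    return found


# Hypothetical /proc/mounts entry; server name and paths are assumptions.
sample = "nfs-server:/export/hula /mnt/baie/hula nfs rw,noac,vers=3,hard 0 0\n"

options = mount_options(sample, "/mnt/baie/hula")
print("noac set:", options is not None and "noac" in options)
```

On a Linux client, the contents of /proc/mounts can be passed in place of the sample string.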
participants (3)
- damien chambe - EGS
- DINH Viêt Hoà
- Timo Sirainen