[Dovecot] Index corruption causes child process to die
My recent migration from Courier-IMAP to Dovecot has been marred by these errors:
Oct 24 12:12:40 server dovecot: IMAP(juha): Fixed index file ~user/Maildir/.Geekzone/dovecot.index: first_recent_uid_lowwater 431 -> 430 Oct 24 12:12:40 server dovecot: IMAP(juha): Corrupted index file ~user/Maildir/.Geekzone/dovecot.index: Duplicate header extension keywords Oct 24 12:12:40 server dovecot: child 5786 (imap) killed with signal 11
They happen with Thunderbird 1.5.0.7 and I'm using Dovecot 1.0RC7 from the FreeBSD ports.
I think the index corruption may be caused by Tbird's message filtering - I'm asking it to filter certain messages from INBOX to subfolders.
When the error occurs, Tbird pops up an error message saying it can't connect to the mail server, but I don't lose any messages it seems. Instead, I get duplicates and even triplicates when I restart Tbird.
Any work around for this?
-- Juha http://www.geekzone.co.nz/juha
On 10/24/06, Juha Saarinen <juhasaarinen@gmail.com> wrote:
My recent migration from Courier-IMAP to Dovecot has been marred by these errors:
Oct 24 12:12:40 server dovecot: IMAP(juha): Fixed index file ~user/Maildir/.Geekzone/dovecot.index: first_recent_uid_lowwater 431 -> 430 Oct 24 12:12:40 server dovecot: IMAP(juha): Corrupted index file ~user/Maildir/.Geekzone/dovecot.index: Duplicate header extension keywords Oct 24 12:12:40 server dovecot: child 5786 (imap) killed with signal 11
They happen with Thunderbird 1.5.0.7 and I'm using Dovecot 1.0RC7 from the FreeBSD ports.
I think the index corruption may be caused by Tbird's message filtering - I'm asking it to filter certain messages from INBOX to subfolders.
When the error occurs, Tbird pops up an error message saying it can't connect to the mail server, but I don't lose any messages it seems. Instead, I get duplicates and even triplicates when I restart Tbird.
Any work around for this?
This took care of the index corruption (touch wood):
lock_method=dotlock mmap_disable=yes
Before, I was using the defaults:
lock_method=fcntl mmap_disable=no
At this stage, I'm not sure if it was the change of index locking method or disabling mmapping that fixed things.
-- Juha http://www.geekzone.co.nz/juha
On 25.10.2006, at 1.38, Juha Saarinen wrote:
On 10/24/06, Juha Saarinen <juhasaarinen@gmail.com> wrote:
My recent migration from Courier-IMAP to Dovecot has been marred by these errors:
Oct 24 12:12:40 server dovecot: IMAP(juha): Fixed index file ~user/Maildir/.Geekzone/dovecot.index: first_recent_uid_lowwater 431 -> 430 ..
This took care of the index corruption (touch wood):
lock_method=dotlock mmap_disable=yes
Do you use NFS? If so, then at least mmap_disable is needed. If not,
then it shouldn't really make a difference..
lock_method=fcntl should work with NFS if you're using lockd, and
with non-NFS it should be working always..
On 10/25/06, Timo Sirainen <tss@iki.fi> wrote:
Do you use NFS? If so, then at least mmap_disable is needed. If not, then it shouldn't really make a difference..
lock_method=fcntl should work with NFS if you're using lockd, and with non-NFS it should be working always..
No, this is on a local disk (SATA). What is the difference between the three different locking methods?
-- Juha http://www.geekzone.co.nz/juha
Juha Saarinen wrote:
On 10/25/06, Timo Sirainen <tss@iki.fi> wrote:
Do you use NFS? If so, then at least mmap_disable is needed. If not, then it shouldn't really make a difference..
lock_method=fcntl should work with NFS if you're using lockd, and with non-NFS it should be working always..
No, this is on a local disk (SATA). What is the difference between the three different locking methods?
(I'm also seeing this problem with FreeBSD 6.1 and Dovecot 1.0rc10 on a local file system (ufs2 on SATA). I haven't tried the suggested workarounds yet, I'll post the results when I'm done.)
-- Daniel Mikkelsen Copyleft Software AS
On 10/26/06, Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
(I'm also seeing this problem with FreeBSD 6.1 and Dovecot 1.0rc10 on a local file system (ufs2 on SATA). I haven't tried the suggested workarounds yet, I'll post the results when I'm done.)
The workarounds could be red herrings... I've enabled mmap again, with fcntl as the locking method and haven't had the corrupted index and child processes dying again.
-- Juha http://www.geekzone.co.nz/juha
On 10/26/06, Juha Saarinen <juhasaarinen@gmail.com> wrote:
The workarounds could be red herrings... I've enabled mmap again, with fcntl as the locking method and haven't had the corrupted index and child processes dying again.
Hmm, crap, it's happening again.
pid 94357 (imap), uid 1001: exited on signal 11
Oct 26 15:55:35 server dovecot: IMAP(user): Fixed index file //home/user Maildir/.journz/dovecot.index: first_recent_uid_lowwater 2629 -> 2628
Oct 26 15:55:35 server dovecot: IMAP(user): Corrupted index file //home/user/Maildir/.journz/dovecot.index: Duplicate header extension keywords Oct 26 15:55:35 server dovecot: child 94357 (imap) killed with signal 11
Let's see if flock does the same.
-- Juha http://www.geekzone.co.nz/juha
Juha Saarinen wrote:
Hmm, crap, it's happening again.
pid 94357 (imap), uid 1001: exited on signal 11
Oct 26 15:55:35 server dovecot: IMAP(user): Fixed index file //home/user Maildir/.journz/dovecot.index: first_recent_uid_lowwater 2629 -> 2628
Oct 26 15:55:35 server dovecot: IMAP(user): Corrupted index file //home/user/Maildir/.journz/dovecot.index: Duplicate header extension keywords Oct 26 15:55:35 server dovecot: child 94357 (imap) killed with signal 11
Let's see if flock does the same.
I wasn't able to draw any conclusions either way - I only get this problem on two heavily loaded servers, and the two altered options made them run much slower so I had to revert.
-- Daniel Mikkelsen Copyleft Software AS
On 10/27/06, Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
I wasn't able to draw any conclusions either way - I only get this problem on two heavily loaded servers, and the two altered options made them run much slower so I had to revert.
Well, trial and error here points to mmap - if I disable that, I can use fcntl (or flock). Enabling it, and I get the index corruption.
Not sure how to ktrace this.
-- Juha http://www.geekzone.co.nz/juha
On Fri, 2006-10-27 at 09:52 +1300, Juha Saarinen wrote:
On 10/27/06, Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
I wasn't able to draw any conclusions either way - I only get this problem on two heavily loaded servers, and the two altered options made them run much slower so I had to revert.
Well, trial and error here points to mmap - if I disable that, I can use fcntl (or flock). Enabling it, and I get the index corruption.
mmap_disable=yes does quite a many things differently than mmap_disable=no, so I guess it could just be a bug somewhere in the =no codepath.
Daniel, how about you? Does mmap_disable=yes fix it for you? And what error messages exactly do you get?
Timo Sirainen wrote:
mmap_disable=yes does quite a many things differently than mmap_disable=no, so I guess it could just be a bug somewhere in the =no codepath.
Daniel, how about you? Does mmap_disable=yes fix it for you? And what error messages exactly do you get?
Hi.
It's hard for me to tell, because the problem is intermittent, and I can't make it appear on my test rig / development platform.
And when set mmap_disable=yes the load goes through the roof on the production servers, and they quickly become unresponsive, so I have to set it back to mmap_disable=yes.
But I would have to say that it _appears_ the particular errors are not present when running with mmap_disable=yes. I'm trying to test it again now to verify.
The exact error I've been getting is:
Nov 7 17:10:53 famine dovecot: IMAP(account-name): Corrupted index file /var/shemail/maildirs/site/user/.folder/dovecot.index: Duplicate header extension keywords
This only happens for Thunderbird IMAP users as far as I can tell. The client then loses connection with the server after a timeout that can last for a minute or two.
After reconnecting, duplicates of some messages will sometimes appear, and some times new messages will be sorted at the top of Thunderbird's list.
In the maildir, these duplicate messages are hardlinks of the same file.
I'm running FreeBSD 6.1 with Dovecot 1.0rc12.
-- Daniel Mikkelsen Copyleft Software AS
On Tue, 2006-11-07 at 17:45 +0100, Hr. Daniel Mikkelsen wrote:
Timo Sirainen wrote:
mmap_disable=yes does quite a many things differently than mmap_disable=no, so I guess it could just be a bug somewhere in the =no codepath.
Daniel, how about you? Does mmap_disable=yes fix it for you? And what error messages exactly do you get?
Hi.
It's hard for me to tell, because the problem is intermittent, and I can't make it appear on my test rig / development platform.
And when set mmap_disable=yes the load goes through the roof on the production servers, and they quickly become unresponsive, so I have to set it back to mmap_disable=yes.
"back to mmap_disable=no" I guess? They shouldn't affect load that much, mmap_disable should just use somewhat more memory and CPU.
Nov 7 17:10:53 famine dovecot: IMAP(account-name): Corrupted index file /var/shemail/maildirs/site/user/.folder/dovecot.index: Duplicate header extension keywords
This only happens for Thunderbird IMAP users as far as I can tell. The client then loses connection with the server after a timeout that can last for a minute or two.
That error message should only cause index file to be rebuilt. I'm not sure if it gives an error message or if it kills the connection immediately, but it really shouldn't just hang the connection.
After reconnecting, duplicates of some messages will sometimes appear, and some times new messages will be sorted at the top of Thunderbird's list.
Also that error message doesn't have anything to do with creation of duplicate messages.
In the maildir, these duplicate messages are hardlinks of the same file.
Do you have maildir_copy_with_hardlinks=yes? If yes, it sounds like the client is copying messages into the mailbox itself (or possibly copy to Trash and back to inbox). If it's set to no, then I don't think it's Dovecot creating those files at all.
Timo Sirainen wrote:
On Tue, 2006-11-07 at 17:45 +0100, Hr. Daniel Mikkelsen wrote:
And when set mmap_disable=yes the load goes through the roof on the production servers, and they quickly become unresponsive, so I have to set it back to mmap_disable=yes.
"back to mmap_disable=no" I guess? They shouldn't affect load that much, mmap_disable should just use somewhat more memory and CPU.
(Yes to the guess.) Perhaps it was the setting lock_method=dotlock that I tested at the same time that affected load.
After reconnecting, duplicates of some messages will sometimes appear, and some times new messages will be sorted at the top of Thunderbird's list.
Also that error message doesn't have anything to do with creation of duplicate messages.
On closer inspeciton, the duplication of messages appears is associated with an extra error:
Nov 7 14:10:40 famine dovecot: IMAP(account-name): Fixed index file /var/shemail/maildirs/site/user/.folder/dovecot.index: first_recent_uid_lowwater 814 -> 813 Nov 7 14:10:40 famine dovecot: IMAP(account-name): Corrupted index file /var/shemail/maildirs/site/user/.folder/dovecot.index: Duplicate header extension keywords Nov 7 14:10:40 famine dovecot: child 5088 (imap) killed with signal 11
Where I find these three errors, I'm also able to find a set of hardlinked mail files with corresponding timestamp.
In the maildir, these duplicate messages are hardlinks of the same file.
Do you have maildir_copy_with_hardlinks=yes? If yes, it sounds like the client is copying messages into the mailbox itself (or possibly copy to Trash and back to inbox). If it's set to no, then I don't think it's Dovecot creating those files at all.
It's set to yes.
-- Daniel Mikkelsen Copyleft Software AS
On 11/8/06, Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
Do you have maildir_copy_with_hardlinks=yes? If yes, it sounds like the client is copying messages into the mailbox itself (or possibly copy to Trash and back to inbox). If it's set to no, then I don't think it's Dovecot creating those files at all.
It's set to yes.
I have maildir_copy_with_hardlinks=yes as well. Would it better to turn that off?
-- Juha http://www.geekzone.co.nz/juha
On Wed, 2006-11-08 at 08:13 +1300, Juha Saarinen wrote:
On 11/8/06, Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
Do you have maildir_copy_with_hardlinks=yes? If yes, it sounds like the client is copying messages into the mailbox itself (or possibly copy to Trash and back to inbox). If it's set to no, then I don't think it's Dovecot creating those files at all.
It's set to yes.
I have maildir_copy_with_hardlinks=yes as well. Would it better to turn that off?
No. I was just wondering what would cause the copies to be made.
I tried reproducing this "Duplicate header" error myself, but I couldn't. I also can't see anything in the code that would even allow the duplication to be created.
I did however notice that there's one race condition between expunging mails and saving new mails. I'm not sure if that helps for this problem, but it might. Fixed in latest nightly snapshot (in a couple of minutes, still building) and patch here:
http://dovecot.org/list/dovecot-cvs/2006-November/006759.html
If that doesn't help, I've also attached a debug patch. Could you try running with it and see if it now crashes with signal 6? If it does, could you send me a gdb backtrace then?
Also if it crashes with signal 11 somewhere I'd like a backtrace of that too. See "Debugging crashes" in http://dovecot.org/bugreport.html
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
I applied this morning and until now seems to work
Ciao, luigi
-- / +--[Luigi Rosa]-- \
I know why you're here, Neo. I know what you've been doing... why you hardly sleep, why you live alone, and why night after night, you sit by your computer. You're looking for him. I know because I was once looking for the same thing. And when he found me, he told me I wasn't really looking for him. I was looking for an answer. It's the question, Neo. It's the question that drives us. It's the question that brought you here. You know the question, just as I did. --Trinity, The Matrix
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
Sigh! ;(((
Nov 8 16:20:41 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.HT list/dovecot.index: first_recent_uid_lowwater 4407 -> 4406 Nov 8 16:20:41 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.HT list/dovecot.index: Duplicate header extension keywords Nov 8 16:21:43 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.HT list/dovecot.index: Duplicate header extension keywords
This events has been loggd few moments ago when "too many open files" message arrived and Thunderbird moved it to Dovecot mailbox.
I think that is some kind of race condition that occurs only when there are "fast" servers. In am not talking about supercomputers, but situiations of realtively small amount of workload. This situation coul be worsened by a direct LAN connection between client and server and a frequent (mine is 1 minute) check by mail client. Could it be possible?
Please note that your patch was useful because this is the first error I get in 12 hours, previously I got this kind of error more often.
Ciao, luigi
-- / +--[Luigi Rosa]-- \
The universe is run by the complex interweaving of three elements: energy, matter, and enlightened self-interest. --G'Kar "Survivors", Babylon 5
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11
{I tried to open the mailbox with the new mail, but Thunderbird was waiting something from IMAP server, but the server was not responding as TB expected. I forced close TB}
Nov 8 20:36:58 mail dovecot: imap-login: Login: user=<lrosa>, method=plain, rip=62.123.164.114, lip=62.123.164.113 Nov 8 20:37:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:25 mail dovecot: child 15393 (imap) killed with signal 6 Nov 8 20:37:25 mail dovecot: IMAP(polo): Disconnected Nov 8 20:37:25 mail dovecot: IMAP(polo): Disconnected Nov 8 20:37:25 mail dovecot: IMAP(polo): Disconnected in IDLE Nov 8 20:37:25 mail dovecot: IMAP(polo): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:25 mail dovecot: child 12243 (imap) killed with signal 6 Nov 8 20:37:31 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:31 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:31 mail dovecot: child 12178 (imap) killed with signal 6 Nov 8 20:37:31 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:31 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:31 mail dovecot: IMAP(virusalert): Disconnected in IDLE Nov 8 20:37:31 mail dovecot: IMAP(virusalert): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:31 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:31 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:37:31 mail dovecot: child 12164 (imap) killed with signal 6 Nov 8 20:37:31 mail dovecot: child 12245 (imap) killed with signal 6 Nov 8 20:37:31 mail dovecot: child 15010 (imap) killed with signal 6
{I testarted Thunderbird, please note that the same thunderbird handles 3 different accounts on the same server, but only 'lrosa' receives mail, the other can be considered unused}
Nov 8 20:37:52 mail dovecot: imap-login: Login: user=<lrosa>, method=plain, rip=62.123.164.114, lip=62.123.164.113 Nov 8 20:37:53 mail last message repeated 2 times
{same behaviour as before, I cannot open the mailbox with the new mail, I have to force close TB again}
Nov 8 20:37:54 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:37:54 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:38:04 mail dovecot: imap-login: Login: user=<lrosa>, method=plain, rip=62.123.164.114, lip=62.123.164.113 Nov 8 20:38:09 mail dovecot: imap-login: Login: user=<lrosa>, method=plain, rip=62.123.164.114, lip=62.123.164.113 Nov 8 20:38:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:38:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:38:25 mail dovecot: child 15421 (imap) killed with signal 6 Nov 8 20:38:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:38:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:38:25 mail dovecot: child 15424 (imap) killed with signal 6 Nov 8 20:38:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:38:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) Nov 8 20:38:25 mail dovecot: child 15415 (imap) killed with signal 6
{this time I decide to stop and restart Dovecot too}
Nov 8 20:38:41 mail dovecot: Dovecot v1.0.rc13 starting up
{everything is back to normal, with the exception that the two mail boxes with new messages has three copies of the new message: the original and one for each of the two incidents above}
Ciao, luigi
-- / +--[Luigi Rosa]-- \
Organic chemistry is the chemistry of carbon compounds. Biochemistry is the study of carbon compounds that crawl. --Mike Adams
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords
Were you running with my duplicate header patch? (not included in rc13)
Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11
gdb backtrace from this would be nice.
Nov 8 20:37:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0)
gdb backtrace from this would be _really_ nice. See http://dovecot.org/bugreport.html
My success message tis morning after applying the patch was a bit early. I did find that "Duplicate Header" error im my logs later this afternoon and also have a " Fixed index file /home/rainer/Maildir/.software.Dovecot/dovecot.index: first_recent_uid_lowwater 2038 -> 2037" running an unpatched RC13 again.
I'll try to get a dump.
Timo Sirainen wrote:
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch. It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords
Were you running with my duplicate header patch? (not included in rc13)
Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11
gdb backtrace from this would be nice.
Nov 8 20:37:25 mail dovecot: IMAP(lrosa): Disconnected in IDLE Nov 8 20:37:25 mail dovecot: IMAP(lrosa): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0)
gdb backtrace from this would be _really_ nice. See http://dovecot.org/bugreport.html
-- in diesem Sinne, Rainer
I am seeing the following errors with vanilla RC13 (two servers, nfs, mmap_disable=yes, lock_method=fnctl, Maildir). These only manifested after the upgrade to 1.0RC13
server1: dovecot: Nov 09 15:44:08 Error: IMAP(user1): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 15:44:08 Error: child 2920 (imap) killed with signal 6 dovecot: Nov 09 16:09:47 Error: IMAP(user2): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 16:09:47 Error: child 1894 (imap) killed with signal 6 dovecot: Nov 09 16:09:49 Error: IMAP(user2): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 16:09:49 Error: child 1887 (imap) killed with signal 6 dovecot: Nov 09 16:49:15 Error: IMAP(user3): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 16:49:15 Error: child 495 (imap) killed with signal 6 dovecot: Nov 09 17:51:36 Error: IMAP(user4): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 17:51:36 Error: child 7154 (imap) killed with signal 6 dovecot: Nov 09 19:09:21 Error: IMAP(user5): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 19:09:21 Error: child 6289 (imap) killed with signal 6 dovecot: Nov 09 22:56:31 Error: IMAP(user6): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 22:56:31 Error: child 13968 (imap) killed with signal 6 dovecot: Nov 10 11:17:41 Error: IMAP(user7): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 11:17:41 Error: child 25516 (imap) killed with signal 6 dovecot: Nov 10 12:34:10 Error: IMAP(user4): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 12:34:10 Error: child 24433 (imap) killed with signal 6 dovecot: Nov 10 12:34:26 Error: IMAP(user4): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 12:34:26 Error: child 25041 (imap) killed with signal 6 dovecot: Nov 10 13:06:36 Error: IMAP(user8): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 13:06:36 Error: child 29278 (imap) killed with signal 6
server2: dovecot: Nov 09 14:42:39 Error: IMAP(user9): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 14:42:39 Error: child 31506 (imap) killed with signal 6 dovecot: Nov 09 14:47:36 Error: IMAP(user10): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 14:47:36 Error: child 30492 (imap) killed with signal 6 dovecot: Nov 09 16:01:33 Error: IMAP(user8): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 16:01:33 Error: child 31884 (imap) killed with signal 6 dovecot: Nov 09 17:51:36 Error: IMAP(user4): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 17:51:36 Error: child 5648 (imap) killed with signal 6 dovecot: Nov 09 20:08:26 Error: IMAP(user1): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 20:08:26 Error: child 5344 (imap) killed with signal 6 dovecot: Nov 10 10:11:12 Error: IMAP(user1): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 10:11:12 Error: child 25314 (imap) killed with signal 6 dovecot: Nov 10 10:17:35 Error: IMAP(user10): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 10:17:35 Error: child 25997 (imap) killed with signal 6 dovecot: Nov 10 10:35:19 Error: IMAP(user8): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 10:35:19 Error: child 25964 (imap) killed with signal 6 dovecot: Nov 10 11:14:31 Error: IMAP(user11): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 11:14:31 Error: child 24501 (imap) killed with signal 6 dovecot: Nov 10 11:41:49 Error: IMAP(user10): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 11:41:49 Error: child 28684 (imap) killed with signal 6 dovecot: Nov 10 13:07:12 Error: IMAP(user8): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 10 13:07:12 Error: child 26323 (imap) killed with signal 6
At 1:19 PM -0500 11/10/06, bofh list wrote:
I am seeing the following errors with vanilla RC13 (two servers, nfs, mmap_disable=yes, lock_method=fnctl, Maildir). These only manifested after the upgrade to 1.0RC13
server1: dovecot: Nov 09 15:44:08 Error: IMAP(user1): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0) dovecot: Nov 09 15:44:08 Error: child 2920 (imap) killed with signal 6 dovecot: Nov 09 16:09:47 Error: IMAP(user2): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0)
Data point: I am seeing the same thing (ioloop.c line 22 assertion failure) with a very different system: MacOS X 10.4.8, mmap_disable=no (default), single machine, Maildir at ~/Maildir on the same internal disk as everything else, lock_method=fcntl.
Because this is a small informal test system I also have a little more info that may be relevant. Since rc7 (my first rc version) I have seen occasional incidents where a single account (the same one involved in my assertion failures today) gets one subdirectory locked with one or more temp.$PID.<random> files in it and a message in tmp, This account has multiple clients almost constantly logged in, one Eudora/Mac, one Outlook 2003/XP, one Versamail 3.5/PalmOS. Only Eudora moves anything anywhere, Outlook sits mostly idle, Versamail is scanning the Inbox every 15 minutes and sync's everything irregularly every few hours. Most mail arriving for this account is automatically moved to one of 3 IMAP folders, and about half of the rest is automatically stashed locally by the client and trashed on the server. In all cases, the jammed IMAP directory In all 4 cases of the hang, a simple stop and restart of Dovecot and all clients has failed to solve the hang, and the solutions have come from either full system reboots or cleaning out both the dovecot index files and the file in the tmp folder of the Maildir subdirectory that was hung. I believe this is connected because twice today around the same times as the assertion failures in the logs, I had clients time out while trying to synch IMAP folders, which had me suspecting the same sort of failure, only to work on another connection within 10 minutes.
--
Bill Cole
bill@scconsult.com
On Fri, 2006-11-10 at 13:19 -0500, bofh list wrote:
I am seeing the following errors with vanilla RC13 (two servers, nfs, mmap_disable=yes, lock_method=fnctl, Maildir). These only manifested after the upgrade to 1.0RC13
server1: dovecot: Nov 09 15:44:08 Error: IMAP(user1): file ioloop.c: line 22 (io_add): assertion failed: (fd >= 0)
Happens when client disconnects while IDLEing. Fixed: http://dovecot.org/list/dovecot-cvs/2006-November/006782.html
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11
Do you use Dovecot's deliver to save the mails to INBOX, or something else? Do you have the Postfix mailbox opened while this happens? What if you change "Max number of server connections to cache" to 1 in the account settings -> server settings -> advanced?
I tried reproducing this for a while with Thunderbird, but couldn't get these kind of errors.
Anyway, testing what happens with this patch would also help:
http://dovecot.org/pipermail/dovecot/attachments/20061107/958172d0/attachmen...
On Sun, 2006-11-12 at 15:52 +0200, Timo Sirainen wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11 .. I tried reproducing this for a while with Thunderbird, but couldn't get
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote: these kind of errors.
No, forget it. I was finally able to reproduce this myself after setting maildir_copy_with_hardlinks=yes. :) Now lets try to fix it..
On Sun, 2006-11-12 at 16:02 +0200, Timo Sirainen wrote:
On Sun, 2006-11-12 at 15:52 +0200, Timo Sirainen wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch.
It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11 .. I tried reproducing this for a while with Thunderbird, but couldn't get
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote: these kind of errors.
No, forget it. I was finally able to reproduce this myself after setting maildir_copy_with_hardlinks=yes. :) Now lets try to fix it..
Annoying. I tried to debug this for two hours, and it happened maybe 5 times, but after I had added enough debugging code it stopped doing it completely (even without the debugging code).
I guess I'll have to write some test program which tries to emulate Thunderbird and make this crash happen more easily.
I did anyway see that the index file gets about 13 of these keyword extensions in it. Originally I thought there'd be just one extra, but this is weird..
Timo Sirainen wrote:
On Sun, 2006-11-12 at 16:02 +0200, Timo Sirainen wrote:
On Sun, 2006-11-12 at 15:52 +0200, Timo Sirainen wrote:
Timo Sirainen said the following On 7/11/2006 22:39:
Oh, right, the patch. It happened again ad was more serious.
{two mail arrived from two different mailing lists, Thunderbirds moves them in their mailbox but...}
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Fixed index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: first_recent_uid_lowwater 10272 -> 10271 Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords Nov 8 20:36:44 mail dovecot: child 12176 (imap) killed with signal 11 .. I tried reproducing this for a while with Thunderbird, but couldn't get
On Wed, 2006-11-08 at 20:46 +0100, Luigi Rosa wrote: these kind of errors. No, forget it. I was finally able to reproduce this myself after setting maildir_copy_with_hardlinks=yes. :) Now lets try to fix it..
Annoying. I tried to debug this for two hours, and it happened maybe 5 times, but after I had added enough debugging code it stopped doing it completely (even without the debugging code).
I guess I'll have to write some test program which tries to emulate Thunderbird and make this crash happen more easily.
I did anyway see that the index file gets about 13 of these keyword extensions in it. Originally I thought there'd be just one extra, but this is weird..
FWIW: I've set "maildir_copy_with_hardlinks = no" as a workaround here, and there's no discernable effect on performance.
The only effect I've been able to notice is that the index corruption problem goes away.
Perhaps just write this up as a known bug for 1.0 and explain the workaround?
-- Daniel Mikkelsen Copyleft Software AS
On 4.12.2006, at 15.54, Hr. Daniel Mikkelsen wrote:
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords
Wonder if rc17 fixes this?
FWIW: I've set "maildir_copy_with_hardlinks = no" as a workaround
here, and there's no discernable effect on performance.The only effect I've been able to notice is that the index corruption problem goes away.
Perhaps just write this up as a known bug for 1.0 and explain the
workaround?
I think it's a workaround only because it slows down Dovecot enough
so this problem just doesn't happen anymore.
Timo Sirainen wrote:
Nov 8 20:36:44 mail dovecot: IMAP(lrosa): Corrupted index file /var/spool/mail2/lrosa/.Postfix/dovecot.index: Duplicate header extension keywords
Wonder if rc17 fixes this?
Yes, rc17 appears to solve the problems. I've had no reports since I upgraded from rc13 to rc17 and turned off workarounds.
FWIW: I've set "maildir_copy_with_hardlinks = no" as a workaround here, and there's no discernable effect on performance.
I think it's a workaround only because it slows down Dovecot enough so this problem just doesn't happen anymore.
That sounds likely. Dovecot feels much faster after rc17 and without the workarounds.
-- Daniel Mikkelsen Copyleft Software AS
Timo Sirainen said the following On 12/11/2006 14:52:
Do you use Dovecot's deliver to save the mails to INBOX, or something else?
I have just Dovecot and Postfix, nothing else.
Do you have the Postfix mailbox opened while this happens?
No.
What if you change "Max number of server connections to cache" to 1 in the account settings -> server settings -> advanced?
Just changed. Since the problem was happening with only two mailboxes (I have more than 60 mailboxes on that account, all of them with at least a Thunderbird rule an many with more traffic than Postfix list), I stopped dovecot, deleted all index and other files created by dovecot and restarted the daemon. I deleted the files a couple of hours ago and no error showed up, but it's too soon to be happy.
Anyway, testing what happens with this patch would also help: http://dovecot.org/pipermail/dovecot/attachments/20061107/958172d0/attachmen...
Installed the patch now and restarted dovecot.
Let's see what happens.
Ciao, luigi
-- / +--[Luigi Rosa]-- \
If you see an onion ring, answer it!
Timo Sirainen said the following On 7/11/2006 22:35:
I tried reproducing this "Duplicate header" error myself, but I couldn't. I also can't see anything in the code that would even allow the duplication to be created.
If it could help, the problem happens SOMETIMES when I receive a mail that triggers one of the Thunderbird filter rules. In that case Thunderbird moves the message from the INBOX to a folder and I get an error about the index of that folder (never got an error about the index of the INBOX).
Ironically it happens also with Dovecot mail list folder :)
Ciao, luigi
-- / +--[Luigi Rosa]-- \
Do you know the one... "All I ask is a tall ship... and a star to steer her by..." You could feel the wind at your back, about you... the sounds of the sea beneath you. And even if you take away the wind and the water, it's still the same. The ship is yours... you can feel her... and the stars are still there. --James Kirk, The Ultimate Computer
On Tue, 2006-11-07 at 22:45 +0100, Luigi Rosa wrote:
Timo Sirainen said the following On 7/11/2006 22:35:
I tried reproducing this "Duplicate header" error myself, but I couldn't. I also can't see anything in the code that would even allow the duplication to be created.
If it could help, the problem happens SOMETIMES when I receive a mail that triggers one of the Thunderbird filter rules. In that case Thunderbird moves the message from the INBOX to a folder and I get an error about the index of that folder (never got an error about the index of the INBOX).
That sounds like it could be caused by the bug that I fixed. For INBOX I guess you use something else than Dovecot's deliver to store the mails? Copying mails might cause the bug to appear. Of course, there could be some other bug also.. :)
On 11/8/06, Timo Sirainen <tss@iki.fi> wrote:
That sounds like it could be caused by the bug that I fixed. For INBOX I guess you use something else than Dovecot's deliver to store the mails? Copying mails might cause the bug to appear. Of course, there could be some other bug also.. :)
This is what triggers the the issue for me as well - I've got a set of filter rules for Tbird that copy messages into other folders from Inbox (and mark the copied ones there as Deleted - at least that's how it seems, since Outlook shows the Deleted messages in the Inbox).
Will try the patch later and let you know if it works.
-- Juha http://www.geekzone.co.nz/juha
Timo Sirainen said the following On 7/11/2006 23:18:
That sounds like it could be caused by the bug that I fixed. For INBOX I guess you use something else than Dovecot's deliver to store the mails?
Yes, Postfix is delivering the mails. I'll apply the patch and see what happens.
Ciao, luigi
-- / +--[Luigi Rosa]-- \
Any wire cut to specified length will be too short.
my problem (thread "Duplicate header extension keywords") was probably also caused by Thunderbird filter rules. No more problems so far after applying the patch.
Timo Sirainen wrote:
On Tue, 2006-11-07 at 22:45 +0100, Luigi Rosa wrote:
Timo Sirainen said the following On 7/11/2006 22:35:
I tried reproducing this "Duplicate header" error myself, but I couldn't. I also can't see anything in the code that would even allow the duplication to be created. If it could help, the problem happens SOMETIMES when I receive a mail that triggers one of the Thunderbird filter rules. In that case Thunderbird moves the message from the INBOX to a folder and I get an error about the index of that folder (never got an error about the index of the INBOX).
That sounds like it could be caused by the bug that I fixed. For INBOX I guess you use something else than Dovecot's deliver to store the mails? Copying mails might cause the bug to appear. Of course, there could be some other bug also.. :)
-- in diesem Sinne, Rainer
participants (7)
-
Bill Cole
-
bofh list
-
Hr. Daniel Mikkelsen
-
Juha Saarinen
-
Luigi Rosa
-
Rainer Ophoven
-
Timo Sirainen