[Dovecot] Dovecot 1.0-test45 indexing issues
We're currently on dovecot 0.99.10 with a few modifications (including one very nasty hack for the blank line at the beginning of the mailbox bug).
We're using mbox all around.
Everything works smoothly but we still ocasionally get corrupt mailboxes and indexes that need to be wiped out.
I setup 1.0-test45 on one of our mailservers to start testing it and ran into a few issues. The first one is if I try to use mail_read_mmaped=yes I get this error:
Error: POP3(sobo1): file istream-raw-mbox.c: line 410 (istream_raw_mbox_get_body_size): assertion failed: (rstream->mail_size != (uoff_t)-1)
for every mailbox.
If I switch to mail_read_mmaped=no everything works, but performance is extremely bad. I have a few test mailboxes with about 20k messages and if I toggle a message flag from unread to read, dovecot seems to re-read the entire mailbox up to that message (I saw this with strace).
Is this how supposed to work?? I don't see behavior like this on 0.99.10, and on a mailbox with any substantial number of messages it really slows things down.
Thanks for any input.
- Mike
On 22.9.2004, at 00:36, dovecot@spam.turbolink.net wrote:
If I switch to mail_read_mmaped=no everything works, but performance is extremely bad.
There shouldn't be hardly any performance difference with mmaped=no vs. yes, but I'll look into that problem..
I have a few test mailboxes with about 20k messages and if I toggle a message flag from unread to read, dovecot seems to re-read the entire mailbox up to that message (I saw this with strace).
You set the flag with Dovecot? It shouldn't do that (and doesn't with me). Did anything else modify the mbox at the same time, for example if new mail was just written to the mbox? What OS is this? What filesystem? Can I see the strace log?
On Wed, 22 Sep 2004, Timo Sirainen wrote:
On 22.9.2004, at 00:36, dovecot@spam.turbolink.net wrote:
If I switch to mail_read_mmaped=no everything works, but performance is extremely bad.
There shouldn't be hardly any performance difference with mmaped=no vs. yes, but I'll look into that problem..
That was meant to be all together, the performance issue is the one of re-reading the mailbox from the beginning.
I have a few test mailboxes with about 20k messages and if I toggle a message flag from unread to read, dovecot seems to re-read the entire mailbox up to that message (I saw this with strace).
You set the flag with Dovecot? It shouldn't do that (and doesn't with me). Did anything else modify the mbox at the same time, for example if new mail was just written to the mbox? What OS is this? What filesystem? Can I see the strace log?
Well, I clicked 'Mark as Unread' in Thunderbird.
This is on Linux 2.4.21, INBOX is on NFS. No new messages are delivered. Nothing else has the mailbox open or is even trying.
Here is the snippet of an ngrep log I captured as I hit 'Mark as Unread':
# T 192.168.196.53:143 -> 192.168.199.48:57262 [AP] 26 OK Idle completed... ## T 192.168.199.48:57262 -> 192.168.196.53:143 [AP] 27 uid store 5957 -Flags (\Seen)..
very large delay here, 60-90 seconds
## T 192.168.196.53:143 -> 192.168.199.48:57262 [AP]
- 5957 FETCH (FLAGS ())..27 OK Store completed...
T 192.168.199.48:57262 -> 192.168.196.53:143 [AP] 28 check.. ## T 192.168.196.53:143 -> 192.168.199.48:57262 [AP] 28 OK Check completed... # T 192.168.199.48:57262 -> 192.168.196.53:143 [AP] 29 IDLE.. ## T 192.168.196.53:143 -> 192.168.199.48:57262 [AP]
- idling..
I didn't save the strace but I can generate a new one. I did do a gdb backtrace, while it was re-reading:
#0 0x0808ab02 in i_strdup_until () #1 0x0806a833 in mbox_from_parse () #2 0x0806a07b in i_stream_create_raw_mbox () #3 0x0806a1d5 in istream_raw_mbox_get_body_size () #4 0x08067b37 in mbox_sync_seek () #5 0x080687a4 in mbox_sync_seek () #6 0x08069003 in mbox_sync_seek () #7 0x080694ca in mbox_sync () #8 0x0806439f in mbox_transaction_commit () #9 0x0807e3ea in mailbox_transaction_commit () #10 0x080562fb in cmd_store () #11 0x080565df in cmd_uid () #12 0x08056d5a in _client_reset_command () #13 0x08056e69 in _client_input () #14 0x0808c768 in io_loop_handler_run () #15 0x0808bead in io_loop_run () #16 0x0805d107 in main () #17 0x40037bb4 in __libc_start_main () from /lib/libc.so.6
- Mike
On 22.9.2004, at 01:17, dovecot@spam.turbolink.net wrote:
T 192.168.199.48:57262 -> 192.168.196.53:143 [AP] 27 uid store 5957 -Flags (\Seen)..
very large delay here, 60-90 seconds
Hmm. I can't really think of many reasons for this to happen.. Have you set mbox_dirty_syncs=no? If not, try it and see if it makes any difference?
Only reason why it would do rereading there is if the message wasn't at the expected position in the mbox file. And if no-one else modified the mbox, the only reason for it is if Dovecot had somehow corrupted the message offset in index file.
Your indexes are stored in NFS too? Have you set mmap_disable=yes then?
On Wed, 22 Sep 2004, Timo Sirainen wrote:
On 22.9.2004, at 01:17, dovecot@spam.turbolink.net wrote:
T 192.168.199.48:57262 -> 192.168.196.53:143 [AP] 27 uid store 5957 -Flags (\Seen)..
very large delay here, 60-90 seconds
Hmm. I can't really think of many reasons for this to happen.. Have you set mbox_dirty_syncs=no? If not, try it and see if it makes any difference?
I did try it both ways, because I wasn't sure whether yes or no would produce the behavior described in the example. It didn't seem to make much of a difference.
Only reason why it would do rereading there is if the message wasn't at the expected position in the mbox file. And if no-one else modified the mbox, the only reason for it is if Dovecot had somehow corrupted the message offset in index file.
How/where exactly could I check for these conditions? I'm pretty familiar with gdb but not with the new indexing code. I set a breakpoint on mbox_sync and tried to follow it through but I haven't been able to pinpoint exactly where the decision is made to re-read or not.
Your indexes are stored in NFS too? Have you set mmap_disable=yes then?
No, indexes are not on NFS. I was doing some performance testing with indexes on a local disk (XFS) and also with indexes in a ramdisk, but it occurred with both.
- Mike
On 22.9.2004, at 22:20, dovecot@spam.turbolink.net wrote:
Only reason why it would do rereading there is if the message wasn't at the expected position in the mbox file. And if no-one else modified the mbox, the only reason for it is if Dovecot had somehow corrupted the message offset in index file.
How/where exactly could I check for these conditions? I'm pretty familiar with gdb but not with the new indexing code. I set a breakpoint on mbox_sync and tried to follow it through but I haven't been able to pinpoint exactly where the decision is made to re-read or not.
mbox_sync_has_changed() does the first quick check. If it shows it's changed, the mbox is locked and the timestamp/size is again checked against indexes in mbox_sync_do().
With dirty syncing mbox_sync_do() first tries to check only for new mails (partial=true), if it fails then it reads everything.
In a couple of mailboxes here I've see a condition where messages get a header like this:
X-UID: 5X-Status: D
I discovered this after seeing this error in the log file: Error: IMAP(glass): mbox sync: UID inserted in the middle of mailbox /var/imaptree/glass01/Sent (3 > 2)
The message has this X-UID line and also has a seperate Status: RO header.
I'm only guessing that this line isn't supposed to be combined like this, as later messages in the same mbox have the X-UID and X-Status on seperate lines...
Any idea as to the cause?
- Mike
On 24.9.2004, at 00:52, dovecot@spam.turbolink.net wrote:
In a couple of mailboxes here I've see a condition where messages get a header like this:
X-UID: 5X-Status: D
Can you reproduce this problem easily? I haven't noticed this myself for a long time.
Anyway, please try if 1.0-test46 fixes it.
On Fri, 24 Sep 2004, Timo Sirainen wrote:
On 24.9.2004, at 00:52, dovecot@spam.turbolink.net wrote:
In a couple of mailboxes here I've see a condition where messages get a header like this:
X-UID: 5X-Status: D
Can you reproduce this problem easily? I haven't noticed this myself for a long time.
No, I am unable to reproduce the problem on demand but I have seen 2 mailboxes with this type of corruption.
I am apparently unable to reproduce the message re-reading problem either! I tried to setup some tracing the next day but everything worked exactly as it should (of course!). I suspect it may be an issue with the file timestamps between the NFS client and the NFS server.
Anyway, please try if 1.0-test46 fixes it.
will do.
- Mike
On Fri, 24 Sep 2004, Timo Sirainen wrote:
On 24.9.2004, at 00:52, dovecot@spam.turbolink.net wrote:
In a couple of mailboxes here I've see a condition where messages get a header like this:
X-UID: 5X-Status: D
Can you reproduce this problem easily? I haven't noticed this myself for a long time.
Anyway, please try if 1.0-test46 fixes it.
I haven't seen this one again with test46 but I did get this strange one today:
dovecot: Sep 27 14:44:19 Error: IMAP(test01): Corrupted transaction log file /tmp/dovecot/mailindex/test01/INBOX/dovecot.index.log: extra record update out of range (0 >= 0) dovecot: Sep 27 14:44:20 Error: IMAP(test01): Transaction log file /tmp/dovecot/mailindex/test01/INBOX/dovecot.index.log: marked corrupted
I've seen this on several different mboxes now always with IMAP.
- Mike
Here's another error message I just started getting with test46:
Error: POP3(test2): file mbox-lock.c: line 493 (mbox_lock): assertion failed: (lock_type == F_RDLCK ||ibox->mbox_lock_type != F_RDLCK)
This has happened on several different mboxes. Possibly relevant parts of dovecot.conf.
mbox_read_locks = fcntl mbox_write_locks = fcntl dotlock
I haven't seen the combined X-Status/X-UID nor the other log corruption "Append with UID xxx, but next_uid = xxx" that I saw earlier.
- Mike
On 28.9.2004, at 00:29, dovecot@spam.turbolink.net wrote:
Here's another error message I just started getting with test46:
Error: POP3(test2): file mbox-lock.c: line 493 (mbox_lock): assertion failed: (lock_type == F_RDLCK ||ibox->mbox_lock_type != F_RDLCK)
Could you get gdb backtrace from this? I know this happens when you try to copy a message into the same mbox, but why would anyone do that?
I haven't seen the combined X-Status/X-UID nor the other log corruption "Append with UID xxx, but next_uid = xxx" that I saw earlier.
That's good.
On Tue, 28 Sep 2004, Timo Sirainen wrote:
On 28.9.2004, at 00:29, dovecot@spam.turbolink.net wrote:
Here's another error message I just started getting with test46:
Error: POP3(test2): file mbox-lock.c: line 493 (mbox_lock): assertion failed: (lock_type == F_RDLCK ||ibox->mbox_lock_type != F_RDLCK)
Could you get gdb backtrace from this? I know this happens when you try to copy a message into the same mbox, but why would anyone do that?
I haven't been able to reproduce it, I've only seen it happen in the logs after the fact on a few uses mboxes. I did get a core dump last time:
#0 0x4004bc31 in kill () from /lib/libc.so.6 (gdb) bt #0 0x4004bc31 in kill () from /lib/libc.so.6 #1 0x4004b8b8 in raise () from /lib/libc.so.6 #2 0x4004d17c in abort () from /lib/libc.so.6 #3 0x0807f188 in i_set_failure_file () #4 0x0807ed63 in i_panic () #5 0x0805c653 in mbox_lock () #6 0x080604bf in mbox_sync () #7 0x0805ac2d in mbox_is_valid_mask () #8 0x08074fd1 in mailbox_close () #9 0x080528f1 in client_destroy () #10 0x080854ff in _o_stream_create () #11 0x08082dc4 in io_loop_handler_run () #12 0x08082509 in io_loop_run () #13 0x08053fba in main () #14 0x40037bb4 in __libc_start_main () from /lib/libc.so.6 (gdb)
I've only seen it happen using POP3, so I don't think it has to do with copying messages.
I did leave the new lazy writing turned on so maybe it's something in that new code?
- Mike
I'm not sure exactly what it means, but it repeats over and over again for 2 different boxes on my system:
Error: IMAP(trisha): Corrupted transaction log file /var/spool/mailindex/trisha/SPAM/dovecot.index.log: Append with UID 58332, but next_uid = 59582 Error: IMAP(trisha): Transaction log file /var/spool/mailindex/trisha/SPAM/dovecot.index.log: marked corrupted
It seems to repeatedly mark it corrupted but then never does anything about it?
Client is outlook express, and only dovecot 0.99.10 and 1.0-test45 have ever been used to access this mbox.
- Mike
participants (2)
-
dovecot@spam.turbolink.net
-
Timo Sirainen