[Dovecot] Problem with fts found against Dovecot hg; examples + trace attached
I've been wanting to try out Squat for full-text indexing for a while now, so I finally gave it a whirl. I did a fresh pull from hg and enabled it, but a query takes *forever*. (Using alpine, I issued a Select command for all messages containing some word anywhere, which presumably caused the Squat indexer to run.) I think Dovecot is stuck in an infinite loop.
I thought something was wrong when I attached strace to it for five minutes and saw no calls to read() or open(). I can reproduce the "select takes forever" on a tiny Maildir with 5 messages: http://paulproteus.acm.jhu.edu/bug-report/2007-10-21/broken.tar.gz
I've also attached the *.in and *.out files that I think are related to this IMAP conversation.
In a different IMAP conversation with the same "SEARCH takes forever" problem, I attached gdb to the running Dovecot process and got this backtrace:
(gdb) bt
#0 parse_next_body_to_boundary (ctx=0x85746e8, block_r=0xbf92325c) at message-parser.c:332
#1 0x080c1b7d in message_parser_parse_next_block (ctx=0x85746e8, block_r=0xbf92325c) at message-parser.c:717
#2 0xb7f582c2 in fts_mailbox_search_next_nonblock (ctx=0x811c0c0, mail=0x811d380, tryagain_r=0xbf9232bb) at fts-storage.c:175
#3 0x0805f3e6 in cmd_search_more (cmd=0x80feec0) at cmd-search.c:74
#4 0x0805f768 in cmd_search_more_callback (cmd=0x80feec0) at cmd-search.c:112
#5 0x080ce8eb in io_loop_handle_timeouts (ioloop=0x80fb250, update_run_now=true) at ioloop.c:264
#6 0x080cf309 in io_loop_handler_run (ioloop=0x80fb250) at ioloop-poll.c:159
#7 0x080ce7c8 in io_loop_run (ioloop=0x80fb250) at ioloop.c:303
#8 0x08068c9c in main (argc=135213400, argv=0xbf923464, envp=0xbf92346c) at main.c:293
So I do know that it's using the fts system. I've interrupted it in gdb every few minutes, and each time I got messages like these:
Program received signal SIGINT, Interrupt.
parse_next_body_to_boundary (ctx=0x85746e8, block_r=0xbf92325c)
at message-parser.c:332
332 if (data[i] == '\n') {
(gdb) cont
Continuing.
[wait a few minutes]
Program received signal SIGINT, Interrupt.
parse_next_body_to_boundary (ctx=0x85746e8, block_r=0xbf92325c)
at message-parser.c:331
331 for (; i < block_r->size; i++) {
(gdb) cont
Continuing.
[wait a few minutes]
Program received signal SIGINT, Interrupt.
parse_next_body_to_boundary (ctx=0x85746e8, block_r=0xbf92325c)
at message-parser.c:332
332 if (data[i] == '\n') {
The ctx and block_r values are the same every time I interrupt it. If it can't leave this function for minutes on end, that sounds like an infinite loop to me. Furthermore, for the past couple of minutes, when I do this:
(gdb) print block_r[0]
$8 = {part = 0x80f3158, hdr = 0x0,
data = 0x8240d08 "\n<DIV><BR></DIV>Perhaps, but it is important to note
that this is a dissenting<BR>opinion, not part of the actual text of the
Holt bill.<BR><BR><BR>--- In <A
href=\"mailto:TrueVote@yahoogroups.com\">Tru"...,
size = 8192}
I always get the same response back.
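To spell out my reasoning: the for loop at message-parser.c:331 is bounded by block_r->size, so on its own it should finish within 8192 iterations. My guess is that whatever calls it keeps handing back the same block without consuming any input. The sketch below is hypothetical and is not Dovecot's code; struct block, find_newline, and run_passes are names made up for illustration, and the pass cap exists only so the demo terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser block; NOT Dovecot's struct. */
struct block {
    const char *data;
    size_t size;
};

/* Scan for the first '\n' at or after `start`.
 * Returns its index, or blk->size if there is none.
 * This loop is bounded by blk->size, so it always terminates. */
size_t find_newline(const struct block *blk, size_t start)
{
    size_t i;

    assert(blk != NULL && start <= blk->size);
    for (i = start; i < blk->size; i++) {
        if (blk->data[i] == '\n')
            break;
    }
    return i;
}

/* Hypothetical caller. When `advance` is nonzero it consumes one line
 * per pass; when it is zero, every pass sees the same data and size,
 * which is the "no progress" situation suspected above.
 * `max_passes` is a safety cap so the demo cannot spin forever.
 * Returns the number of passes performed. */
int run_passes(const char *input, int advance, int max_passes)
{
    struct block blk = { input, strlen(input) };
    int passes = 0;

    while (blk.size > 0 && passes < max_passes) {
        size_t nl = find_newline(&blk, 0);
        /* Bytes belonging to this line, including the '\n' if present. */
        size_t used = nl < blk.size ? nl + 1 : blk.size;

        if (advance) {
            blk.data += used;
            blk.size -= used;
        }
        passes++;
    }
    return passes;
}
```

With the input "a\nb\nc", run_passes(input, 1, 1000) finishes after 3 passes, while run_passes(input, 0, 1000) only stops because it hits the cap of 1000. Without the cap, the second call would look just like what I see in gdb: the same pointer and the same size every time I interrupt it.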
I've linked to a fairly small Maildir (five messages) that reproduces the problem for me. It's possible that only one of those messages is actually at fault; I haven't tried with any Maildir smaller than five messages.
Is there anything more I can do to help fix this problem? I'm happy to do any further reasonable testing if you think it would be useful.
-- Asheesh.
-- Bolub's Fourth Law of Computerdom: Project teams detest weekly progress reporting because it so vividly manifests their lack of progress.
On Sun, 2007-10-21 at 00:56 -0700, Asheesh Laroia wrote:
> I thought something was wrong when I attached strace to it for five minutes and saw no calls to read() or open(). I can reproduce the "select takes forever" on a tiny Maildir with 5 messages: http://paulproteus.acm.jhu.edu/bug-report/2007-10-21/broken.tar.gz
Thanks, fixed: http://hg.dovecot.org/dovecot/rev/d6b2343238f9
Participants (2): Asheesh Laroia, Timo Sirainen