[Dovecot] Crash in v1.2.3: istream.c: assertion failed on line 99
Hi, I have a couple of people bumping into an issue with their imap process crashing - from the users perspective, their mail client (thunderbird) just stops recieving new mail. It still seems to be possible for them to read their mail using squirell mail. When it crashes, it leaves behind a log message like this on the server:
Aug 12 15:52:25 fury dovecot: imap-login: Login: user=<zhi>, method=PLAIN, rip=10.8.5.15, lip=10.10.0.26 Aug 12 15:52:25 fury dovecot: IMAP(zhi): Panic: file istream.c: line 99 (i_stream_read): assertion failed: ((size_t)ret+old_size == _stream->pos
- _stream->skip) Aug 12 15:52:25 fury dovecot: IMAP(zhi): Raw backtrace: imap [0x49ddc0] -> imap [0x49de23] -> imap [0x49d4a6] -> imap [0x4a252e] -> imap [0x4a4691] -> imap(i_stream_read+0x48) [0x4a23f8] -> imap(i_stream_read_data+0x28) [0x4a2558] -> imap [0x4ab22e] -> imap(o_stream_send_istream+0x2e) [0x4aa63e] -> imap(maildir_save_continue+0x32) [0x443d12] -> imap(mail_storage_copy+0x6a) [0x462a1a] -> imap(maildir_copy+0x5e) [0x441d2e] -> imap(cmd_copy+0x1fe) [0x41a94e] -> imap(cmd_uid+0x78) [0x41f8d8] -> imap [0x42053c] -> imap [0x4205f2] -> imap(client_handle_input+0x3f) [0x42072f] -> imap(client_input+0x5f) [0x42127f] -> imap(io_loop_handler_run+0xf8) [0x4a5b08] -> imap(io_loop_run+0x1d) [0x4a4c1d] -> imap(main+0x620) [0x428b80] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x353e41d974] -> imap [0x4198f9] Aug 12 15:52:25 fury dovecot: dovecot: child 14759 (imap) killed with signal 6 (core dumps disabled)
I was able to fairly reliably trigger the same behaviour myself by attempting to move a bunch (thousands) of emails in my account using thunderbird - I dont know exactly what it is that triggers the crash though. I tried enabling core dumps using 'ulimit -c unlimited' and messing around with some environment variables to stop the startup scripts from forcing core dumps off again (DAEMON_COREFILE_LIMIT=18446744073709551615 /etc/init.d/dovecot restart). After doing that, I do not see anything in the logs any more - but I still seem to be having problems moving the emails around in my account. It would seem that it is still broken but Im just not getting any logs telling me so. There don't seem to be any core dumps sitting around anywhere either.
My 'dovecot -n' is: # 1.2.3: /etc/dovecot.conf # OS: Linux 2.6.18-128.1.16.el5.centos.plus x86_64 CentOS release 5.3 (Final) xfs syslog_facility: local0 shutdown_clients: no login_dir: /var/run/dovecot/login login_executable(default): /usr/libexec/dovecot/imap-login login_executable(imap): /usr/libexec/dovecot/imap-login login_executable(pop3): /usr/libexec/dovecot/pop3-login max_mail_processes: 1024 first_valid_uid: 300 mail_location: maildir:/u/%u/Maildir mail_drop_priv_before_exec: yes mail_executable(default): /usr/libexec/dovecot/imap mail_executable(imap): /usr/libexec/dovecot/imap mail_executable(pop3): /usr/libexec/dovecot/pop3 mail_plugin_dir(default): /usr/lib64/dovecot/imap mail_plugin_dir(imap): /usr/lib64/dovecot/imap mail_plugin_dir(pop3): /usr/lib64/dovecot/pop3 lda: postmaster_address: example@example.com auth default: mechanisms: gssapi gss-spnego plain krb5_keytab: /etc/opt/quest/vas/imap.keytab gssapi_hostname: $ALL passdb: driver: passwd-file args: /etc/dovecot.deny # empty file deny: yes passdb: driver: pam args: dovecot userdb: driver: passwd userdb: driver: static args: home=/u/%u
-- Thanks, Phill Macey
On 8/12/2009, Phillip Macey (phillip.macey@cisra.canon.com.au) wrote:
I was able to fairly reliably trigger the same behaviour myself by attempting to move a bunch (thousands) of emails in my account using thunderbird
What version of TBird?
The upcoming v3 is supposed to be bringing a *lot* of IMAP fixes... can you try it to see if this is at least limited to 2.0.0.x?
--
Best regards,
Charles
On Thu, 2009-08-13 at 12:10 +1000, Phillip Macey wrote:
Aug 12 15:52:25 fury dovecot: IMAP(zhi): Panic: file istream.c: line 99 (i_stream_read): assertion failed: ((size_t)ret+old_size == _stream->pos
- _stream->skip)
This is kind of annoying problem. I've been adding more of these asserts to try to catch it, but no one's really found an easy way to reproduce it.
If you manage to get a core dump, it would also help to see what those values are:
fr 6 p ret p old_size p _stream.pos p _stream.skip
(You may need to compile Dovecot without -O2 parameter to get some of those values. Easiest way would be to just remove -O2 from src/lib/Makefile; touch src/lib/istream.c; make; sudo make install)
imap(i_stream_read_data+0x28) [0x4a2558] -> imap [0x4ab22e] -> imap(o_stream_send_istream+0x2e) [0x4aa63e] -> imap(maildir_save_continue+0x32) [0x443d12] -> imap(mail_storage_copy+0x6a) [0x462a1a] -> imap(maildir_copy+0x5e)
This is another thing I'm wondering. It's copying the mails by actually copying the data, instead of doing a hard link. But your dovecot -n output shows that you have it set to defaults, so it should have used a hard link. Or did you copy to a directory that was in another mount point? Or doesn't your filesystem support hard links?
I was able to fairly reliably trigger the same behaviour myself by attempting to move a bunch (thousands) of emails in my account using thunderbird - I dont know exactly what it is that triggers the crash though.
If you're still able to reproduce this, can you also try if this does it:
telnet localhost 143 a login user pass b select inbox c create testbox d copy 1:* testbox
Instead of getting a core dump you can also attach gdb to an already running process: gdb -p <imap process pid>
On 14/08/2009 9:00 AM, Timo Sirainen wrote:
On Thu, 2009-08-13 at 12:10 +1000, Phillip Macey wrote:
Aug 12 15:52:25 fury dovecot: IMAP(zhi): Panic: file istream.c: line 99 (i_stream_read): assertion failed: ((size_t)ret+old_size == _stream->pos
- _stream->skip)
This is kind of annoying problem. I've been adding more of these asserts to try to catch it, but no one's really found an easy way to reproduce it.
Til now. It would seem that I can reproduce it at will - except that it doesnt get logged if I have core dumps enabled :-) It could be specific to my maildir or an email in my maildir?
If you manage to get a core dump, it would also help to see what those values are:
fr 6 p ret p old_size p _stream.pos p _stream.skip
(You may need to compile Dovecot without -O2 parameter to get some of those values. Easiest way would be to just remove -O2 from src/lib/Makefile; touch src/lib/istream.c; make; sudo make install)
Is this what you are after? I got to this point using telnet to send a copy command as you suggested below and then attaching gdb to the process and stepping through.
0x080f2c2c 337 while (ioloop->running) (gdb) 338 io_loop_handler_run(ioloop); (gdb) 0x080f2c23 338 io_loop_handler_run(ioloop); (gdb)
Program received signal SIGABRT, Aborted. 0xb7f55402 in __kernel_vsyscall () (gdb) Aug 14 11:26:36 eyck dovecot: IMAP(pmacey): Panic: file istream.c: line 99 (i_stream_read): assertion failed: ((size_t)ret+old_size == _stream->pos - _stream->skip) Aug 14 11:26:36 eyck dovecot: IMAP(pmacey): Raw backtrace: imap [0x80eb000] -> imap [0x80eb05a] -> imap [0x80ea8ac] -> imap [0x80f0098] -> imap [0x80f25ca] -> imap(i_stream_read+0x3f) [0x80eff3f] -> imap(i_stream_read_data+0x1d) [0x80f00bd] -> imap [0x80e341f] -> imap [0x80e393e] -> imap(message_parser_parse_next_block+0x1d) [0x80e355d] -> imap(index_mail_cache_parse_continue+0x22) [0x809d3b2] -> imap(maildir_save_continue+0x45) [0x808ab55] -> imap(mail_storage_copy+0x6d) [0x80ac86d] -> imap(maildir_copy+0x4b) [0x80888db] -> imap(cmd_copy+0x1e0) [0x805e760] -> imap [0x80647ac] -> imap [0x806485b] -> imap(client_handle_input+0x33) [0x80649b3] -> imap(client_input+0x5f) [0x80654ff] -> imap(io_loop_handler_run+0x100) [0x80f3b40] -> imap(io_loop_run+0x28) [0x80f2c28] -> imap(main+0x738) [0x806da48] -> /lib/libc.so.6(__libc_start_main+0xdc) [0x58ae8c] -> imap [0x805d591] bt #0 0xb7f55402 in __kernel_vsyscall () #1 0x0059dd80 in raise () from /lib/libc.so.6 #2 0x0059f691 in abort () from /lib/libc.so.6 #3 0x080eb00d in default_fatal_finish (type=<value optimized out>, status=0) at failures.c:160 #4 0x080eb05a in i_internal_fatal_handler (type=LOG_TYPE_PANIC, status=0, fmt=0x8102a90 "file %s: line %d (%s): assertion failed: (%s)", args=0xbfa95a34 "Kb\021\bc") at failures.c:440 #5 0x080ea8ac in i_panic (format=0x8102a90 "file %s: line %d (%s): assertion failed: (%s)") at failures.c:207 #6 0x080f0098 in i_stream_read (stream=0x9b284c0) at istream.c:99 #7 0x080f25ca in i_stream_tee_read (stream=0x9b28578) at istream-tee.c:130 #8 0x080eff3f in i_stream_read (stream=0x9b285a0) at istream.c:80 #9 0x080f00bd in i_stream_read_data (stream=0x9b285a0, data_r=0xbfa95bb8, size_r=0xbfa95bbc, threshold=1) at istream.c:361 #10 0x080e341f in message_parser_read_more (ctx=0x9b2fda0, block_r=0xbfa95bb0, full_r=0xbfa95b4f) at message-parser.c:118 #11 0x080e393e in parse_next_body_to_eof (ctx=0x9b2fda0, block_r=0xbfa95bb0) at message-parser.c:412 #12 0x080e355d in message_parser_parse_next_block (ctx=0x9b2fda0, block_r=0xbfa95bb0) at message-parser.c:768 #13 0x0809d3b2 in index_mail_cache_parse_continue (_mail=0x9b29ca0) at index-mail.c:1367 #14 0x0808ab55 in maildir_save_continue (_ctx=0x9b28698) at maildir-save.c:426 #15 0x080ac86d in mail_storage_copy (ctx=0x9b28698, mail=0x9b23cb0) at mail-copy.c:40 #16 0x080888db in maildir_copy (ctx=0x9b28698, mail=0x9b23cb0) at maildir-copy.c:282 #17 0x0805e760 in cmd_copy (cmd=0x9ac0ea0) at cmd-copy.c:73 #18 0x080647ac in client_command_input (cmd=0x9ac0ea0) at client.c:611 #19 0x0806485b in client_command_input (cmd=0x9ac0ea0) at client.c:660 #20 0x080649b3 in client_handle_input (client=0x9ac0c10) at client.c:701 #21 0x080654ff in client_input (client=0x9ac0c10) at client.c:752 #22 0x080f3b40 in io_loop_handler_run (ioloop=0x9abf298) at ioloop-epoll.c:208 #23 0x080f2c28 in io_loop_run (ioloop=0x9abf298) at ioloop.c:338 #24 0x0806da48 in main (argc=Cannot access memory at address 0x3c10 ) at main.c:327 (gdb) fr 6 #6 0x080f0098 in i_stream_read (stream=0x9b284c0) at istream.c:99 99 i_assert((size_t)ret+old_size == _stream->pos - _stream->skip); (gdb) p ret $4 = 15376 (gdb) p old_size No symbol "old_size" in current context. (gdb) p _stream $5 = (struct istream_private *) 0x9b28498 (gdb) p _stream.pos $6 = 8191 (gdb) p _stream.skip $7 = 0 (gdb)
imap(i_stream_read_data+0x28) [0x4a2558] -> imap [0x4ab22e] -> imap(o_stream_send_istream+0x2e) [0x4aa63e] -> imap(maildir_save_continue+0x32) [0x443d12] -> imap(mail_storage_copy+0x6a) [0x462a1a] -> imap(maildir_copy+0x5e)
This is another thing I'm wondering. It's copying the mails by actually copying the data, instead of doing a hard link. But your dovecot -n output shows that you have it set to defaults, so it should have used a hard link. Or did you copy to a directory that was in another mount point? Or doesn't your filesystem support hard links?
Mail storage is on an XFS filesystem - it should support hard links. I have explicitly set maildir_copy_with_hardlinks=yes in the config file rather than just relying on the defaults because I was certain that I wanted it turned on. I just jumped onto the server and created a hardlink by hand without any problems. 'ls -lni' looks like it works:
22972186 -rw-r--r-- 2 30332 60 81 Jul 13 16:38 dot.forward 22972186 -rw-r--r-- 2 30332 60 81 Jul 13 16:38 dot.forward.hardlink 1575505744 drwx------ 99 30332 60 8192 Aug 14 09:08 Maildir
I was able to fairly reliably trigger the same behaviour myself by attempting to move a bunch (thousands) of emails in my account using thunderbird - I dont know exactly what it is that triggers the crash though.
If you're still able to reproduce this, can you also try if this does it:
Doing this also triggers the crash. See the gdb output above. The only thing I did differently was that 'testbox' already existed and had some emails in it so I didnt do the 'create'.
telnet localhost 143 a login user pass b select inbox c create testbox d copy 1:* testbox
Instead of getting a core dump you can also attach gdb to an already running process: gdb -p <imap process pid>
-- Thanks, Phill Macey (CiSRA IT Services)
On Aug 13, 2009, at 10:20 PM, Phillip Macey wrote:
Aug 12 15:52:25 fury dovecot: IMAP(zhi): Panic: file istream.c:
line 99 (i_stream_read): assertion failed: ((size_t)ret+old_size
== _stream->pos - _stream->skip)This is kind of annoying problem. I've been adding more of these
asserts to try to catch it, but no one's really found an easy way
to reproduce it.Til now. It would seem that I can reproduce it at will - except that
it doesnt get logged if I have core dumps enabled :-) It could be
specific to my maildir or an email in my maildir?
Yes, it's very likely specific to your maildir. Can you anonymize it
in some way that the crash still happens but you could also send me
the mails? http://dovecot.org/tools/mbox-anonymize.pl does it to
mboxes, looks like it should work for maildir files also as long as
you put $hdr=1; to beginning.
(You may need to compile Dovecot without -O2 parameter to get some
of those values. Easiest way would be to just remove -O2 from src/ lib/Makefile; touch src/lib/istream.c; make; sudo make install) Is this what you are after? I got to this point using telnet to send a copy command as you suggested below and then attaching gdb to the process and stepping through.
If you stepping through you mean you hit enter some thousand times,
you could have also used "continue" command. :)
#6 0x080f0098 in i_stream_read (stream=0x9b284c0) at istream.c:99 #7 0x080f25ca in i_stream_tee_read (stream=0x9b28578) at istream- tee.c:130
Oh, it's coming from tee_read().. I thought tee-istream was the problem.
(gdb) p ret $4 = 15376 (gdb) p old_size No symbol "old_size" in current context.
ok, so this was dropped out by -O2
(gdb) p _stream $5 = (struct istream_private *) 0x9b28498 (gdb) p _stream.pos $6 = 8191 (gdb) p _stream.skip $7 = 0 (gdb)
But still, there is only 8 kB available but read() returned 15376
bytes. That can't be correct. Could you do one more thing:
fr 6 p *stream p *stream.real_stream p *(struct file_istream *)stream.real_stream
Mail storage is on an XFS filesystem - it should support hard links. I have explicitly set maildir_copy_with_hardlinks=yes in the config file rather than just relying on the defaults because I was certain that I wanted it turned on.
For some reason that's not working. I can look into that also, but I
also really want to get that istream crash fixed. :)
Til now. It would seem that I can reproduce it at will - except that it doesnt get logged if I have core dumps enabled :-) It could be specific to my maildir or an email in my maildir? Yes, it's very likely specific to your maildir. Can you anonymize it in some way that the crash still happens but you could also send me
On Aug 13, 2009, at 10:20 PM, Phillip Macey wrote: the mails? http://dovecot.org/tools/mbox-anonymize.pl does it to mboxes, looks like it should work for maildir files also as long as you put $hdr=1; to beginning. Sorry - I cant do that. Im sure the perl script works quite well, but I cant send you any of the emails in the maildir. For a start, there are 10k emails in the source folder and >20k in the destination. What I can do is start excluding emails from the copy and see when it starts working. Maybe I can narrow it down to a particular email and then we can take it from there. I have had a play around and here are some
On 14/08/2009 12:29 PM, Timo Sirainen wrote: things that I noticed: permanent storage? fluke?).
- It seems to be an attribute of the destination folder. I can successfully move the mail into a new (empty) folder. As soon as I try to move into that specific destination, it starts crashing.
- The other thing that I think is a pattern - using thunderbird, if I select all the emails in the folder (eg Ctrl-A) and try to move them - it crashes. If I hand select a smaller subset of the emails, they tend to move successfully. The smaller the number the more likely it would be successful. I tried to figure out if there was a point at which it stopped working - but it seemed like the number kept on getting smaller as I kept trying. By the end, I was not even able to move a single email without getting a crash (restarting dovecot did not change this -
- If I move the mail into another temp folder (no problems doing this) and then try moving from the temp folder into the 'problem' folder, I still get the crash.
If you stepping through you mean you hit enter some thousand times, you could have also used "continue" command. :) I think it was only 15 or 20 times. I didnt attach gdb to the process until just before I did the copy. Maybe that means the crash happened before it got around to reading all the emails or it was something in the first couple of mails that triggered it.
#6 0x080f0098 in i_stream_read (stream=0x9b284c0) at istream.c:99 #7 0x080f25ca in i_stream_tee_read (stream=0x9b28578) at istream-tee.c:130 Oh, it's coming from tee_read().. I thought tee-istream was the problem. (gdb) p ret $4 = 15376 (gdb) p old_size No symbol "old_size" in current context. ok, so this was dropped out by -O2 Sorry, I thought I had got rid of that. (gdb) p _stream $5 = (struct istream_private *) 0x9b28498 (gdb) p _stream.pos $6 = 8191 (gdb) p _stream.skip $7 = 0 (gdb) But still, there is only 8 kB available but read() returned 15376 bytes. That can't be correct. Could you do one more thing:
fr 6 p *stream p *stream.real_stream p *(struct file_istream *)stream.real_stream
(gdb) fr 6
#6 0x080f0098 in i_stream_read (stream=0x9b284c0) at istream.c:99
99 i_assert((size_t)ret+old_size == _stream->pos -
_stream->skip);
(gdb) p *stream
$13 = {v_offset = 745472, stream_errno = 0, mmaped = 0, blocking = 1,
closed = 0, readable_fd = 0, seekable = 0, eof = 0, real_stream = 0x9b28498}
(gdb) p *stream.real_stream
$14 = {iostream = {refcount = 1, close = 0x8100330
Mail storage is on an XFS filesystem - it should support hard links. I have explicitly set maildir_copy_with_hardlinks=yes in the config file rather than just relying on the defaults because I was certain that I wanted it turned on. For some reason that's not working. I can look into that also, but I also really want to get that istream crash fixed. :) No problem - the hardlinking bit doesnt really break anything for us so its not as big an issue anyway. If we hadnt bumped into this bug, I probably wouldnt even have known it was not working :-)
-- Thanks, Phill Macey (CiSRA IT Services)
On Fri, 2009-08-14 at 16:38 +1000, Phillip Macey wrote:
Maybe I can narrow it down to a particular email and then we can take it from there.
Yeah, that would be good.
- It seems to be an attribute of the destination folder. I can successfully move the mail into a new (empty) folder. As soon as I try to move into that specific destination, it starts crashing.
It's most likely because of the caching decisions. You can use idxview to get a list of them, e.g.
idxview ~/Maildir/.dest ..
-- Cache fields -- # Name Type Size Dec Last used 0: flags bit 4 tmp 2009-08-13 12:20 1: date.sent fix 8 tmp 2009-08-07 18:22 2: date.received fix 4 yes 2009-08-13 14:51 .. 8: hdr.FROM hdr - tmp 2009-08-13 18:37 ..
If you duplicate the caching decisions, or at least add some caching decisions, that might be enough to cause it to crash. You can add those decisions simply by fetching that data. For example:
1 select mailbox 2 fetch 1 (body.peek[header.fields (from to)] bodystructure internaldate)
- The other thing that I think is a pattern - using thunderbird, if I select all the emails in the folder (eg Ctrl-A) and try to move them - it crashes. If I hand select a smaller subset of the emails, they tend to move successfully. The smaller the number the more likely it would be successful. I tried to figure out if there was a point at which it stopped working - but it seemed like the number kept on getting smaller as I kept trying. By the end, I was not even able to move a single email without getting a crash
Sounds like when destination mailbox got larger it got easier to crash? Although I don't think that should have happened..
(restarting dovecot did not change this - permanent storage? fluke?).
Restarting Dovecot rarely helps with anything. All state is stored in index files.
(gdb) p ret $4 = 15376
Wonder if this could also be wrong and caused by gdb being confused, because -O2 was used. I don't see any way how this could happen with the current code.
See if this helps: http://hg.dovecot.org/dovecot-1.2/rev/4c42e72a3954
Also you could check if you have any mails with CRs in them:
perl -ne 's/\r$//;print if /\r/' *
On 15/08/2009 3:17 AM, Timo Sirainen wrote:
Sorry about the long wait for a reply. The weekend and life away from a computer/work called :-)
See if this helps: http://hg.dovecot.org/dovecot-1.2/rev/4c42e72a3954
This seems to fix it - the process does not crash any more when I use either telnet or thunderbird. I can confirm that the mails were actually copied from source to destination ;-) I reverted back to an earlier version to make sure and the crashes returned. Once I went back to the latest version, it all started working again without crashes.
Thanks Timo! Should I consider this fixed. Do you still want more info about anything?
Also you could check if you have any mails with CRs in them:
perl -ne 's/\r$//;print if /\r/' *
This didn't print anything - I was sort of expecting it to at least print something (especially since your update seemed to fix the crash...)
-- Thanks, Phill Macey (CiSRA IT Services)
On Aug 17, 2009, at 1:31 AM, Phillip Macey wrote:
Also you could check if you have any mails with CRs in them:
perl -ne 's/\r$//;print if /\r/' *
This didn't print anything - I was sort of expecting it to at least
print something (especially since your update seemed to fix the
crash...)
Hmm. That is strange. What about in the destination mailbox where you
copied the messages? I'm now wondering if there's still a data
corruption bug.
On 17/08/2009 3:36 PM, Timo Sirainen wrote:
On Aug 17, 2009, at 1:31 AM, Phillip Macey wrote:
Also you could check if you have any mails with CRs in them:
perl -ne 's/\r$//;print if /\r/' *
This didn't print anything - I was sort of expecting it to at least print something (especially since your update seemed to fix the crash...)
Hmm. That is strange. What about in the destination mailbox where you copied the messages? I'm now wondering if there's still a data corruption bug.
Ok, I don't know what I did wrong the first time around - this time it printed a whole heap of stuff for both the source and destination mailboxes. Sorry about the false alarm.
-- Thanks, Phill Macey (CiSRA IT Services)
On Aug 17, 2009, at 2:01 AM, Phillip Macey wrote:
On 17/08/2009 3:36 PM, Timo Sirainen wrote:
On Aug 17, 2009, at 1:31 AM, Phillip Macey wrote:
Also you could check if you have any mails with CRs in them:
perl -ne 's/\r$//;print if /\r/' *
This didn't print anything - I was sort of expecting it to at
least print something (especially since your update seemed to fix
the crash...)Hmm. That is strange. What about in the destination mailbox where
you copied the messages? I'm now wondering if there's still a data
corruption bug.Ok, I don't know what I did wrong the first time around - this time
it printed a whole heap of stuff for both the source and destination
mailboxes. Sorry about the false alarm.
Great, thanks. Time for v1.2.4 tomorrow then I guess. :)
participants (3)
-
Charles Marcus
-
Phillip Macey
-
Timo Sirainen