dovecot crash with Panic: file istream-header-filter.c: line 663
Hi -
I'm hitting a crash in dovecot, I get this logged followed by a terse stack trace and systemd-coredump details not included here - full gdb stack trace and more details are further down:
Mar 12 10:32:26 goffin dovecot[8269]: imap-login: Login: user=<patman>, method=PLAIN, rip=192.168.1.4, lip=192.168.1.1, mpid=8477, TLS, session=<5RvGYLf2RrDAqAEE>
Mar 12 10:32:26 goffin audit[8304]: USER_AUTH pid=8304 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:authentication grantors=pam_usertype,pam_localuser,pam_unix acct="patman" exe="/usr/libexec/dovecot/auth" hostname=192.168.1.4 addr=192.168.1.4 terminal=dovecot res=success'
Mar 12 10:32:26 goffin audit[8304]: USER_ACCT pid=8304 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="patman" exe="/usr/libexec/dovecot/auth" hostname=192.168.1.4 addr=192.168.1.4 terminal=dovecot res=success'
Mar 12 10:32:26 goffin dovecot[8269]: imap-login: Login: user=<patman>, method=PLAIN, rip=192.168.1.4, lip=192.168.1.1, mpid=8479, TLS, session=<PnHKYLf2SrDAqAEE>
Mar 12 10:32:27 goffin dovecot[8269]: imap(patman)<8452>
email client is fairemail version 1.2042a on android, but an older version had the same problem.
I mainly use fairemail to read emails on my phone, and that is working fine but I noticed that deleted emails were not actually getting deleted, that it has a bunch of "pending operations" it has were not running, and that dovecot has been crashing like this since about feb 2 (over a month ago). I can't figure out what changed at that time, but my cert says it should have expired over a year ago, but the cert as created 26 months ago.
log files only go back to Feb 5 so just missed that.
I generated a new cert but it's still crashing (I probably still need to do something on the client side or my setup is broken).
I assume there is something wrong or some "snapshot" thing that's stuck, but since it's crashing I can't figure out what the undelysing problem is.
Thanks for any help.
More details:
# dovecot --version 2.3.20 (80a5ac675d)
Running with up to date fedora core 37.
# dovecot -n # 2.3.20 (80a5ac675d): /etc/dovecot/dovecot.conf # OS: Linux 6.1.15-200.fc37.x86_64 x86_64 Fedora release 37 (Thirty Seven) # Hostname: specu mail_location = mbox:~/Mail:INBOX=~/Mail/inbox mbox_write_locks = fcntl namespace inbox { inbox = yes location = mailbox Drafts { special_use = \Drafts } mailbox Junk { special_use = \Junk } mailbox Sent { special_use = \Sent } mailbox "Sent Messages" { special_use = \Sent } mailbox Trash { special_use = \Trash } prefix = } passdb { driver = pam } ssl = required ssl_cert =
gdb backtrace via coredumpctl debug PID:
Core was generated by `dovecot/imap'.
Program terminated with signal SIGABRT, Aborted.
--Type <RET> for more, q to quit, c to continue without paging--
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007fa3840afec3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007fa38405fa76 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007fa3840497fc in __GI_abort () at abort.c:79
#4 0x00007fa3847541e3 in default_fatal_finish (status=0, type=LOG_TYPE_PANIC) at ../lib/failures.c:465
#5 fatal_handler_real (ctx=<optimized out>, format=<optimized out>, args=<optimized out>) at ../lib/failures.c:477
#6 0x00007fa3847ff5f7 in i_internal_fatal_handler (ctx=<optimized out>, format=<optimized out>, args=<optimized out>) at ../lib/failures.c:879
#7 0x00007fa3847540e1 in i_panic (format=0x7fa384844040 "file %s: line %d (%s): assertion failed: (%s)") at ../lib/failures.c:530
#8 0x00007fa38474fdf8 in i_stream_header_filter_snapshot_free (_snapshot=<optimized out>) at ../lib-mail/istream-header-filter.c:663
#9 i_stream_header_filter_snapshot_free (_snapshot=0x56150ea92250) at ../lib-mail/istream-header-filter.c:655
#10 0x00007fa38480cb6c in i_stream_snapshot_free (_snapshot=_snapshot@entry=0x56150ea9d2a0) at ../lib/istream.c:253
#11 0x00007fa38480cc14 in i_stream_unref (stream=0x56150ea9d338) at ../lib/istream.c:66
#12 0x00007fa3847dd179 in message_parse_header_deinit (_ctx=_ctx@entry=0x56150ea9d0c8) at ../lib-mail/message-header-parser.c:52
#13 0x00007fa3847deea1 in preparsed_parse_next_header (ctx=0x56150ea9d050, block_r=0x7ffc8222ca70) at ../lib-mail/message-parser-from-parts.c:310
#14 0x00007fa3847dfedc in message_parser_parse_next_block (ctx=ctx@entry=0x56150ea9d050, block_r=block_r@entry=0x7ffc8222ca70) at ../lib-mail/message-parser.c:838
#15 0x00007fa3847e0029 in message_parser_parse_header (ctx=0x56150ea9d050, hdr_size=hdr_size@entry=0x56150ea5ff00,
callback=callback@entry=0x7fa3849aa3f0
On Sun, Mar 12, 2023 at 11:17:43AM -0700, Patrick Mansfield wrote:
email client is fairemail version 1.2042a on android, but an older version had the same problem.
I mainly use fairemail to read emails on my phone, and that is working fine but I noticed that deleted emails were not actually getting deleted, that it has a bunch of "pending operations" it has were not running, and that dovecot has been crashing like this since about feb 2 (over a month ago). I can't figure out what changed at that time, but my cert says it should have expired over a year ago, but the cert as created 26 months ago.
log files only go back to Feb 5 so just missed that.
I generated a new cert but it's still crashing (I probably still need to do something on the client side or my setup is broken).
I assume there is something wrong or some "snapshot" thing that's stuck, but since it's crashing I can't figure out what the undelysing problem is.
Another thing I noticed:
I think I deleted all emails left in the "Trash" folder outside of imap on Feb 4 (using mutt locally) - or that that folder was modified on Feb 4, that is also the same timestamp as my ~/Mail/.imap directory, and is about the time I think it started crashing.
And that these were all dated Feb 4:
$ ls -ld ~/Mail/.imap ~/Mail/Trash ~/Mail/.imap/dovecot.list.index drwx------. 118 patman patman 4096 Feb 4 09:29 /home/patman/Mail/.imap -rw------- 1 patman patman 14696 Feb 4 09:11 /home/patman/Mail/.imap/dovecot.list.index -rw------- 1 patman patman 1459841 Feb 4 09:11 /home/patman/Mail/Trash
-- Patrick
On 12. Mar 2023, at 20.17, Patrick Mansfield patmans@yahoo.com wrote:
Mar 12 10:32:27 goffin dovecot[8269]: imap(patman)<8452>
: Panic: file istream-header-filter.c: line 663 (i_stream_header_filter_snapshot_free): assertion failed: (snapshot->mstream->snapshot_pending)
This is unfortunately rather difficult to debug. First you should find out which folder and mail this is happening in. You can do that with gdb:
#16 0x00007fa3849aa60a in index_mail_parse_headers_internal (mail=mail@entry=0x56150ea5fc78, headers=headers@entry=0x0) at index/index-mail-headers.c:465
fr 16 p mail.box.vname p mail.uid
Likely deleting that mail manually from the mbox will fix it. Of course, it would be nice if we were able to reproduce the bug also. Once you've found the broken folder, could you anonymize the mbox file contents and send it to me privately? https://github.com/dovecot/tools/blob/main/mbox-anonymize.pl can help you do it. Although I'm not sure if even that is enough to reproduce the bug - might need the dovecot.index* files also but those contain cached headers from the emails, which can be rather sensitive data.
Other things besides deleting the mail that might help, and would be useful to know whether they help:
- doveadm fetch -u user imap.bodystructure mailbox $folder uid $uid
- doveadm mailbox cache remove -u user mailbox $folder uid $uid
On Mon, Mar 13, 2023 at 11:01:35AM +0200, Timo Sirainen wrote:
On 12. Mar 2023, at 20.17, Patrick Mansfield patmans@yahoo.com wrote:
Mar 12 10:32:27 goffin dovecot[8269]: imap(patman)<8452>
: Panic: file istream-header-filter.c: line 663 (i_stream_header_filter_snapshot_free): assertion failed: (snapshot->mstream->snapshot_pending) This is unfortunately rather difficult to debug. First you should find out which folder and mail this is happening in. You can do that with gdb:
#16 0x00007fa3849aa60a in index_mail_parse_headers_internal (mail=mail@entry=0x56150ea5fc78, headers=headers@entry=0x0) at index/index-mail-headers.c:465
fr 16 p mail.box.vname p mail.uid
OK, I had to use a bit different syntax:
(gdb) fr 16 #16 0x00007fa3849aa60a in index_mail_parse_headers_internal (mail=mail@entry=0x56150ea5fc78, headers=headers@entry=0x0) at index/index-mail-headers.c:465 465 message_parser_parse_header(data->parser_ctx, &data->hdr_size, (gdb) p mail->mail.mail->box->vname $1 = 0x56150ea60840 "INBOX" (gdb) p mail->mail.mail->uid $2 = 33655
Likely deleting that mail manually from the mbox will fix it. Of course, it would be nice if we were able to reproduce the bug also. Once you've found the broken folder, could you anonymize the mbox file contents and send it to me privately? https://github.com/dovecot/tools/blob/main/mbox-anonymize.pl can help you do it. Although I'm not sure if even that is enough to reproduce the bug - might need the dovecot.index* files also but those contain cached headers from the emails, which can be rather sensitive data.
Other things besides deleting the mail that might help, and would be useful to know whether they help:
- doveadm fetch -u user imap.bodystructure mailbox $folder uid $uid
The above hits the same crash:
$ doveadm fetch -u patman imap.bodystructure mailbox INBOX uid 33655 Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied doveadm(patman): Panic: file istream-header-filter.c: line 663 (i_stream_header_filter_snapshot_free): assertion failed: (snapshot->mstream->snapshot_pending) doveadm(patman): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7fd12d418f46] -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7fd12d419082] -> /usr/lib64/dovecot/libdovecot.so.0(+0x109557) [0x7fd12d423557] -> /usr/lib64/dovecot/libdovecot.so.0(+0x109597) [0x7fd12d423597] -> /usr/lib64/dovecot/libdovecot.so.0(+0x5e0e1) [0x7fd12d3780e1] -> /usr/lib64/dovecot/libdovecot.so.0(+0x59df8) [0x7fd12d373df8] -> /usr/lib64/dovecot/libdovecot.so.0(i_stream_snapshot_free+0x1c) [0x7fd12d430b6c] -> /usr/lib64/dovecot/libdovecot.so.0(i_stream_unref+0x54) [0x7fd12d430c14] -> /usr/lib64/dovecot/libdovecot.so.0(message_parse_header_deinit+0x19) [0x7fd12d401179] -> /usr/lib64/dovecot/libdovecot.so.0(+0xe8ea1) [0x7fd12d402ea1] -> /usr/lib64/dovecot/libdovecot.so.0(message_parser_parse_next_block+0x4c) [0x7fd12d403edc] -> /usr/lib64/dovecot/libdovecot.so.0(message_parser_parse_header+0x59) [0x7fd12d404029] -> /usr/lib64/dovecot/libdovecot-storage.so.0(index_mail_parse_headers_internal+0x11a) [0x7fd12d5ce60a] -> /usr/lib64/dovecot/libdovecot-storage.so.0(index_mail_parse_headers+0x4e) [0x7fd12d5ce71e] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0xdd21e) [0x7fd12d5d421e] -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0xdd2b3) [0x7fd12d5d42b3] -> /usr/lib64/dovecot/libdovecot-storage.so.0(index_mail_get_special+0x20d) [0x7fd12d5d46cd] -> /usr/lib64/dovecot/libdovecot-storage.so.0(mail_get_special+0xe) [0x7fd12d553a1e] -> /usr/bin/doveadm(+0x3893c) [0x5650816db93c] -> /usr/bin/doveadm(+0x3e12e) [0x5650816e112e] -> /usr/bin/doveadm(+0x3bba1) [0x5650816deba1] -> /usr/bin/doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x29b) [0x5650816dfe0b] -> /usr/bin/doveadm(doveadm_cmd_run_ver2+0x4ff) [0x5650816ea67f] -> /usr/bin/doveadm(doveadm_cmd_try_run_ver2+0x3b) [0x5650816ea6fb] -> /usr/bin/doveadm(main+0x282) [0x5650816cbd92] -> /lib64/libc.so.6(+0x27510) [0x7fd12cb6a510] -> /lib64/libc.so.6(__libc_start_main+0x89) [0x7fd12cb6a5c9] -> /usr/bin/doveadm(_start+0x25) [0x5650816cbfa5] Aborted (core dumped)
- doveadm mailbox cache remove -u user mailbox $folder uid $uid
Ran (not as root):
$ doveadm mailbox cache remove -u patman mailbox INBOX uid 33655 Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied mailbox uid result INBOX 33655 ok
After the above, it's no longer crashing, and my email client's "pending operations" have cleared.
And this now works:
$ doveadm fetch -u patman imap.bodystructure mailbox INBOX uid 33655 Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied imap.bodystructure: ("text" "plain" ("charset" "utf-8") NIL NIL "8bit" 5555 107 NIL NIL NIL NIL)("text" "html" ("charset" "utf-8") NIL NIL "8bit" 34300 270 NIL NIL NIL NIL) "alternative" ("boundary" "n1UvlD2eFkGX=_?:") NIL NIL NIL
I didn't get the anonimized mbox first, so I can't get any more details than what I have in the existing core dump, but the above seems to imply the cache was the issue one way or another.
Thanks!
-- Patrick
On Mon, Mar 13, 2023 at 04:33:38PM +0100, Gerald Galster wrote:
After the above, it's no longer crashing, and my email client's "pending operations" have cleared.
Does your server use ECC memory and if so, are there any errors logged (bitflip, ...)?
Best regards, Gerald
I don't have the logs from that time them nor do I see any hardware / memory errors.
I also haven't had any other odd failures.
But how can I tell if I have ECC memory or not?
-- Patrick
After the above, it's no longer crashing, and my email client's "pending operations" have cleared.
Does your server use ECC memory and if so, are there any errors logged (bitflip, ...)?
Best regards, Gerald
I don't have the logs from that time them nor do I see any hardware / memory errors.
I also haven't had any other odd failures.
But how can I tell if I have ECC memory or not?
You could install dmidecode and search for ECC (not L1/L2 cpu cache), e.g.
Physical Memory Array Location: System Board Or Motherboard Use: System Memory Error Correction Type: Multi-bit ECC Maximum Capacity: 128 GB Error Information Handle: 0x0008 Number Of Devices: 4
It happens rarely but without ECC those errors often go without notice.
With ECC dmesg/kernel log might show warnings like
kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error. kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#2channel#1 ... kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD kernel: core: [Hardware Error]: Machine check events logged kernel: [Hardware Error]: Corrected error, no action required.
Best regards, Gerald
participants (3)
-
Gerald Galster
-
Patrick Mansfield
-
Timo Sirainen