[Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
Dimos Alevizos
dalevizo at otenet.gr
Sat Mar 8 09:57:39 UTC 2014
Kind reminder :-)
D.
On Thu 13/02/2014 15:52, Dimos Alevizos wrote:
> At last, I was finally able to find time and run a backtrace, I've
> attached the file.
> If you don't remember (and hey, you can blame you after all this
> time) the problem is only with a couple (~10) of users among the
> hundreds of thousands currently using the service.
>
> Please let me know if I should provide any other info, logs or whatever.
>
> Dimos Alevizos
>
> -------- Original Message --------
> Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with
> 'FFrom' or 'FrFrom'
> From: Timo Sirainen <tss at iki.fi>
> To: Dimos Alevizos <dalevizo at otenet.gr>
> CC: Dimitris Paouris <dpaou at otenet.gr>, Dovecot Mailing List
> <dovecot at dovecot.org>
> Date: 02/11/2013 01:16 μμ
>
> >Well, if that patch didn’t work, then the problem is elsewhere. There aren’t many other good possibilities left though.. How about adding this patch, it should be even safer than the previous one:
> >
> >http://hg.dovecot.org/dovecot-2.2/rev/d3062d066593
> >
> >On 30.10.2013, at 12.42, Dimos Alevizos <dalevizo at otenet.gr> wrote:
> >
> >>I'm afraid it doesn't seem to be working.
> >>I've compiled a patched 2.2.6 dovecot with the patch you sent and installed it in a production server (had to be 2.2.6 cause we've upgraded all the rest since I begun this thread months ago) and although we still have mbox corruptions (rarely as before) the server isn't crashing :
> >>
> >>Oct 30 11:15:19 pop04 dovecot: pop3-login: Login: user=<artower at otenet.gr>, method=PLAIN, rip=85.72.232.35, lip=83.235.66.43, mpid=24419, secured, session=<+0ywxfHpIQBVSOgj>
> >>Oct 30 11:15:20 pop04 dovecot: pop3(artower at otenet.gr): Disconnected: Logged out top=0/0, retr=0/0, del=0/1336, size=471029518
> >>Oct 30 11:19:12 pop04 dovecot: lmtp(2863, artower at otenet.gr): r7U3KnyhcFIvCwAAckDtvw: msgid=<E1VbRvh-00040e-Ol at cpmail.force24.dedicated.catalyst2.com>: size=17823 saved mail to INBOX
> >>Oct 30 11:33:12 pop04 dovecot: pop3-login: Login: user=<artower at otenet.gr>, method=PLAIN, rip=85.72.224.94, lip=83.235.66.43, mpid=600, secured, session=<vT2aBfLpxQBVSOBe>
> >>Oct 30 11:33:12 pop04 dovecot: pop3(artower at otenet.gr): Error: Syncing INBOX failed: Mailbox isn't a valid mbox file
> >>Oct 30 11:33:12 pop04 dovecot: pop3(artower at otenet.gr): Error: Couldn't init INBOX: Mailbox isn't a valid mbox file
> >>Oct 30 11:33:12 pop04 dovecot: pop3(artower at otenet.gr): Mailbox init failed top=0/0, retr=0/0, del=0/0, size=0
> >>Oct 30 11:33:33 pop04 dovecot: lmtp(16314, artower at otenet.gr): Au4vIMqucFK6PwAAckDtvw: msgid=<004401ced552$bb5ecd70$321c6850$@planet.nl>: size=7975817 save failed to INBOX: Mailbox isn't a valid mbox file
> >>
> >>Perhaps the patch is only valid for 2.1.16 and needs to be modified for 2.2.6 ?
> >>
> >>Thank you for your time
> >>Dimos Alevizos
> >>
> >>
> >>-------- Original Message --------
> >>Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
> >>From: Timo Sirainen <tss at iki.fi>
> >>To: Dimos Alevizos <dalevizo at otenet.gr>
> >>CC: dovecot at dovecot.org, Dimitris Paouris <dpaou at otenet.gr>
> >>Date: 26/06/2013 06:59 μμ
> >>
> >>>It crashes one specific IMAP/POP3 session, so others are unaffected. The potential problems:
> >>>
> >>> * It might cause the user's mbox to become crashing constantly. so first crash -> client reconnects -> client attempts the same operation -> crash again. Then again, this might not happen, it depends.
> >>>
> >>> * The mbox file would probably become slightly more corrupted than normally, because it doesn't finish moving data around. No data should get actually lost, but some parts could become duplicated (e.g. some headers or even mails, possibly causing UID renumbering = redownloading).
> >>>
> >>>So not ideal in production, but shouldn't be too bad either, especially if you just wait for the first crash and then immediately switch to the old unpatched version.
> >>>
> >>>On 26.6.2013, at 10.21, Dimos Alevizos <dalevizo at otenet.gr> wrote:
> >>>
> >>>>Hi,
> >>>>
> >>>>I haven't had the time to compile it yet, but a question just occurred.
> >>>>Given that it's so rare and we can't reproduce it on a dev server, how safe is this to use on a production server ?
> >>>>When you say "crash" you mean the whole dovecot server or that specific client's child ?
> >>>>
> >>>>D.
> >>>>
> >>>>-------- Original Message --------
> >>>>Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
> >>>>From: dalevizo <dalevizo at otenet.gr>
> >>>>To: Timo Sirainen <tss at iki.fi>
> >>>>CC: dovecot at dovecot.org, Dimitris Paouris <dpaou at otenet.gr>
> >>>>Date: 24/06/2013 01:41 μμ
> >>>>
> >>>>>Thanx I'll try the patch as soon as possible and I'll let you know.
> >>>>>It is indeed very rare. We're only seeing 4-5 corruptions in about 13
> >>>>>million logins per day.
> >>>>>I've been trying to convince our design team that we should move to
> >>>>>maildir, but the truth is that it's quite a change, and we're way too
> >>>>>busy to deal with everything else AND a migration from mbox to maildir.
> >>>>>
> >>>>>D.
> >>>>>
> >>>>>On Mon 24/06/2013 13:16, Timo Sirainen wrote:
> >>>>>>On 19.6.2013, at 16.00, Dimos Alevizos <dalevizo at otenet.gr> wrote:
> >>>>>>
> >>>>>>>we're having some problems with our dovecot setup.
> >>>>>>>I've seen similar problems in the mailing list some years ago but alas wasn't able to find a solution.
> >>>>>>>
> >>>>>>>Our setup is as follows :
> >>>>>>>An MX farm (postfix) sends mails via LMTP to a director farm (dovecot 2.1.12) which proxies pop3/imap/lmtp traffic to a dovecot farm (dovecot 2.1.16).
> >>>>>>>All mailboxes and indexes are on NFS and all servers are Centos.
> >>>>>>>
> >>>>>>>The problem is that at times we see mailboxes (all of them are in mbox format) beginning with FFrom or FrFrom and of course dovecot says it's not a valid mbox file.
> >>>>>>
> >>>>>>This is quite an old bug, but it happens rarely enough that I haven't been able to reproduce and fix it. Actually people hadn't complained about it for a long time now, so I had assumed it had somehow gotten fixed already.
> >>>>>>
> >>>>>>With the attached debug patch it should crash instead of (completely) corrupting the mbox file. Debugging the resulting core file with gdb could be useful in figuring this out.
> >>>>>>
> >>>>>>Although I wouldn't recommend mbox format for any big installation anyway..
> >>>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> Feb 13 15:41:08 pop06 dovecot: imap-login: Login: user=<someuser at somedomain>, method=PLAIN, rip=195.170.0.170, lip=83.235.66.45, mpid=8157, session=<E/1110nyQADDqgCq>
> Feb 13 15:41:09 pop06 dovecot: imap(someuser at somedomain): Panic: file mbox-sync.c: line 655 (mbox_sync_handle_header): assertion failed: (mail_ctx->mail.from_offset + move_diff != 1 && mail_ctx->mail.from_offset + move_diff != 2)
> Feb 13 15:41:09 pop06 dovecot: imap(someuser at somedomain): Error: Raw backtrace: /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf551923fa] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf55192466] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf55191583] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0 [0x2adf54e7e9a7] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0(mbox_sync+0x620) [0x2adf54e7f160] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0(mbox_storage_sync_init+0x83) [0x2adf54e80093] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0(mailbox_sync_init+0x31) [0x2adf54e99da1] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0(mailbox_sync+0x27) [0x2adf54e9abc7] -> dovecot/imap [someuser at somedomain 195.170.0.170 SELECT](cmd_select_full+0x219) [0x4111c9] -> dovecot/imap [someuser at somedomain 195.170.0.170 SELECT](cmd_select+0xb) [0x411afb] -> dovecot/imap [someuser at somedomain 195.170.0.170 SELECT](command_exec+0x37) [0x4165e7] -> dovecot/imap [d
> Feb 13 15:41:09 pop06 dovecot: imap(someuser at somedomain): Fatal: master: service(imap): child 8157 killed with signal 6 (core dumped)
>
> root at pop06:someuser $ gdb /opt/dovecot-2.2.9/libexec/imap core.8157
> GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /opt/dovecot-2.2.9/libexec/imap...done.
> [New Thread 8157]
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/libdovecot-storage.so.0
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/lib10_quota_plugin.so...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/lib10_quota_plugin.so
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/lib11_imap_quota_plugin.so...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/lib11_imap_quota_plugin.so
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/lib15_notify_plugin.so...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/lib15_notify_plugin.so
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/lib20_fts_plugin.so...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/lib20_fts_plugin.so
> Reading symbols from /opt/dovecot-2.2.9/lib/dovecot/lib21_fts_solr_plugin.so...done.
> Loaded symbols for /opt/dovecot-2.2.9/lib/dovecot/lib21_fts_solr_plugin.so
> Reading symbols from /lib64/libexpat.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libexpat.so.0
> Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libgcc_s.so.1
>
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff49dfd000
> Core was generated by `dovecot/imap [someuser at somed'.
> Program terminated with signal 6, Aborted.
> #0 0x0000003efe030265 in raise () from /lib64/libc.so.6
> (gdb) bt full
> #0 0x0000003efe030265 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x0000003efe031d10 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00002adf5519241e in default_fatal_finish (type=<value optimized out>, status=0) at failures.c:192
> backtrace = 0x14d90370 "/opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf551923fa] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf55192466] -> /opt/dovecot-2.2.9/lib/dovecot/libdovecot.so.0 [0x2adf55191583] ->"...
> #3 0x00002adf55192466 in i_internal_fatal_handler (ctx=0x7fff49d91340, format=<value optimized out>, args=<value optimized out>) at failures.c:653
> status = 0
> #4 0x00002adf55191583 in i_panic (format=0x1fdd <Address 0x1fdd out of bounds>) at failures.c:264
> ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0}
> args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7fff49d91410, reg_save_area = 0x7fff49d91350}}
> #5 0x00002adf54e7e9a7 in mbox_sync_handle_header (sync_ctx=0x7fff49d935e0, mail_ctx=0x7fff49d937d0, partial=true) at mbox-sync.c:654
> orig_from_offset = 14200194
> move_diff = -14200194
> ret = <value optimized out>
> __FUNCTION__ = "mbox_sync_handle_header"
> #6 mbox_sync_loop (sync_ctx=0x7fff49d935e0, mail_ctx=0x7fff49d937d0, partial=true) at mbox-sync.c:1138
> rec = 0x14dc0d20
> uid = <value optimized out>
> messages_count = 302
> offset = 14200194
> ret = <value optimized out>
> expunged = false
> skipped_mails = false
> uids_broken = false
> #7 0x00002adf54e7f160 in mbox_sync_int (mbox=0x14db5b30, flags=MBOX_SYNC_UNDIRTY) at mbox-sync.c:1619
> index_sync_ctx = 0x14dd3710
> sync_view = 0x14dd3780
> ret = <value optimized out>
> delay_writes = true
> readonly = false
> trans = 0x14dbf420
> sync_ctx = {mbox = 0x14db5b30, flags = MBOX_SYNC_UNDIRTY, input = 0x14dd4fe8, file_input = 0x14dd4e18, write_fd = 15, orig_mtime = 1392290551, orig_atime = 1391079897,
> orig_size = 29081201, last_stat = {st_dev = 25, st_ino = 30254297, st_nlink = 1, st_mode = 33152, st_uid = 1904649, st_gid = 550, pad0 = 0, st_rdev = 0, st_size = 29081201,
> st_blksize = 32768, st_blocks = 56920, st_atim = {tv_sec = 1391079897, tv_nsec = 0}, st_mtim = {tv_sec = 1392290551, tv_nsec = 0}, st_ctim = {tv_sec = 1392290551,
> tv_nsec = 961280000}, __unused = {0, 0, 0}}, index_sync_ctx = 0x14dd3710, sync_view = 0x14dd3780, t = 0x14dbf420, reset_hdr = {major_version = 0 '\000', minor_version = 0 '\000',
> base_header_size = 0, header_size = 0, record_size = 0, compat_flags = 0 '\000', unused = "\000\000", indexid = 0, flags = 0, uid_validity = 0, next_uid = 0, messages_count = 0,
> unused_old_recent_messages_count = 0, seen_messages_count = 0, deleted_messages_count = 0, first_recent_uid = 0, first_unseen_uid_lowwater = 0, first_deleted_uid_lowwater = 0,
> log_file_seq = 0, log_file_tail_offset = 0, log_file_head_offset = 0, unused_old_sync_size = 0, unused_old_sync_stamp = 0, day_stamp = 0, day_first_uid = {0, 0, 0, 0, 0, 0, 0, 0}},
> hdr = 0x14dbf090, header = 0x14dcb070, from_line = 0x14dcd100, base_uid_validity = 1387910574, base_uid_last = 439, base_uid_last_offset = 0, mails = {arr = {buffer = 0x14dcb0b0,
> element_size = 64}, v = 0x14dcb0b0, v_modifiable = 0x14dcb0b0}, sync_changes = 0x14dcb0f0, mail_keyword_pool = 0x14dbf780, saved_keywords_pool = 0x14de7480, prev_msg_uid = 242,
> next_uid = 440, idx_next_uid = 440, seq = 105, idx_seq = 105, need_space_seq = 0, last_nonrecent_uid = 0, expunged_space = 14200194, space_diff = 0, dest_first_mail = 1,
> first_mail_crlf_expunged = 0, keep_recent = 0, readonly = 0, delay_writes = 1, renumber_uids = 0, moved_offsets = 0, ext_modified = 0, index_reset = 0, errors = 0}
> sync_flags = <value optimized out>
> changed = <value optimized out>
> __FUNCTION__ = "mbox_sync_int"
> #8 mbox_sync (mbox=0x14db5b30, flags=MBOX_SYNC_UNDIRTY) at mbox-sync.c:2000
> lock_id = 3
> ret = <value optimized out>
> __FUNCTION__ = "mbox_sync"
> #9 0x00002adf54e80093 in mbox_storage_sync_init (box=0x14db5b30, flags=65) at mbox-sync.c:2049
> mbox = 0x1fdd
> mbox_sync_flags = 8157
> ret = 0
> #10 0x00002adf54e99da1 in mailbox_sync_init (box=0x14db5b30, flags=65) at mail-storage.c:1668
> _data_stack_cur_id = 4
> ctx = <value optimized out>
> #11 0x00002adf54e9abc7 in mailbox_sync (box=0x1fdd, flags=8157) at mail-storage.c:1716
> ctx = 0x14d99110
> status = {sync_delayed_expunges = 0}
> #12 0x00000000004111c9 in select_open (cmd=0x14db3310, readonly=false) at cmd-select.c:313
> No locals.
> #13 cmd_select_full (cmd=0x14db3310, readonly=false) at cmd-select.c:439
> client = 0x14db2b00
> ctx = 0x14db3408
> args = 0x14d9ba98
> list_args = 0x101010101010101
> mailbox = 0x14d902b8 "INBOX"
> error = <value optimized out>
> ret = <value optimized out>
> __FUNCTION__ = "cmd_select_full"
> ---Type <return> to continue, or q <return> to quit---
> #14 0x0000000000411afb in cmd_select (cmd=0x1fdd) at cmd-select.c:448
> No locals.
> #15 0x00000000004165e7 in command_exec (cmd=0x14db3310) at imap-commands.c:158
> hook = 0x14d99110
> ret = <value optimized out>
> #16 0x00000000004153db in client_command_input (cmd=0x1fdd) at imap-client.c:780
> client = 0x14db2b00
> command = <value optimized out>
> __FUNCTION__ = "client_command_input"
> #17 0x0000000000415495 in client_command_input (cmd=0x14db3310) at imap-client.c:841
> client = 0x14db2b00
> command = 0x6
> __FUNCTION__ = "client_command_input"
> #18 0x000000000041564f in client_handle_next_command (client=0x14db2b00) at imap-client.c:879
> No locals.
> #19 client_handle_input (client=0x14db2b00) at imap-client.c:891
> _data_stack_cur_id = 3
> remove_io = <value optimized out>
> handled_commands = false
> __FUNCTION__ = "client_handle_input"
> #20 0x0000000000415a52 in client_input (client=0x14db2b00) at imap-client.c:933
> cmd = <value optimized out>
> output = 0x14db0c00
> bytes = 19
> __FUNCTION__ = "client_input"
> #21 0x00002adf551a1676 in io_loop_call_io (io=0x14db0cd0) at ioloop.c:388
> ioloop = 0x14d98740
> t_id = 2
> #22 0x00002adf551a2c7d in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:220
> ctx = <value optimized out>
> event = 0x14d9a250
> list = 0x14db0d20
> io = 0x1fdd
> tv = {tv_sec = 3, tv_usec = 790044}
> events_count = <value optimized out>
> msecs = <value optimized out>
> ret = 1
> i = 0
> call = false
> __FUNCTION__ = "io_loop_handler_run"
> #23 0x00002adf551a15f8 in io_loop_run (ioloop=0x14d98740) at ioloop.c:412
> __FUNCTION__ = "io_loop_run"
> #24 0x00002adf551528d3 in master_service_run (service=0x14d985d0, callback=0x1fdd) at master-service.c:566
> No locals.
> #25 0x000000000041ef07 in main (argc=2, argv=0x14d98390) at main.c:400
> login_set = {auth_socket_path = 0x14d90060 "\210", postlogin_socket_path = 0x14d900b8 "", postlogin_timeout_secs = 60, callback = 0x41f100 <login_client_connected>,
> failure_callback = 0x41f0a0 <login_client_failed>, request_auth_token = 1}
> service_flags = <value optimized out>
> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT
> username = 0x0
> c = <value optimized out>
> set_roots = {0x426b20, 0x0}
> (gdb)
More information about the dovecot
mailing list