At last I was able to find time to run a backtrace; I've attached the file. In case you don't remember (and hey, who can blame you after all this time), the problem affects only a handful (~10) of users among the hundreds of thousands currently using the service.
Please let me know if I should provide any other info, logs or whatever.
Dimos Alevizos
-------- Original Message --------
Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
From: Timo Sirainen <tss@iki.fi>
To: Dimos Alevizos <dalevizo@otenet.gr>
CC: Dimitris Paouris <dpaou@otenet.gr>, Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/11/2013 01:16 PM
Well, if that patch didn’t work, then the problem is elsewhere. There aren’t many other good possibilities left though.. How about adding this patch, it should be even safer than the previous one:
http://hg.dovecot.org/dovecot-2.2/rev/d3062d066593
On 30.10.2013, at 12.42, Dimos Alevizos <dalevizo@otenet.gr> wrote:
I'm afraid it doesn't seem to be working. I compiled dovecot 2.2.6 with the patch you sent and installed it on a production server (it had to be 2.2.6 because we've upgraded everything else since I began this thread months ago). We still see mbox corruptions (rarely, as before), but the server isn't crashing:
Oct 30 11:15:19 pop04 dovecot: pop3-login: Login: user=<artower@otenet.gr>, method=PLAIN, rip=85.72.232.35, lip=83.235.66.43, mpid=24419, secured, session=<+0ywxfHpIQBVSOgj>
Oct 30 11:15:20 pop04 dovecot: pop3(artower@otenet.gr): Disconnected: Logged out top=0/0, retr=0/0, del=0/1336, size=471029518
Oct 30 11:19:12 pop04 dovecot: lmtp(2863, artower@otenet.gr): r7U3KnyhcFIvCwAAckDtvw: msgid=<E1VbRvh-00040e-Ol@cpmail.force24.dedicated.catalyst2.com>: size=17823 saved mail to INBOX
Oct 30 11:33:12 pop04 dovecot: pop3-login: Login: user=<artower@otenet.gr>, method=PLAIN, rip=85.72.224.94, lip=83.235.66.43, mpid=600, secured, session=<vT2aBfLpxQBVSOBe>
Oct 30 11:33:12 pop04 dovecot: pop3(artower@otenet.gr): Error: Syncing INBOX failed: Mailbox isn't a valid mbox file
Oct 30 11:33:12 pop04 dovecot: pop3(artower@otenet.gr): Error: Couldn't init INBOX: Mailbox isn't a valid mbox file
Oct 30 11:33:12 pop04 dovecot: pop3(artower@otenet.gr): Mailbox init failed top=0/0, retr=0/0, del=0/0, size=0
Oct 30 11:33:33 pop04 dovecot: lmtp(16314, artower@otenet.gr): Au4vIMqucFK6PwAAckDtvw: msgid=<004401ced552$bb5ecd70$321c6850$@planet.nl>: size=7975817 save failed to INBOX: Mailbox isn't a valid mbox file
Perhaps the patch is only valid for 2.1.16 and needs to be modified for 2.2.6 ?
Thank you for your time
Dimos Alevizos
-------- Original Message --------
Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
From: Timo Sirainen <tss@iki.fi>
To: Dimos Alevizos <dalevizo@otenet.gr>
CC: dovecot@dovecot.org, Dimitris Paouris <dpaou@otenet.gr>
Date: 26/06/2013 06:59 PM
It crashes one specific IMAP/POP3 session, so others are unaffected. The potential problems:
It might cause sessions on that user's mbox to keep crashing: first crash -> client reconnects -> client attempts the same operation -> crash again. Then again, this might not happen; it depends.
The mbox file would probably become slightly more corrupted than normally, because it doesn't finish moving data around. No data should get actually lost, but some parts could become duplicated (e.g. some headers or even mails, possibly causing UID renumbering = redownloading).
So not ideal in production, but shouldn't be too bad either, especially if you just wait for the first crash and then immediately switch to the old unpatched version.
On 26.6.2013, at 10.21, Dimos Alevizos <dalevizo@otenet.gr> wrote:
Hi,
I haven't had the time to compile it yet, but a question just occurred to me. Given that it's so rare and we can't reproduce it on a dev server, how safe is this to use on a production server? When you say "crash", do you mean the whole dovecot server or just that specific client's child process?
D.
-------- Original Message --------
Subject: Re: [Dovecot] Mbox corruption - Inbox beginning with 'FFrom' or 'FrFrom'
From: dalevizo <dalevizo@otenet.gr>
To: Timo Sirainen <tss@iki.fi>
CC: dovecot@dovecot.org, Dimitris Paouris <dpaou@otenet.gr>
Date: 24/06/2013 01:41 PM
Thanks, I'll try the patch as soon as possible and let you know. It is indeed very rare: we're only seeing 4-5 corruptions in about 13 million logins per day. I've been trying to convince our design team that we should move to maildir, but the truth is that it's quite a change, and we're way too busy to deal with everything else AND a migration from mbox to maildir.
D.
On Mon 24/06/2013 13:16, Timo Sirainen wrote:
On 19.6.2013, at 16.00, Dimos Alevizos <dalevizo@otenet.gr> wrote:
> we're having some problems with our dovecot setup.
> I've seen similar problems in the mailing list some years ago but alas wasn't able to find a solution.
>
> Our setup is as follows :
> An MX farm (postfix) sends mails via LMTP to a director farm (dovecot 2.1.12) which proxies pop3/imap/lmtp traffic to a dovecot farm (dovecot 2.1.16).
> All mailboxes and indexes are on NFS and all servers are Centos.
>
> The problem is that at times we see mailboxes (all of them are in mbox format) beginning with FFrom or FrFrom and of course dovecot says it's not a valid mbox file.
This is quite an old bug, but it happens rarely enough that I haven't been able to reproduce and fix it. Actually people hadn't complained about it for a long time now, so I had assumed it had somehow gotten fixed already.
With the attached debug patch it should crash instead of (completely) corrupting the mbox file. Debugging the resulting core file with gdb could be useful in figuring this out.
Although I wouldn't recommend mbox format for any big installation anyway..
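For anyone trying to find affected mailboxes before users hit the error, here is a minimal Python sketch of the symptom described in this thread: an mbox file whose first bytes are not the "From " separator (for example "FFrom" or "FrFrom"). It is not part of Dovecot and is not the check Dovecot itself performs; the function names are made up for illustration, and treating an empty file as a valid, empty mbox is an assumption.

```python
import sys

# An mbox file normally begins with the "From " message separator.
MBOX_MAGIC = b"From "


def starts_like_mbox(path):
    """Return True if the file is empty or begins with b"From ".

    Assumption: an empty file counts as a valid (empty) mbox.
    """
    with open(path, "rb") as f:
        head = f.read(len(MBOX_MAGIC))
    return head == b"" or head == MBOX_MAGIC


def find_suspect_mboxes(paths):
    """Return the paths whose first bytes do not look like an mbox."""
    return [p for p in paths if not starts_like_mbox(p)]


if __name__ == "__main__":
    # Usage: python check_mbox.py /path/to/mbox1 /path/to/mbox2 ...
    for suspect in find_suspect_mboxes(sys.argv[1:]):
        print(suspect)
```

It only reads the first five bytes of each file, so it is cheap to run over many mailboxes, but it says nothing about corruption further into the file.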