[Dovecot] Corrupt mbox folders after migration from uw-imap
After migrating our mail server from uw-imap to dovecot at the weekend we've had a few corrupted mbox folders. I assume this is due to messages with bad Content-length headers. Is there any way of checking the Content-length of messages in mbox folders?
Thanks
Ian
On September 11, 2006 3:16:27 PM +1000 Ian Mortimer ian@physics.uq.edu.au wrote:
After migrating our mail server from uw-imap to dovecot at the weekend we've had a few corrupted mbox folders. I assume this is due to messages with bad Content-length headers. Is there any way of checking the Content-length of messages in mbox folders?
Use formail (part of procmail) to extract the messages as individual files (mh style). Then it's trivial to either check, recompute, or just remove the Content-length headers. Make sure you only do this for the "main" message, not any Content-Length text in the body or any attachments. Then recombine each message into an mbox file.
-frank
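For anyone who would rather script the check than inspect each extracted file by hand, here is a minimal Python sketch of the comparison described above. It is an illustration only, not what was run in this thread. The function name `declared_vs_actual` is made up, and it assumes each message has already been split out (for example with formail) into raw bytes with LF line endings. Only the top-level header block is examined, so a Content-Length string in the body or an attachment is ignored. Tools differ on whether the trailing newline counts, so treat off-by-one differences with care.

```python
from typing import Optional, Tuple


def declared_vs_actual(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return (declared, actual) body lengths, or None if no Content-Length header exists."""
    # Everything before the first blank line is the top-level header block.
    head, _, body = raw.partition(b"\n\n")
    for line in head.split(b"\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            try:
                declared = int(value.strip())
            except ValueError:
                declared = -1  # unparsable value, reported as a mismatch
            return declared, len(body)
    return None


if __name__ == "__main__":
    # Hypothetical message, used only to show the call.
    msg = b"From: user@example.org\nContent-Length: 5\n\nhello"
    print(declared_vs_actual(msg))
```

A message is suspect when the two numbers differ by more than the trailing-newline ambiguity, or when the declared value is -1.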
On Mon, 2006-09-11 at 10:37 -0700, Frank Cusack wrote:
Use formail (part of procmail) to extract the messages as individual files (mh style). Then it's trivial to either check, recompute, or just remove the Content-length headers. Make sure you only do this for the "main" message, not any Content-Length text in the body or any attachments. Then recombine each message into an mbox file.
Thanks. With formail and procmail I was able to do what you suggest. It found a few anomalies which I now have to investigate.
However it seems that's not the problem. An inbox which passed the "Content-Length:" test is getting corrupted every time the top (oldest) message is deleted. Either not all of the message gets deleted leaving some lines at the top of the file before the first From line or else part of the next message (usually just the From and a few other chars) gets deleted.
Even weirder: if I fix the corrupted file, it immediately becomes corrupted again if the new top message is deleted. Any other messages can be deleted but not the top one.
The client was using squirrelmail.
Anybody seen anything like this?
-- Ian
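The symptom described above (stray lines before the first From line, or a first From line with its leading characters missing) can be detected mechanically. The following Python sketch is a hedged illustration: `leading_garbage` is a hypothetical helper, and it assumes the folder content is read as bytes with LF line endings. It returns whatever precedes the first line that starts with "From ", so an empty result means the file at least begins correctly. If the first From line itself was truncated, the whole first message is reported, because no separator is found until the next one.

```python
def leading_garbage(data: bytes) -> bytes:
    """Return the bytes that precede the first 'From ' separator line."""
    if data.startswith(b"From "):
        return b""  # folder starts with a separator, as it should
    idx = data.find(b"\nFrom ")
    if idx == -1:
        return data  # no separator line anywhere in the file
    return data[: idx + 1]  # include the newline that ends the garbage


if __name__ == "__main__":
    # Hypothetical corrupted folder: a blank line and a fragment before the first message.
    sample = b"\nxy\nFrom user@example.org Mon Sep 11 15:16:27 2006\nSubject: t\n\nbody\n"
    print(repr(leading_garbage(sample)))
```

This only checks the top of the file. It does not validate the From lines of later messages.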
On September 12, 2006 1:48:12 PM +1000 Ian Mortimer ian@physics.uq.edu.au wrote:
On Mon, 2006-09-11 at 10:37 -0700, Frank Cusack wrote: However it seems that's not the problem. An inbox which passed the "Content-Length:" test is getting corrupted every time the top (oldest) message is deleted. Either not all of the message gets deleted leaving some lines at the top of the file before the first From line or else part of the next message (usually just the From and a few other chars) gets deleted.
Even weirder: if I fix the corrupted file, it immediately becomes corrupted again if the new top message is deleted. Any other messages can be deleted but not the top one.
The client was using squirrelmail.
Anybody seen anything like this?
Is NFS involved?
-frank
On September 12, 2006 3:15:58 PM +1000 Ian Mortimer ian@physics.uq.edu.au wrote:
On Mon, 2006-09-11 at 22:13 -0700, Frank Cusack wrote:
Is NFS involved?
No NFS.
hmm well I've had similar problems if the dovecot index got corrupted somehow. Try removing the index (the .imap directory in the same folder as the bad mbox file) and see if that helps ... if not, someone who actually knows something about dovecot will have to chime in.
-frank
On Mon, 2006-09-11 at 22:39 -0700, Frank Cusack wrote:
hmm well I've had similar problems if the dovecot index got corrupted somehow. Try removing the index (the .imap directory in the same folder as the bad mbox file) and see if that helps ...
In all these cases (so far) the corrupt folder is the inbox in the mail spool. The index file is in the user's home directory ~/mail/.imap/INBOX (which is mounted over NFS from the file server).
Removing that doesn't fix this problem. However I have found the problem with this particular INBOX at least. There are some messages with corrupt or missing From lines near the top of the folder. Deleting those fixes this inbox.
What I'm concerned about is: are we going to see more of these?
if not, someone who actually knows something about dovecot will have to chime in.
That's what I was hoping for!
-- Ian
On September 13, 2006 8:39:07 AM +1000 Ian Mortimer ian@physics.uq.edu.au wrote:
On Mon, 2006-09-11 at 22:39 -0700, Frank Cusack wrote:
hmm well I've had similar problems if the dovecot index got corrupted somehow. Try removing the index (the .imap directory in the same folder as the bad mbox file) and see if that helps ...
In all these cases (so far) the corrupt folder is the inbox in the mail spool. The index file is in the user's home directory ~/mail/.imap/INBOX (which is mounted over NFS from the file server).
That sounds suspicious. Are you sure that the clock on the NFS server and on the mail frontend are in sync?
-frank
On Tue, 2006-09-12 at 15:59 -0700, Frank Cusack wrote:
That sounds suspicious. Are you sure that the clock on the NFS server and on the mail frontend are in sync?
Yes, ntpd is running (and working properly) on both.
All the problem messages I've discovered so far (the ones causing corruption) were received prior to the change to dovecot. So it looks (so far) like dovecot is not responsible for the suspect messages: maybe it's just less tolerant than uw-imap.
As far as I can tell the suspect messages only cause problems when the top (oldest) message in the folder is deleted. Not sure if later versions of dovecot might handle that better.
-- Ian
Ian Mortimer wrote:
As far as I can tell the suspect messages only cause problems when the top (oldest) message in the folder is deleted. Not sure if later versions of dovecot might handle that better.
In my case dovecot 1.0 was even worse. The older (0.99) version worked with some From_ lines that 1.0 does not accept. I complained about that here: http://dovecot.org/list/dovecot/2006-August/015539.html
Dovecot is very restrictive on the message header format. UW-IMAP is much more tolerant.
FiL.
FiL @ Kpoxa wrote:
Dovecot is very restrictive on the message header format. UW-IMAP is much more tolerant.
The mbox From_ line doesn't support a full email address. It is supposed to be the bare envelope sender (just the portion between <> in your example). UW-IMAP is not following the standards (as much as mbox format has a standard); Dovecot is following the mbox standard (and it should take all of a minute to write a Perl script to fix up those lines).
John
-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Boulevard Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5748
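As a rough illustration of the fix-up script mentioned above, here is a sketch in Python rather than Perl. The exact shape of the problem lines is not shown in this thread, so the sketch assumes they look like `From "Some Name" <user@host> date`, and it keeps only the address between the angle brackets. The name `fix_from_line` is hypothetical. The function rewrites a single line; deciding which lines in a folder really are separators (and not body text starting with "From ") is left out.

```python
import re

# Matches a From_ line whose sender field contains '<address>'.
# The assumed input shape is an illustration, not taken from real data.
_FROM_WITH_BRACKETS = re.compile(r"^From [^<>\n]*<([^<>\s]+)>")


def fix_from_line(line: str) -> str:
    """Reduce 'From "Name" <addr> date' to 'From addr date'; leave other lines alone."""
    return _FROM_WITH_BRACKETS.sub(r"From \1", line, count=1)


if __name__ == "__main__":
    bad = 'From "Joe User" <joe@example.com> Mon Sep 11 15:16:27 2006\n'
    print(fix_from_line(bad), end="")
```

Header lines such as `From: Joe <joe@example.com>` are untouched because the pattern requires a space directly after `From`.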
Hello,
I know this is against the mbox standard, but the problem is that I have over 100 GB of mail in mbox format that IS working now. It was working under uw-imap, and it was working under dovecot 0.99. When I tried to switch over to dovecot 1.0: boom, not working. I am not sure why I have these From_ lines in these mboxes. It's possible they date from the time our mail was on a VAX machine. But again, is it right to release a new version of software that doesn't work with data from the older (not much older, just the previous) version of the program? Shouldn't the 1.0 code include something that rewrites bad From_ lines into correct ones? Not because this is the right way to deal with "bad" data, but because this is an upgrade of something that WAS WORKING.
FiL
John Peacock wrote:
FiL @ Kpoxa wrote:
Dovecot is very restrictive on the message header format. UW-IMAP is much more tolerant.
The mbox From_ line doesn't support a full email address. It is supposed to be the bare envelope sender (just the portion between <> in your example). UW-IMAP is not following the standards (as much as mbox format has a standard); Dovecot is following the mbox standard (and it should take all of a minute to write a Perl script to fix up those lines).
John
Ian Mortimer wrote:
On Mon, 2006-09-11 at 22:39 -0700, Frank Cusack wrote:
hmm well I've had similar problems if the dovecot index got corrupted somehow. Try removing the index (the .imap directory in the same folder as the bad mbox file) and see if that helps ...
In all these cases (so far) the corrupt folder is the inbox in the mail spool. The index file is in the user's home directory ~/mail/.imap/INBOX (which is mounted over NFS from the file server).
FYI: on dovecot, you can specify an alternative location for the index files, e.g. /var/cache/dovecot/indexes/<username>
if that location is on a disk with no quota, you solve the nasty problem with indexes in over quota situations
Removing that doesn't fix this problem. However I have found the problem with this particular INBOX at least. There are some messages with corrupt or missing From lines near the top of the folder. Deleting those fixes this inbox.
What I'm concerned about is: are we going to see more of these?
if not, someone who actually knows something about dovecot will have to chime in.
That's what I was hoping for!
correction:
Magnus Stenman wrote:
Ian Mortimer wrote:
On Mon, 2006-09-11 at 22:39 -0700, Frank Cusack wrote: ...
mail spool. The index file is in the user's home directory ~/mail/.imap/INBOX (which is mounted over NFS from the file server).
FYI: on dovecot, you can specify an alternative location for the index files, e.g. /var/cache/dovecot/indexes/<username>
on dovecot 1.x
if that location is on a disk with no quota, you solve the nasty problem with indexes in over quota situations
...
On Wed, 2006-09-13 at 16:18 +0200, Magnus Stenman wrote:
FYI: on dovecot, you can specify an alternative location for the index files, e.g. /var/cache/dovecot/indexes/<username>
Good suggestion which I'll keep in mind.
on dovecot 1.x
An upgrade soon to 1.x is looking likely given the problems we're having with 0.99.
Thanks
Ian
I have seen the same thing.
dovecot 0.99, CentOS 4, squirrelmail (not verified that it affects squirrel only users)
I usually just delete the extra line in vi.
have since switched to dovecot 1.0, so far so good....
Ian Mortimer wrote:
On Mon, 2006-09-11 at 10:37 -0700, Frank Cusack wrote:
Use formail (part of procmail) to extract the messages as individual files (mh style). Then it's trivial to either check, recompute, or just remove the Content-length headers. Make sure you only do this for the "main" message, not any Content-Length text in the body or any attachments. Then recombine each message into an mbox file.
Thanks. With formail and procmail I was able to do what you suggest. It found a few anomalies which I now have to investigate.
However it seems that's not the problem. An inbox which passed the "Content-Length:" test is getting corrupted every time the top (oldest) message is deleted. Either not all of the message gets deleted leaving some lines at the top of the file before the first From line or else part of the next message (usually just the From and a few other chars) gets deleted.
Even weirder: if I fix the corrupted file, it immediately becomes corrupted again if the new top message is deleted. Any other messages can be deleted but not the top one.
The client was using squirrelmail.
Anybody seen anything like this?
On Tue, 2006-09-12 at 19:40 +0200, Magnus Stenman wrote:
I have seen the same thing.
dovecot 0.99, CentOS 4, squirrelmail (not verified that it affects squirrel only users)
dovecot 0.99, RHEL4. It's not just affecting squirrelmail users here. Both POP and IMAP clients are affected, seemingly at random, at an average of around 1 or 2 a day.
I usually just delete the extra line in vi.
That sounds slightly dangerous. Do you lock the Inbox while you're doing that?
In our case it's not just a single extra line. There's usually a couple of lines (blank and some with a few characters) before the first From line.
I checked (again) that both procmail (LDA) and dovecot are creating and honouring lock files. dotlocks at least seem to be working perfectly (and getting removed after 180s as configured in dovecot).
I've run out of ideas.
have since switched to dovecot 1.0, so far so good....
I'm getting desperate enough to try that also.
-- Ian
Ian Mortimer wrote:
On Tue, 2006-09-12 at 19:40 +0200, Magnus Stenman wrote:
I have seen the same thing.
dovecot 0.99, CentOS 4, squirrelmail (not verified that it affects squirrel only users)
dovecot 0.99, RHEL4. It's not just affecting squirrelmail users here. Both POP and IMAP clients are affected, seemingly at random, at an average of around 1 or 2 a day.
I usually just delete the extra line in vi.
That sounds slightly dangerous. Do you lock the Inbox while you're doing that?
no. vi will warn me if the file has changed since reading it, that is good enough for me
In our case it's not just a single extra line. There's usually a couple of lines (blank and some with a few characters) before the first From line.
I checked (again) that both procmail (LDA) and dovecot are creating and honouring lock files. dotlocks at least seem to be working perfectly (and getting removed after 180s as configured in dovecot).
in my case the delivery agent is postfix virtual, but i suspect the problem occurs when dovecot is rewriting the mailbox
I've run out of ideas.
have since switched to dovecot 1.0, so far so good....
I'm getting desperate enough to try that also.
I use a slightly modified dovecot RPM from ATrpms.net
pretty painless migration and as a bonus you get better out-of-quota behavior and better logging (among other things)
/m
On September 13, 2006 12:39:39 AM +0200 Magnus Stenman stone@hkust.se wrote:
Ian Mortimer wrote:
On Tue, 2006-09-12 at 19:40 +0200, Magnus Stenman wrote:
I usually just delete the extra line in vi.
That sounds slightly dangerous. Do you lock the Inbox while you're doing that?
no. vi will warn me if the file has changed since reading it, that is good enough for me
So "almost no protection" against corruption is "good enough" for you, then?
Assuming no NFS (in which case the problem is worse), the vim warning is only useful if mail is delivered while you edit. You can still be writing out the file and then have a new mail be delivered before your vim write is complete. (or if vim uses a tmp file and does an atomic rename, you can lose mail instead of corrupting the mbox)
-frank
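To make the race described above concrete: a manual repair is only safe if the delivery agent and the IMAP server are kept out of the file for the whole read-modify-write cycle. One common way is a dotlock, which this thread reports procmail and dovecot honouring here. The sketch below is a minimal Python illustration under assumptions: the helper name `dotlock` and its timeout values are made up, the lock file is `<mbox>.lock` in the same directory, the caller needs write permission there, and `O_EXCL` creation may not be reliable on older NFS setups.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(mbox_path: str, timeout: float = 30.0, poll: float = 0.5):
    """Hold '<mbox_path>.lock' for the duration of the with-block."""
    lock_path = mbox_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)
    try:
        yield lock_path
    finally:
        os.unlink(lock_path)  # release the lock even if the edit failed
```

With this, the repair would be wrapped as `with dotlock("/var/mail/someuser"): ...` (a hypothetical path), rewriting the file in place inside the block so that other programs cooperating on the same lock wait until it is finished.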
Frank Cusack wrote:
On September 13, 2006 12:39:39 AM +0200 Magnus Stenman stone@hkust.se wrote:
Ian Mortimer wrote:
On Tue, 2006-09-12 at 19:40 +0200, Magnus Stenman wrote:
I usually just delete the extra line in vi.
That sounds slightly dangerous. Do you lock the Inbox while you're doing that?
no. vi will warn me if the file has changed since reading it, that is good enough for me
So "almost no protection" against corruption is "good enough" for you, then?
yes.
Since this happens on low volume mailboxes maybe every 2 months I don't worry about that tiny window.
If I feel paranoid, I run a tail|grep on the maillog and make sure no mail is being received for that user.
Assuming no NFS (in which case the problem is worse), the vim warning is only useful if mail is delivered while you edit. You can still be writing out the file and then have a new mail be delivered before your vim write is complete. (or if vim uses a tmp file and does an atomic rename, you can lose mail instead of corrupting the mbox)
-frank
I managed to create the simplest possible folder that shows the problem. It has just two messages in it. If I delete the top message the first 6 characters (From .) get removed from the From line of the next message leaving the folder corrupt.
This is on RHEL4 x86_64, so before posting it to bugzilla I decided to test it on RHEL4 i386 with the same dovecot version: dovecot-0.99.11-4.EL4.
This time, no corruption of the inbox when the first message is deleted.
So I tested another problem folder and got the same result: corruption on 64 bit, no corruption on 32 bit. Posted as Bugzilla Bug 206376
-- Ian
Ian Mortimer wrote:
I managed to create the simplest possible folder that shows the problem. It has just two messages in it. If I delete the top message the first 6 characters (From .) get removed from the From line of the next message leaving the folder corrupt.
This is on RHEL4 x86_64, so before posting it to bugzilla I decided to test it on RHEL4 i386 with the same dovecot version: dovecot-0.99.11-4.EL4.
This time, no corruption of the inbox when the first message is deleted.
So I tested another problem folder and got the same result: corruption on 64 bit, no corruption on 32 bit. Posted as Bugzilla Bug 206376
my problems have all been on i386
On Thu, 2006-09-14 at 11:03 +0200, Magnus Stenman wrote:
my problems have all been on i386
I've now got a folder which fails on i386 as well as x86_64 (but not in exactly the same way).
I've also found a way to fix all the problem folders seen so far. Remove the X-UID headers and the problem disappears!
-- Ian
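For reference, removing the X-UID headers can be scripted once a message has been split out of the folder. The Python sketch below is an illustration under assumptions, not the procedure used in this thread: `strip_header` is a hypothetical helper, input is the raw bytes of one message with LF line endings, and only the top-level header block is touched so that any "X-UID:" text in the body survives. Removing these headers makes the server assign fresh UIDs, so clients may re-download the folder.

```python
def strip_header(raw: bytes, name: bytes = b"X-UID") -> bytes:
    """Drop every top-level header called `name` (plus folded continuation lines)."""
    head, sep, body = raw.partition(b"\n\n")
    prefix = name.lower() + b":"
    kept = []
    skipping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: it shares the fate of the header it belongs to.
            if skipping:
                continue
        else:
            skipping = line.lower().startswith(prefix)
            if skipping:
                continue
        kept.append(line)
    return b"\n".join(kept) + sep + body


if __name__ == "__main__":
    msg = b"From: user@example.org\nX-UID: 42\nSubject: test\n\nbody\n"
    print(strip_header(msg).decode())
```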
On Mon, 2006-09-18 at 12:03 +1000, Ian Mortimer wrote:
I've also found a way to fix all the problem folders seen so far. Remove the X-UID headers and the problem disappears!
Just to wrap this thread up. After moving the indexes from the user's (NFS mounted) home directories to a local disk on the mail server all these problems have disappeared. I haven't seen any new corrupt folders since then and known problem folders no longer become corrupt when the top message is deleted.
As a side effect this has also fixed a problem one user was having with threaded views in squirrelmail.
-- Ian
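For readers wanting to try the same change, here is a sketch of what moving the indexes to local disk might look like in a Dovecot 1.0-style configuration. This is an assumption-laden example, not the configuration used in this thread: the spool path and the index path are illustrative (the latter follows the /var/cache/dovecot/indexes/<username> suggestion made earlier), and older releases name the setting `default_mail_env` instead of `mail_location`. The index directory has to be writable by the mail users.

```
# dovecot.conf fragment (illustrative paths; check the setting name for your version)
# %u expands to the username; INDEX= places the index files on local disk.
mail_location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=/var/cache/dovecot/indexes/%u
```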
Ian Mortimer wrote:
I managed to create the simplest possible folder that shows the problem. It has just two messages in it. If I delete the top message the first 6 characters (From .) get removed from the From line of the next message leaving the folder corrupt.
This is on RHEL4 x86_64, so before posting it to bugzilla I decided to test it on RHEL4 i386 with the same dovecot version: dovecot-0.99.11-4.EL4.
No offense, but I think you are wasting your time trying to debug 0.99.x versions. I don't think Timo is even bothering patching it anymore. There have been so many improvements to the 1.0.x series - you'd be much better off spending your time testing and implementing an upgrade.
Just my .02 clad coins worth...
--
Best regards,
Charles
On Thu, 2006-09-14 at 09:02 -0400, Charles Marcus wrote:
I managed to create the simplest possible folder that shows the problem. It has just two messages in it. If I delete the top message the first 6 characters (From .) get removed from the From line of the next message leaving the folder corrupt.
This is on RHEL4 x86_64, so before posting it to bugzilla I decided to test it on RHEL4 i386 with the same dovecot version: dovecot-0.99.11-4.EL4.
No offense, but I think you are wasting your time trying to debug 0.99.x versions. I don't think Timo is even bothering patching it anymore. There have been so many improvements to the 1.0.x series - you'd be much better off spending your time testing and implementing an upgrade.
Has anyone built RPMs for the current version that update the older fedora/RHEL/Centos distributed installations painlessly?
-- Les Mikesell lesmikesell@gmail.com
On Thu, 2006-09-14 at 09:02 -0400, Charles Marcus wrote:
No offense, but I think you are wasting your time trying to debug 0.99.x versions. I don't think Timo is even bothering patching it anymore. There have been so many improvements to the 1.0.x series - you'd be much better off spending your time testing and implementing an upgrade.
Probably true. However 0.99.11 is the version officially distributed and supported by RedHat so any information I can get about that is worth posting to bugzilla.
(I'm also investigating a possible upgrade to 1.0).
-- Ian
participants (8)
- Charles Marcus
- FiL @ Kpoxa
- Frank Cusack
- Frank Cusack
- Ian Mortimer
- John Peacock
- Les Mikesell
- Magnus Stenman