[Dovecot] Index corruption
I'm getting pretty constant index corruption on (admittedly large) Maildir folders.
I'm running Thunderbird 0.7.3 and Mutt as my IMAP clients (on FreeBSD).
Server is Dovecot 0.99.10.9
configured thusly:
./configure --disable-ipv6 CC=gcc-3.3 CXX=g++-3.3
--prefix=/opt/dovecot --without-pop3d
on a Solaris 2.8 box. Maildirs are NFS-mounted, but the index files are
on local UFS filesystem:
default_mail_env = maildir:~/Maildir:INDEX=/export/01/imapd/%u
MTA is Postfix, LDA is procmail delivering to several Maildir folders.
There will be 2 (or more) dovecot instances running for this user (me) - several from Thunderbird and one from the BIFF client (icewm). Turning off the BIFF client doesn't seem to help.
Symptoms: when using Thunderbird, and deleting a msg or moving it to another folder, Thunderbird will sometimes not display the next unread message, but just sit there. Changing folders works, but changing back to the previous folder (or quitting/restarting Thunderbird) only shows the first N messages (N is about 500, in a Maildir of about 6000, and it is the SAME email that is "last" every time.) This happens with multiple folders, but (usually) only folders that are delivered to by procmail (this may be related to the problem or a coincidence in that only those folders have unread email!)
Accessing the mailboxes using Mutt via IMAP shows the same truncated message list.
(I'm the only user using dovecot as I want Maildir, the other users have uw-imap, so dovecot is listening on port 4343)
Closing all imap clients, then removing the .index.* files from the appropriate /export/01/imapd/ directory, then restarting the imap client causes the missing messages to reappear (after a suitable wait for the index files to be recreated).
It really looks like locking or truncation issues in the index files....
Any clues? Debugging hints? This is happening several times per day so I can get lots o' debug data!
Greg.
On 18.10.2004, at 03:22, Gregory Bond wrote:
I'm getting pretty constant index corruption on (admittedly large) Maildir folders.
I'd suggest grabbing 1.0-test49 from http://dovecot.org/test/ and seeing if it works better.
Indexing code (and pretty much everything else) has been rewritten in 1.0-test releases and it's quite stable nowadays. I've noticed some maildir problems with stress testing (which I'll try to get around to fixing sometime soon) but with normal use it works fine.
Timo Sirainen wrote:
I'd suggest grabbing 1.0-test49 from http://dovecot.org/test/ and seeing if it works better.
Wow, reply at 3am!
I can't get the imapd from this release to start up. No matter what I do, I'm getting this error:
auth(default): We couldn't drop root group privileges (wanted=1, gid=0, egid=0)
even before it asks for username.
I've tried starting from inetd, starting from dovecot, and setting first_valid_gid = 0, 1 or 2.
Looking at the logic in lib/restrict-access.c I'm not sure why or how this can be failing. gid=1 so setgid() must have succeeded at the top of restrict_access_by_env():
    if (gid != 0 && (gid != getgid() || gid != getegid())) {
        if (setgid(gid) != 0)
and uid must be != 0, else this test fails:
    if (setuid(0) == 0) {
        if (uid == 0)
            i_fatal("Running as root isn't permitted");
        i_fatal("We couldn't drop root privileges");
so how does this test:
    if (getgid() == 0 || getegid() == 0 || setgid(0) == 0) {
ever get to be true?
Gregory Bond wrote:
I can't get the imapd from this release to start up. No matter what I do, I'm getting this error:
auth(default): We couldn't drop root group privileges (wanted=1, gid=0, egid=0)
even before it asks for username.
I ran into this problem with the later test versions on Solaris 9. Joshua Goodall gave me a two-line patch (which I don't have here, I'm afraid) to revert this check, which came in around test43 (from memory).
A check of the list archives finds his recommendation was to use src/lib/restrict-access.c revision 1.13 from CVS.
The issue is still outstanding, it seems.
Timo - I have a few Solaris 9 boxes, and will happily perform whatever tests will help.
-- Curtis.
Curtis Maloney wrote:
I ran into this problem with the later test versions on Solaris 9. Joshua Goodall gave me a two-line patch (which I don't have here, I'm afraid) to revert this check, which came in around test43 (from memory).
A check of the list archives finds his recommendation was to use src/lib/restrict-access.c revision 1.13 from CVS.
Ah good, it's not just me. I've had a bit more of a play and I understand it a bit better. Looks like restrict_access_by_env() is being called in 2 different contexts - once to establish the "dovecot" user, once as root (presumably in the auth daemon). The call as root fails because the program tries setgid() to prove it can't, but as root this works.
The following patch (to test49 version of lib/restrict-access.c) works for me, but I'm not going to pretend I understand dovecot's auth framework well enough to know if this is harmless. (Beware cut-n-paste whitespace munching).
--- src/lib/restrict-access.c.DIST 2004-09-24 23:04:31.000000000 +1000
+++ src/lib/restrict-access.c 2004-10-18 15:04:36.716002000 +1000
@@ -204,7 +204,7 @@
 env = getenv("RESTRICT_GID_FIRST");
 if (gid != 0 || (env != NULL && atoi(env) != 0)) {
-  if (getgid() == 0 || getegid() == 0 || setgid(0) == 0) {
+  if (getgid() == 0 || getegid() == 0 || (uid != 0 && setgid(0) == 0)) {
     if (gid == 0)
       i_fatal("GID 0 isn't permitted");
     i_fatal("We couldn't drop root group privileges "
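The effect of the one-line change can be checked on paper with a small model. The sketch below is not Dovecot code: struct creds, model_setgid_zero() and the two check functions are hypothetical names, and it assumes (as a simplification) that setgid(0) succeeds exactly when the process runs as uid 0, and that the uid the process runs as is the same uid the patch tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process's credentials; not a Dovecot type. */
struct creds {
	int uid;   /* uid the process runs as (assumed equal to the requested uid) */
	int gid;   /* real GID, as getgid() would report */
	int egid;  /* effective GID, as getegid() would report */
};

/* Simplified model of setgid(0): assume it succeeds only for uid 0,
   and that on success both the real and effective GID become 0. */
static bool model_setgid_zero(struct creds *c)
{
	assert(c != NULL);
	if (c->uid != 0)
		return false;
	c->gid = 0;
	c->egid = 0;
	return true;
}

/* Condition before the patch: true means the fatal path is taken. */
static bool check_before_patch(struct creds *c)
{
	return c->gid == 0 || c->egid == 0 || model_setgid_zero(c);
}

/* Condition after the patch: setgid(0) is only probed when uid != 0. */
static bool check_after_patch(struct creds *c)
{
	return c->gid == 0 || c->egid == 0 ||
		(c->uid != 0 && model_setgid_zero(c));
}
```

In this model, a root process with gid 1 takes the fatal path before the patch, and the probe itself leaves gid and egid at 0, which matches the "wanted=1, gid=0, egid=0" in the error message. With the patched condition the probe is skipped for uid 0 and the GID stays at 1.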
On 18.10.2004, at 08:32, Gregory Bond wrote:
I've had a bit more of a play and I understand it a bit better. Looks like restrict_access_by_env() is being called in 2 different contexts - once to establish the "dovecot" user, once as root (presumably in the auth daemon). The call as root fails because the program tries setgid() to prove it can't, but as root this works.
Ah, I see. I didn't realize that root's gid might not be 0.
- if (getgid() == 0 || getegid() == 0 || setgid(0) == 0) {
+ if (getgid() == 0 || getegid() == 0 || (uid != 0 && setgid(0) == 0)) {
Looks good, committing.
Timo Sirainen wrote:
On 18.10.2004, at 08:32, Gregory Bond wrote:
I've had a bit more of a play and I understand it a bit better. Looks like restrict_access_by_env() is being called in 2 different contexts - once to establish the "dovecot" user, once as root (presumably in the auth daemon). The call as root fails because the program tries setgid() to prove it can't, but as root this works.
Ah, I see. I didn't realize that root's gid might not be 0.
Hadn't thought about that at all... just checked, and sure enough: Default GID for root on Solaris is 1.
Guess it'll soon be time for me to rebuild testXX on my home IMAP server, then.
-- Curtis.
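For anyone wanting to check what primary GID their own system records for root, a minimal sketch using the standard getpwnam() lookup follows. The helper name primary_gid_of() is made up for illustration and is not part of Dovecot.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Returns the primary GID recorded for the given account name,
   or -1 if the account cannot be looked up. */
static long primary_gid_of(const char *name)
{
	assert(name != NULL);
	struct passwd *pw = getpwnam(name);
	if (pw == NULL)
		return -1;
	return (long)pw->pw_gid;
}
```

Calling primary_gid_of("root") and printing the result shows whether the "root's gid is 0" assumption holds on a given box. Per the message above, the answer on Solaris is 1.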
participants (3)
- Curtis Maloney
- Gregory Bond
- Timo Sirainen