[Dovecot] More AFS maildir debugging
There appears to be some sort of race condition when accessing a maildir directory for the first time with Dovecot. I was getting a non-specific error trying to select INBOX, but when I stepped through it with a debugger it worked. After I deleted the Dovecot files and ran the select again, it failed. I started playing with it again just now, and when attempting to access Sent, I actually got some errors:
a04 select Sent
imap(aarons): Error: Corrupted transaction log file /XX/Maildir/.Sent/dovecot.index.log: unexpected end of file while reading header
imap(aarons): Error: Corrupted transaction log file /XX/Maildir/.Sent/dovecot.index.log: unexpected end of file while reading header
imap(aarons): Error: Corrupted transaction log file /XX/Maildir/.Sent/dovecot.index.log: unexpected end of file while reading header
a04 NO Internal error occurred. Refer to server log for more information. [2007-01-13 16:17:51]
I then deleted dovecot.index.log and dovecot.index.log.2 (the only two dovecot files that were present) and retried the select:
a04 select Sent
a04 NO Internal error occurred. Refer to server log for more information. [2007-01-13 16:23:40]
Does anyone more familiar with the code than I am have any pointers on where to concentrate my debugging efforts?
Thanks.
-Aaron
On Sat, 2007-01-13 at 16:26 -0500, Aaron Solochek wrote:
> I then deleted dovecot.index.log and dovecot.index.log.2 (the only two dovecot files that were present) and retried the select:
> a04 select Sent
> a04 NO Internal error occurred. Refer to server log for more information. [2007-01-13 16:23:40]
Looking at the code, I think this could happen when Dovecot creates the log file, then tries to open it and finds that it's corrupted again.
So there's something weird which causes newly created files to be broken. Does it work if you append :INDEX=MEMORY to mail_location? What about if you add :INDEX=/tmp/dovecot-%u or something so that the indexes aren't in AFS?
Timo Sirainen wrote:
> On Sat, 2007-01-13 at 16:26 -0500, Aaron Solochek wrote:
>> I then deleted dovecot.index.log and dovecot.index.log.2 (the only two dovecot files that were present) and retried the select:
>> a04 select Sent
>> a04 NO Internal error occurred. Refer to server log for more information. [2007-01-13 16:23:40]
> Looking at the code, I think this could happen when Dovecot tries to create the log file, then it tries to open it but notices that again it's corrupted.
> So there's something weird which causes newly created files to be broken. Does it work if you append :INDEX=MEMORY to mail_location? What about if you add :INDEX=/tmp/dovecot-%u or something so that the indexes aren't in AFS?
Both modifications fix the index problem on rc18. However, a new problem I'm noticing now with Thunderbird is that subscriptions are not persistent. It seems that whenever Thunderbird connects to Dovecot, it gets an empty subscription list. Any advice on how to fix this?
-Aaron
On Mon, 2007-01-22 at 20:24 -0500, Aaron Solochek wrote:
>> So there's something weird which causes newly created files to be broken. Does it work if you append :INDEX=MEMORY to mail_location? What about if you add :INDEX=/tmp/dovecot-%u or something so that the indexes aren't in AFS?
> Both modifications fix the index problem on rc18. However, a new problem I'm noticing now with thunderbird is that subscriptions are not persistent. It seems whenever thunderbird connects to dovecot, it gets an empty subscription list. Any advice on how to fix this?
Does the subscription file contain anything? Or does Thunderbird just break it by subscribing to mailboxes?
Anyway, I can think of two possibilities: pread() is broken, or link() is broken. Does dotlock_use_excl=yes help? If not, try removing HAVE_PREAD from config.h and recompiling.
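One rough way to probe the pread() suspicion without recompiling anything is to compare pread() against lseek() plus read() on a freshly created file in the suspect directory. The Python sketch below does that; the function name, payload and offset are made up for illustration, and passing this check would not prove that Dovecot's own access pattern is safe on AFS.

```python
import os
import tempfile


def pread_matches_read(directory, payload=b"dovecot pread check\n", offset=8):
    """Return True if os.pread() agrees with lseek()+read() on a new file.

    `directory` is the directory to test (for example one inside AFS).
    The name, payload and offset are illustrative, not taken from Dovecot.
    """
    # mkstemp() creates the file and opens it read/write.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        # Positional read: does not move the file offset.
        via_pread = os.pread(fd, len(payload), offset)
        # Classic seek-then-read for comparison.
        os.lseek(fd, offset, os.SEEK_SET)
        via_read = os.read(fd, len(payload))
        return via_pread == via_read == payload[offset:]
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling it with a directory inside AFS and again with one on local disk gives a quick comparison between the two filesystems.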
Timo Sirainen wrote:
> On Mon, 2007-01-22 at 20:24 -0500, Aaron Solochek wrote:
>>> So there's something weird which causes newly created files to be broken. Does it work if you append :INDEX=MEMORY to mail_location? What about if you add :INDEX=/tmp/dovecot-%u or something so that the indexes aren't in AFS?
>> Both modifications fix the index problem on rc18. However, a new problem I'm noticing now with thunderbird is that subscriptions are not persistent. It seems whenever thunderbird connects to dovecot, it gets an empty subscription list. Any advice on how to fix this?
> Does the subscription file contain anything? Or does Thunderbird just break it by subscribing mailboxes?
> Anyway, I can think of two possibilities: pread() is broken, or link() is broken. Try if dotlock_use_excl=yes helps? If not, try removing HAVE_PREAD from config.h and recompiling.
It looks like setting dotlock_use_excl=yes solved the subscription problem. So, in summary, to put the maildir in AFS you need to keep the indexes on local disk (or in memory), disable mmap and any hardlinks, use dotlocks, and set dotlock_use_excl=yes.
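The summary above might translate into a dovecot.conf fragment along these lines. This is a sketch only: the setting names are the Dovecot 1.0-era ones, mapping "any hardlinks" to maildir_copy_with_hardlinks is an assumption, and both paths are placeholders rather than the ones used in this thread.

```
# Sketch only; check each setting name against your Dovecot version.
# ~/Maildir and /var/local/dovecot-index are placeholder paths.
mail_location = maildir:~/Maildir:INDEX=/var/local/dovecot-index/%u
mmap_disable = yes
lock_method = dotlock
dotlock_use_excl = yes
# Assumed to be what "disable any hardlinks" refers to.
maildir_copy_with_hardlinks = no
```

Using INDEX=MEMORY in place of the local path is the other variant that fixed the index problem earlier in the thread.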
Now I can start testing Dovecot with a couple of users so I can hopefully replace bincimap.
Thank you!
-Aaron
On Tue, 2007-01-23 at 06:56 -0500, Aaron Solochek wrote:
>> Anyway, I can think of two possibilities: pread() is broken, or link() is broken. Try if dotlock_use_excl=yes helps? If not, try removing HAVE_PREAD from config.h and recompiling.
> It looks like setting that config option solved the subscription problem. So, in summary, to put the maildir in AFS you need to keep the indexes on the local disk (or memory) disable mmap and any hardlinks, use dotlocks, and set the dotlock_use_excl=yes option.
I think dotlock_use_excl=yes could have fixed index files as well.
Timo Sirainen wrote:
> On Tue, 2007-01-23 at 06:56 -0500, Aaron Solochek wrote:
>>> Anyway, I can think of two possibilities: pread() is broken, or link() is broken. Try if dotlock_use_excl=yes helps? If not, try removing HAVE_PREAD from config.h and recompiling.
>> It looks like setting that config option solved the subscription problem. So, in summary, to put the maildir in AFS you need to keep the indexes on the local disk (or memory) disable mmap and any hardlinks, use dotlocks, and set the dotlock_use_excl=yes option.
> I think dotlock_use_excl=yes could have fixed index files as well.
I actually think it did, because when I was cleaning up my debugging stuff I noticed that I was explicitly setting the MAIL environment variable in my mail_executable (which is just a script to get credentials before accessing AFS) and I was setting it just to the maildir without the extra options. Putting the index files on the local disk makes a lot of sense anyway for performance reasons.
Thanks.
-Aaron
Timo Sirainen wrote:
> On Mon, 2007-01-22 at 20:24 -0500, Aaron Solochek wrote:
>>> So there's something weird which causes newly created files to be broken. Does it work if you append :INDEX=MEMORY to mail_location? What about if you add :INDEX=/tmp/dovecot-%u or something so that the indexes aren't in AFS?
>> Both modifications fix the index problem on rc18. However, a new problem I'm noticing now with thunderbird is that subscriptions are not persistent. It seems whenever thunderbird connects to dovecot, it gets an empty subscription list. Any advice on how to fix this?
> Does the subscription file contain anything? Or does Thunderbird just break it by subscribing mailboxes?
> Anyway, I can think of two possibilities: pread() is broken, or link() is broken. Try if dotlock_use_excl=yes helps? If not, try removing HAVE_PREAD from config.h and recompiling.
AFAIK, hard links are not supported at all in AFS, only symlinks. When I installed Courier for our local site, I had to replace the 'link' calls with 'rename', and 'unlink' with 'remove' - all #ifdef'd in our local source tree.
We're looking to switch to Dovecot so I'm happy to see someone else trying this out.
Putting the INDEX files elsewhere makes sense too if you want to use AFS quotas - one issue we have with Courier is that a user over quota gets denied access because all of the uuid/index files in Courier live in the maildir.
I'm still hoping someone can explain the quota plugin API to me so I can submit an AFS quota plugin - I have the code to calculate the quota but I'm having a hard time understanding what the different quota functions should return.
--Craig
On Wed, 2007-01-24 at 08:32 -0500, Craig Huckabee wrote:
> AFAIK, hard links are not supported at all in AFS, only symlinks.
I googled a bit and saw something about hard links being supported but only within a directory. Or maybe that was about symlinks.
> When I installed Courier for our local site, I had to replace the 'link' calls with 'rename', and 'unlink' with 'remove' - all #ifdef'd in our local source tree.
I recently changed the maildir link() calls to rename()s, mostly for OSX. There's still one link() call where dovecot.index.log is linked to the dovecot.index.log.2 file.
> Putting the INDEX files elsewhere makes sense too if you want to use AFS quotas - one issue we have with Courier is that a user over quota gets denied access because all of the uuid/index files in Courier live in the maildir.
Dovecot has the same problem with its dovecot-uidlist file. You can move that elsewhere as well with :CONTROL=/somewhere.
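As a sketch of how that could look in mail_location, with both index and control files kept outside the quota-limited maildir (the /var/local/dovecot paths are placeholders, not paths from this thread):

```
# Placeholder paths; INDEX and CONTROL may point at different directories.
mail_location = maildir:~/Maildir:INDEX=/var/local/dovecot/index/%u:CONTROL=/var/local/dovecot/control/%u
```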
> I'm still hoping someone can explain the quota plugin API to me so I can submit an AFS quota plugin - I have the code to calculate the quota but I'm having a hard time understanding what the different quota functions should return.
Have you looked at http://dovecot.org/patches/quota-rewrite.diff? I think that could be a bit easier to understand.
"TS" == Timo Sirainen <tss@iki.fi> writes:
TS> I googled a bit and saw something about hard links being supported
TS> but only within a directory. Or maybe that was about symlinks.
Hard links work, but only within the same directory. This is, maybe in a purely theoretical sense, not substantially worse than standard Unix filesystems: in _general_ one cannot expect a hard link into any directory other than the current one to succeed rather than fail with EXDEV. Common conventions in filesystem layout (i.e. many more directories than filesystems) mean that it works most of the time.
Symlinks in AFS behave just as you expect except that an '@sys' in the symlink is expanded to the client 'sysname'.
On Wed, 2007-01-24 at 14:47 +0000, pod wrote:
> "TS" == Timo Sirainen <tss@iki.fi> writes:
> TS> I googled a bit and saw something about hard links being supported
> TS> but only within a directory. Or maybe that was about symlinks.
> Hard links work but only within the same directory.
Yes, that's why I found it a bit weird that they didn't really work with Dovecot. I guess it doesn't like some part in this:
- create tmp file
- link() to .lock file
- unlink tmp file
- write to .lock file
- rename() to destination file
Because the destination file was left empty.
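The sequence above can be reproduced outside Dovecot to see which step misbehaves on a given filesystem. The Python sketch below is an illustration, not Dovecot's code: the file names and payload are invented, and it assumes the write goes through the descriptor opened on the temporary file, which still refers to the same inode after link() and unlink().

```python
import os
import tempfile


def link_write_rename(directory, name="subscriptions", close_before_rename=True):
    """Create tmp, link() to .lock, unlink tmp, write, rename() to dest.

    Everything stays inside `directory`. Returns the size in bytes of the
    destination file afterwards. File names and payload are illustrative.
    """
    tmp_path = os.path.join(directory, name + ".tmp")
    lock_path = os.path.join(directory, name + ".lock")
    dest_path = os.path.join(directory, name)

    # Step 1: create the temporary file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.link(tmp_path, lock_path)      # Step 2: hard link to the .lock file
        os.unlink(tmp_path)               # Step 3: remove the temporary name
        os.write(fd, b"INBOX\nSent\n")    # Step 4: write via the open descriptor
        if close_before_rename:
            os.close(fd)
            fd = -1
        os.rename(lock_path, dest_path)   # Step 5: move into place
    finally:
        if fd != -1:
            os.close(fd)
    return os.stat(dest_path).st_size


if __name__ == "__main__":
    # Replace the temporary directory with a directory inside AFS to test it.
    with tempfile.TemporaryDirectory() as d:
        print(link_write_rename(d))
```

A returned size of zero without any exception would match the symptom described here: no syscall reports an error, yet the destination file ends up empty. The close_before_rename flag allows comparing the two orderings of close() and rename().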
"TS" == Timo Sirainen <tss@iki.fi> writes:
TS> - create tmp file
TS> - link() to .lock file
TS> - unlink tmp file
TS> - write to .lock file
TS> - rename() to destination file
I think the problem here is that the rename will return EXDEV in a cross-directory situation if the file in question is also being held open. I think what is at the root of this behaviour is that AFS will buffer writes on the client which get flushed on close.
At least this is the impression I get from running some nasty perl (appended):
pod@not-invented-here$ (cd /tmp; perl ~/tmp/t.PL;)
pod@not-invented-here$ (cd ~/afs/tmp; perl ~/tmp/t.PL;)
T1: Invalid cross-device link at /home/pod/tmp/t.PL line 34.
pod@not-invented-here$
-------------------- barf now --------------------
#! /usr/bin/perl
use strict;
use warnings;
use POSIX qw(link unlink rename);

sub open_link_write_rename {
  my $s=shift;
  my $d=shift;

  open(F, ">$s") or die;
  link($s, $s.".lock") or die;
  unlink($s) or die;
  print(F scalar localtime, "\n") or die;
  rename($s.".lock", $d) or die;
}

sub open_link_write_close_rename {
  my $s=shift;
  my $d=shift;

  open(F, ">$s") or die;
  link($s, $s.".lock") or die;
  unlink($s) or die;
  print(F scalar localtime, "\n") or die;
  close(F) or die;
  rename($s.".lock", $d) or die;
}

mkdir "a" and mkdir "b" or die;

eval {
  open_link_write_rename("a/foo", "b/foo");
};
warn "T1: $!" if $@;
close(F) or warn "$!";
map {unlink $_} "a/foo", "a/foo.lock", "b/foo";

eval {
  open_link_write_close_rename("a/foo", "b/foo");
};
warn "T2: $!" if $@;
map {unlink $_} "a/foo", "a/foo.lock", "b/foo";

eval {
  open_link_write_rename("a/foo", "a/bar");
};
warn "T3: $!" if $@;
close(F) or warn "$!";
map {unlink $_} "a/foo", "a/foo.lock", "a/bar";

eval {
  open_link_write_close_rename("a/foo", "a/bar");
};
warn "T4: $!" if $@;
map {unlink $_} "a/foo", "a/foo.lock", "a/bar";

rmdir "a" and rmdir "b" or die;
On Wed, 2007-01-24 at 18:53 +0000, pod wrote:
> TS> - create tmp file
> TS> - link() to .lock file
> TS> - unlink tmp file
> TS> - write to .lock file
> TS> - rename() to destination file
> I think the problem here is that the rename will return EXDEV in a cross-directory situation if the file in question is also being held open.
Except that Dovecot doesn't do cross-directory link()s or rename()s here. They're all in the same directory and they still apparently don't work.
If anything returned EXDEV, it would be logged as an error and Dovecot would report an internal failure. Instead, it's silently creating empty files without any syscall returning an error.
Craig Huckabee wrote:
> I'm still hoping someone can explain the quota plugin API to me so I can submit an AFS quota plugin - I have the code to calculate the quota but I'm having a hard time understanding what the different quota functions should return.
I'm glad I'm not the only one. An AFS quota plugin would be very nice. My only suggestion would be not to make any assumptions about the volume structure in the Maildir. I personally broke my Maildir up into multiple volumes so they wouldn't get too large (and become a pain to deal with). So, for instance, .Sent is actually its own volume.
Anyway, be sure to send a message to the list if you write that, since I would certainly use it.
-Aaron