[Dovecot] 1.0.rc10 status report
(Background: Relatively new to dovecot; looking to do transparent replacement of long-established UW-IMAP on cluster of Linux boxes which NFS-mount a shared "/var/spool/mail".)
With rc8, where I had already increased "login_max_processes_count" from default 128 to 1024, we had still hit the issue of too many logins crashing dovecot, so that trial had only lasted a couple of hours.
With rc10, I doubled this to try to avoid the problem (I didn't want to risk testing the new code that addressed the problem... sorry!). We ran for almost a full working day. Good! Because of a few issues (below) I then backed off.
- "User unknown": We use NIS for our passwd information. On the earlier rc8 test we had several occurrences of "User unknown" (from "deliver") giving "dsn=5..." for perfectly valid users. So for this rc10 test I applied a local patch so these were reduced to "EX_TEMPFAIL" (dsn=4...). (This was triggered, as expected, a few times and subsequent delivery attempts succeeded.) I strongly suspect that this is some sort of issue with FC5, probably "nscd", and nothing to do with dovecot. Hints would be nice, but from the dovecot perspective you can probably ignore this item.
- For one particular user, "deliver" consistently gave: Failed to create storage for '...' with mail 'mbox:/HOME_DIRECTORY_USED_BUT_NOT_GIVEN_BY_USERDB:INBOX=...
I think this is ultimately due to something strange in the user's ".forward" file. I'd be delighted to follow this up with anyone else who might have seen it. Although in one sense we may be drifting off-topic, in another sense I suspect that there is scope for adjusting "deliver" to handle this more gracefully.
- There were several occurrences of: IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space) child 30842 (imap) killed with signal 6
This looks particularly awkward. Any thoughts?
- There were two occurrences of: IMAP(...): file ../../../src/lib-index/mail-index.c: line 1801 (mail_index_move_to_memory): assertion failed: (index->fd == -1) child 20493 (imap) killed with signal 6
Again, this looks particularly awkward. Any thoughts?
For these last two items, note that the indexes are currently NFS-shared alongside the INBOX area.
I'm still not clear on how to regard the concept of indexes, as applied to a small cluster of machines, and handling simultaneous updates to INBOXes (analogous to the vital importance of INBOX locking for such updates).
If one imagines the IMAP daemon (and pop and deliver) as file-clients of the (NFS-shared) INBOXes on a fileserver, do the indexes belong very close to the INBOXes (fileserver) or the dovecot software (file client)? So should I have the indexes on the fileserver (one instance), or should they be on each cluster machine's private storage (possibly several instances; one per cluster machine)? I've got them on the server; would they be better on the cluster clients? (Might that be the cause and fix of these two problems?)
--
:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:                                           Durham University     :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
On Oct 20, 2006, at 5:26 AM, David Lee wrote:
- There were several occurrences of: IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space) child 30842 (imap) killed with signal 6
This looks particularly awkward. Any thoughts?
I'm also getting this error fairly often. I first installed Dovecot
with 1.0rc10 last Wednesday after running with UW-Imap for a few
years, under RedHat 9 Linux, using mbox format. The primary client
I'm using is Apple MacOS X Mail; the box has between 3 and 5 regular
imap users.
I added the following code right before the assert to try to assist
with debugging (assert line copied in for context), in the file
src/lib-storage/index/mbox/mbox-sync-rewrite.c at line 405:
if (need_space != (uoff_t)-mails[idx].space)
        i_info("Need_space: %d idx: %d mails[idx].space: %d",need_space,idx,mails[idx].space);
i_assert(need_space == (uoff_t)-mails[idx].space);
I know I'm not doing a proper cast of the variables, but here's the
log message I've gotten about 17 times since installing the debug
version about an hour ago:
Oct 23 01:24:58 dragonlair dovecot: IMAP(dalvenja): Need_space: -12
idx: 6 mails[idx].space: -1
(which is particularly odd, since the three numbers are always the
same.)
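On the cast issue: passing a 64-bit value to a %d conversion is undefined behavior in C, so the logged numbers may not reflect the real variables. A minimal sketch of a debug line with matching conversions follows. It assumes uoff_t is an unsigned 64-bit type and that the space field is a signed 64-bit offset; the typedef and the helper names space_matches and format_space_debug are illustrative only, not Dovecot code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for Dovecot's uoff_t (unsigned file offset). */
typedef uint64_t uoff_t;

/* Same comparison as the assert: need_space must equal the negated
   space value, converted to the unsigned offset type. */
static int space_matches(uoff_t need_space, int64_t space)
{
	return need_space == (uoff_t)-space;
}

/* Format the three values with conversions that match their widths.
   Returns the snprintf() result (length of the formatted text). */
static int format_space_debug(char *buf, size_t size, uoff_t need_space,
			      unsigned int idx, int64_t space)
{
	return snprintf(buf, size,
			"need_space: %" PRIu64 " idx: %u mails[idx].space: %" PRId64,
			need_space, idx, space);
}
```

Inside Dovecot the same format string could be handed to i_info() instead of snprintf(); the point is only that each conversion matches the width and signedness of its argument.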
I've only halfheartedly tried to track back what's going on in the
daemon to get to this point;
I know that it seems to happen when I delete mails with the client
(which moves the mails to a Trash folder and then either sets the D
status flag or removes them from the Inbox entirely); beyond that I'm
not sure what's happening exactly. (If anyone does and can give me a
couple of hints as to what to try or look for, I can do some more
debugging).
It doesn't appear to be doing anything bad, other than slowing down
mail client operations a bit; OS X mail seems to be smart enough to
know when the imap server didn't do what it wants and to repeat the
operations, and I don't believe I've had any real issues as a result
of the assert failure.
Just as a data point, when I first converted, I did get the following
types of messages as well:
Oct 18 22:08:06 dragonlair dovecot: IMAP(dalvenja): mbox sync:
Expunged message reappeared in mailbox /var/spool/mail/dalvenja (UID
1 < 34486, seq=1, idx_msgs=0)
Oct 18 22:08:57 dragonlair dovecot: IMAP(dalvenja): mbox sync: UID
inserted in the middle of mailbox /var/spool/mail/dalvenja (34486 >
1, seq=1, idx_msgs=5607)
Oct 18 22:16:52 dragonlair dovecot: IMAP(dalvenja): UIDs broken with
partial sync in mbox file /var/spool/mail/dalvenja
but those don't appear to have shown up since (I assume Dovecot is
correcting them as it finds them.)
Other than that, Dovecot is certainly much faster than UW-IMAP; I'm
very pleased with the speedup I've gotten from it so far. It's also
lessened the incidents of OS X mail "giving up" on operations; with
UW-IMAP, if I tried to do too many operations at once (delete 5
messages, delete 5 more messages, pull up a message, delete 5 more,
pull up another), it would sometimes give up, put all the deleted
messages back, and try to resync itself with the imap server. So far
I haven't had any of those occurrences when using Dovecot.
Thanks all,
-dalvenjah
On Fri, 2006-10-20 at 13:26 +0100, David Lee wrote:
- "User unknown": We use NIS for our passwd information. On the earlier rc8 test we had several occurrences of "User unknown" (from "deliver") giving "dsn=5..." for perfectly valid users. So for this rc10 test I applied a local patch so these were reduced to "EX_TEMPFAIL" (dsn=4...). (This was triggered, as expected, a few times and subsequent delivery attempts succeeded.) I strongly suspect that this is some sort of issue with FC5, probably "nscd", and nothing to do with dovecot. Hints would be nice, but from the dovecot perspective you can probably ignore this item.
Yea. Dovecot only does a getpwent() call which can't really be used wrong.
- For one particular user, the "deliver" consistently gave: Failed to create storage for '...' with mail 'mbox:/HOME_DIRECTORY_USED_BUT_NOT_GIVEN_BY_USERDB:INBOX=...
I think this is ultimately due to something strange in the user's ".forward" file. I'd be delighted to follow this up with anyone else who might have seen it. Although in one sense we may be drifting off-topic, in another sense I suspect that there is scope for adjusting "deliver" to handle this more gracefully.
Is deliver executed from the .forward file? In that case the HOME environment variable isn't set, and deliver doesn't assume that it's delivering to the current local user, so it doesn't look up the home directory by itself.
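If deliver does have to run from a .forward file, one possible workaround is a small wrapper that fills in HOME before executing it. A minimal sketch, assuming the command runs under the recipient's own uid; resolve_home is a hypothetical helper, not part of Dovecot:

```c
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the home directory to use: $HOME if it is set and non-empty,
   otherwise the passwd entry of the effective uid. Returns NULL if
   neither source gives an answer. A passwd-derived result points to
   static storage that later getpw*() calls may overwrite. */
static const char *resolve_home(void)
{
	const char *home = getenv("HOME");
	struct passwd *pw;

	if (home != NULL && *home != '\0')
		return home;

	pw = getpwuid(geteuid());
	return pw != NULL ? pw->pw_dir : NULL;
}
```

A wrapper would check the result for NULL, call setenv("HOME", ..., 1), and then exec deliver with its usual arguments.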
- There were several occurrences of: IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space) child 30842 (imap) killed with signal 6
This looks particularly awkward. Any thoughts?
In case you missed it, this patch fixes it: http://dovecot.org/patches/1.0/dovecot-1.0.rc10-mbox-keywords-fix.patch
- There were two occurrences of: IMAP(...): file ../../../src/lib-index/mail-index.c: line 1801 (mail_index_move_to_memory): assertion failed: (index->fd == -1) child 20493 (imap) killed with signal 6
Again, this looks particularly awkward. Any thoughts?
The move-to-memory code isn't perfect, but normally it shouldn't even run. I think there are only two reasons it would:
- Filesystem quota / out of disk space in general
- mbox_min_index_size
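To rule out the first cause, the free space on the filesystem holding the indexes can be checked directly. A minimal sketch using POSIX statvfs(); free_bytes is a hypothetical helper, and it does not see per-user quotas, which have to be checked separately:

```c
#include <sys/statvfs.h>

/* Return the bytes available to unprivileged users on the filesystem
   containing path, or -1 if statvfs() fails. */
static long long free_bytes(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return -1;
	return (long long)st.f_bavail * (long long)st.f_frsize;
}
```

Run against the index directory on each cluster machine, this shows whether that mount is close to full at the time the assertion fires.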
For these last two items, note that the indexes are currently NFS-shared alongside the INBOX area.
I'm still not clear on how to regard the concept of indexes, as applied to a small cluster of machines, and handling simultaneous updates to INBOXes (analogous to the vital importance of INBOX locking for such updates).
If one imagines the IMAP daemon (and pop and deliver) as file-clients of the (NFS-shared) INBOXes on a fileserver, do the indexes belong very close to the INBOXes (fileserver) or the dovecot software (file client)? So should I have the indexes on the fileserver (one instance), or should they be on each cluster machine's private storage (possibly several instances; one per cluster machine)? I've got them on the server; would they be better on the cluster clients? (Might that be the cause and fix of these two problems?)
Indexes contain metadata about the mailboxes, so if multiple computers read and write the same user's mailbox, it's better to keep the indexes on NFS.
If you can arrange for only a single computer to access a given user's mailbox most of the time, then it's probably faster to keep them on local disk. Otherwise, with indexes on local disk on different computers, you'd waste time synchronizing the indexes separately on each computer that accesses the mailbox.
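Either choice comes down to the INDEX= part of the mbox location string (the same "mbox:...:INBOX=..." form that appears in the deliver error above). A minimal sketch that assembles such a string; build_mbox_location is a hypothetical helper and the paths in the usage note are examples only, not taken from either site's configuration:

```c
#include <stdio.h>

/* Build an mbox location string with an explicit index directory.
   Returns the length of the string, or -1 if the buffer is too small. */
static int build_mbox_location(char *buf, size_t size, const char *root,
			       const char *inbox, const char *index_dir)
{
	int n = snprintf(buf, size, "mbox:%s:INBOX=%s:INDEX=%s",
			 root, inbox, index_dir);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, a root of "~/mail", an inbox of "/var/spool/mail/%u" and an index directory of "/var/local/dovecot-index/%u" would put the indexes on each machine's local disk, while pointing the index directory at an NFS path keeps one shared copy.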
participants (3): Dalvenjah FoxFire, David Lee, Timo Sirainen