[Dovecot] Speed up large maildirs
Hi,
I'm very fond of maildirs; they are simple, and I like that.
But I've never really understood the significance of maildir/new/. The only use I can see is that (in Dovecot's case, for example) you move a message from new to cur and then update the index file; that's it.
So to put it more directly: you know when, and with what, to update the index file based on where the file resides.
I was never bothered by this until I found that it is the last hurdle to fast access to a large Maildir that receives a lot of mail.
I've been thinking long and hard about what to do to make them work even better.
My idea at this point is the following: use maildir/new as little as possible, only for what it was intended, and spread the updates out over time.
If someone made a wrapper program around the delivery program (like qmail-local, vpopmaildeliver, procmail, or local and virtual from Postfix), and the delivery program printed to standard output the path and filename of each message it saved, the wrapper could send an update message to a Dovecot daemon.
That means updates are small, and opening a large maildir would be sped up.
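A minimal Python sketch of such a wrapper, assuming a (patched) delivery program that prints one delivered file path per line on standard output. The named pipe used as the notification channel and the names `maildir_of`, `notify` and `run_wrapper` are hypothetical stand-ins, not an existing Dovecot interface.

```python
import os
import subprocess
import sys


def maildir_of(delivered_path):
    """Return the Maildir folder holding a delivered message file.

    Messages live in <folder>/new/<file> or <folder>/cur/<file>, so the
    folder is two directory levels above the file.
    """
    subdir = os.path.dirname(os.path.abspath(delivered_path))
    if os.path.basename(subdir) not in ("new", "cur"):
        raise ValueError(f"not inside a Maildir new/ or cur/: {delivered_path}")
    return os.path.dirname(subdir)


def notify(folder, fifo_path):
    # Hypothetical channel: one folder per line on a named pipe that an
    # indexing daemon reads. Opening a FIFO for writing blocks until a
    # reader has it open.
    with open(fifo_path, "a") as fifo:
        fifo.write(folder + "\n")


def run_wrapper(delivery_cmd, fifo_path):
    # The message arrives on our stdin and is passed straight through;
    # only the delivery program's stdout (the saved paths) is captured.
    proc = subprocess.run(delivery_cmd, stdin=sys.stdin,
                          stdout=subprocess.PIPE, text=True)
    folders = {maildir_of(line.strip())
               for line in proc.stdout.splitlines() if line.strip()}
    for folder in sorted(folders):
        notify(folder, fifo_path)
    return proc.returncode
```

Called as `run_wrapper(["procmail"], "/path/to/pipe")`, it would deliver the message and then announce each touched folder once, so several messages saved into the same folder produce a single update.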
How does that sound?
I hope this is a helpful suggestion.
New things are always on the horizon.
Leen Besselink wrote:
> That means updates are small, and opening a large maildir would be sped up.
> How does that sound?
What difference does this make compared to simply telling Dovecot to open the maildir periodically? You could do that easily in Python:
    import imaplib
    # IMAP4_stream runs the command and speaks IMAP over its stdin/stdout
    conn = imaplib.IMAP4_stream("/usr/sbin/dovecot --exec-mail imap")
    conn.select("targetfolder")
    conn.logout()
Obviously you'd have Python's startup overhead, but you could use expect instead, which probably starts faster. Make your wrapper store a list of changed folders somewhere, and iterate over that list every couple of minutes using the above script.
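A rough sketch of that approach, using the standard library's `imaplib.IMAP4_stream`. The list-file path, its one-folder-name-per-line format and the helper names are assumptions; the wrapper would simply append an IMAP folder name to that file after each delivery.

```python
import imaplib
import os
import time

CHANGED_LIST = "/var/tmp/changed-folders"           # hypothetical file the wrapper appends to
DOVECOT_CMD = "/usr/sbin/dovecot --exec-mail imap"  # command from the snippet above


def take_changed_folders(path):
    """Grab the current list of changed folders, deduplicated, in order."""
    work = path + ".work"
    try:
        # rename() is atomic: a wrapper that opens the list file for each
        # delivery will from now on create and append to a fresh file.
        os.replace(path, work)
    except FileNotFoundError:
        return []
    with open(work) as f:
        names = [line.strip() for line in f]
    os.remove(work)
    return list(dict.fromkeys(name for name in names if name))


def touch_folder(name):
    # Selecting the mailbox makes Dovecot sync new/ and cur/ into its index.
    # Names containing spaces would need IMAP quoting; not handled here.
    conn = imaplib.IMAP4_stream(DOVECOT_CMD)
    try:
        conn.select(name)
    finally:
        conn.logout()


def run_forever(interval=120):
    while True:
        for name in take_changed_folders(CHANGED_LIST):
            touch_folder(name)
        time.sleep(interval)
```

Deduplicating means a folder that received fifty messages since the last round is still opened only once.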
Or am I misunderstanding something? Oh, the best option would obviously be the Dovecot LDA (local delivery agent), which I believe is planned.
johannes
>> That means updates are small, and opening a large maildir would be sped up.
>> How does that sound?
> What difference does this make compared to simply telling Dovecot to open the maildir periodically? You could do that easily in Python:
>     import imaplib
>     # IMAP4_stream runs the command and speaks IMAP over its stdin/stdout
>     conn = imaplib.IMAP4_stream("/usr/sbin/dovecot --exec-mail imap")
>     conn.select("targetfolder")
>     conn.logout()
> Obviously you'd have Python's startup overhead, but you could use expect instead, which probably starts faster. Make your wrapper store a list of changed folders somewhere, and iterate over that list every couple of minutes using the above script.
> Or am I misunderstanding something?
I guess that means you wouldn't need a username/password for that, just a small program that opens and closes recently changed folders. OK.
I wouldn't mind a little overhead.
I think this would work (I was only suggesting a daemon because I know Dovecot can also keep things mmap'd).
> Oh, the best option would obviously be the Dovecot LDA (local delivery agent), which I believe is planned.
I like that idea, but it wouldn't solve the situation where you use procmail for delivery.
(Unless of course the LDA does filtering/sorting as well, which sounds like a lot of work.)
Whereas a small patch to procmail that uses standard output to tell the LDA or wrapper which files it delivered to would be pretty easy.
On Tue, Feb 15, 2005 at 07:53:38PM +0100, Leen Besselink wrote:
> Whereas a small patch to procmail that uses standard output to tell the LDA or wrapper which files it delivered to would be pretty easy.
You can already do that with procmail: $LASTFOLDER contains the name of the file procmail delivered to (sometimes it's relative to $MAILDIR, sometimes it's absolute; it seems to depend on where it's used), so you could use $MAILDIR and $LASTFOLDER together to tell the LDA which file to check. You might be able to use the TRAP variable to achieve this; however, all output from the TRAP command is logged to $LOGFILE, and I don't know whether it goes to STDOUT as well. If you wanted to use a daemon, you could adapt the Python code shown previously to read from a named pipe, and pop this near the start of your .procmailrc:
    TRAP='echo "$LASTFOLDER" > /path/to/pipe'
or invoke the program directly:
    TRAP='/path/to/reindex/program "$LASTFOLDER"'
If you only want this for one mailbox, you could add the 'c' flag to the recipe and duplicate the recipe directly afterwards, making the action
    | echo "$MAILDIR/$LASTFOLDER" > /path/to/pipe
or:
    | /path/to/reindex/program "$MAILDIR/$LASTFOLDER"
You seem to need "$MAILDIR/$LASTFOLDER" here instead of just "$LASTFOLDER", though it may differ for you.
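Adapting the earlier Python snippet into a named-pipe daemon might look roughly like this. It is a sketch under several assumptions: the pipe path is the placeholder from the recipes above, $MAILDIR is taken to be ~/Maildir, folders use the Maildir++ layout (subfolders as ".name" directories, the top level as INBOX), and `folder_from_lastfolder` is a hypothetical helper that copes with $LASTFOLDER being either relative or absolute.

```python
import imaplib
import os

FIFO = "/path/to/pipe"                     # placeholder path from the recipes
MAILDIR = os.path.expanduser("~/Maildir")  # assumed value of procmail's $MAILDIR
DOVECOT_CMD = "/usr/sbin/dovecot --exec-mail imap"


def folder_from_lastfolder(lastfolder, maildir):
    """Map a $LASTFOLDER value to an IMAP mailbox name (Maildir++ layout assumed)."""
    maildir = os.path.normpath(maildir)
    # $LASTFOLDER is sometimes relative to $MAILDIR and sometimes absolute.
    path = lastfolder if os.path.isabs(lastfolder) else os.path.join(maildir, lastfolder)
    path = os.path.normpath(path)
    # For Maildir delivery it names the message file: strip new/<file> or cur/<file>.
    if os.path.basename(os.path.dirname(path)) in ("new", "cur"):
        path = os.path.dirname(os.path.dirname(path))
    rel = os.path.relpath(path, maildir)
    if rel == ".." or rel.startswith(".." + os.sep):
        raise ValueError(f"{lastfolder!r} is outside {maildir}")
    return "INBOX" if rel == "." else rel.lstrip(".")  # ".lists.dovecot" -> "lists.dovecot"


def serve(fifo=FIFO, maildir=MAILDIR):
    if not os.path.exists(fifo):
        os.mkfifo(fifo)
    while True:
        # open() blocks until a writer (e.g. procmail's TRAP) opens the pipe, and
        # iteration ends when the last writer closes it, so reopen in a loop.
        with open(fifo) as pipe:
            for line in pipe:
                value = line.strip()
                if not value:
                    continue
                try:
                    name = folder_from_lastfolder(value, maildir)
                except ValueError:
                    continue
                conn = imaplib.IMAP4_stream(DOVECOT_CMD)
                try:
                    conn.select(name)
                finally:
                    conn.logout()
```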
Hopefully this will be useful to you,
-- John Tobin "A method of styling hair to cover partial baldness using only the hair on a person's head. The hair styling requires dividing a person's hair into three sections and carefully folding one section over another." -- US Patent #4,022,227
Thank you and the others for all the suggestions; they were all very helpful.
Thanks again, Leen Besselink.
Leen Besselink wrote:
> If someone made a wrapper program around the delivery program (like qmail-local, vpopmaildeliver, procmail, or local and virtual from Postfix), and the delivery program printed to standard output the path and filename of each message it saved, the wrapper could send an update message to a Dovecot daemon.
I don't understand how this would be an improvement. Currently, when Dovecot opens a "folder", the first thing it does is open folder/new and move those files to folder/cur, presumably updating the index as it goes. That is as fast as it could ever get.
Perhaps a short description of how Maildirs work (as I understand it) is in order:
an external application creates a file in ./tmp (this is the one time-consuming operation, and it is not guaranteed to be atomic);
the same application then renames that file into the ./new folder (this is atomic);
another application (say dovecot) opens the Maildir for reading and moves any files from ./new to ./cur (since this neatly corresponds to "new" files as opposed to previously seen files).
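The three steps can be sketched in Python as follows. This is an illustration, not Dovecot's or any MTA's actual code: the unique-name scheme is a simplified version of the usual time.pid.host pattern, hostnames are assumed not to contain "/" or ":", and the ":2," suffix added on the move into cur/ is the Maildir convention for an (initially empty) set of flags.

```python
import os
import socket
import time


def deliver(maildir, message):
    """Steps 1 and 2: write the message into tmp/, then rename it into new/."""
    # Simplified unique name in the usual time.pid.host style; assumes the
    # hostname contains no "/" or ":".
    unique = f"{time.time_ns()}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as f:  # "x": refuse to overwrite an existing file
        f.write(message)             # the slow, non-atomic part
        f.flush()
        os.fsync(f.fileno())         # data must be on disk before it becomes visible
    os.rename(tmp_path, new_path)    # atomic within one filesystem
    return new_path


def pick_up_new(maildir):
    """Step 3: what a reader does on open, moving everything from new/ to cur/."""
    moved = []
    for name in sorted(os.listdir(os.path.join(maildir, "new"))):
        src = os.path.join(maildir, "new", name)
        # ":2," followed by no letters: Maildir info with an empty flag set.
        dst = os.path.join(maildir, "cur", name + ":2,")
        os.rename(src, dst)
        moved.append(dst)  # a real reader would also add the message to its index here
    return moved
```

Because the rename into new/ is atomic, a reader listing new/ never sees a half-written message; anything still being written sits in tmp/, which readers ignore.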
How would some additional information to the dovecot-daemon help in any way? The files in ./new are guaranteed to be, well, NEW, so they are the ones that dovecot needs to add to its own index, to avoid having to open the ./cur folder and enumerate all files every single time.
John
-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Boulevard Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5748
John Peacock wrote:
> Leen Besselink wrote:
>> If someone made a wrapper program around the delivery program (like qmail-local, vpopmaildeliver, procmail, or local and virtual from Postfix), and the delivery program printed to standard output the path and filename of each message it saved, the wrapper could send an update message to a Dovecot daemon.
> I don't understand how this would be an improvement. Currently, when Dovecot opens a "folder", the first thing it does is open folder/new and move those files to folder/cur, presumably updating the index as it goes. That is as fast as it could ever get.
But if you do it all at once, it's going to take time (the user has to wait until it's done).
> Perhaps a short description of how Maildirs work (as I understand it) is in order:
> an external application creates a file in ./tmp (this is the one time-consuming operation, and it is not guaranteed to be atomic);
> the same application then renames that file into the ./new folder (this is atomic);
> another application (say dovecot) opens the Maildir for reading and moves any files from ./new to ./cur (since this neatly corresponds to "new" files as opposed to previously seen files).
> How would some additional information to the dovecot-daemon help in any way? The files in ./new are guaranteed to be, well, NEW, so they are the ones that dovecot needs to add to its own index, to avoid having to open the ./cur folder and enumerate all files every single time.
The fun thing is that when you read a message, its filename in cur gets changed, so "new" only means "not yet moved and indexed" (as I understand it).
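That matches the Maildir convention of storing message flags in the filename after ":2,", as single letters in ASCII order (D draft, F flagged, P passed, R replied, S seen, T trashed). A small sketch of what "reading a message" does to the file; the helper names are hypothetical.

```python
import os


def set_flag(filename, flag):
    """Return the Maildir filename with one more flag letter (e.g. "S" = seen)."""
    base, sep, info = filename.partition(":2,")
    flags = set(info) if sep else set()
    flags.add(flag)
    return base + ":2," + "".join(sorted(flags))  # flags are kept in ASCII order


def mark_seen(cur_dir, filename):
    """Rename a message in cur/ so it carries the S flag; returns the new filename."""
    new_name = set_flag(filename, "S")
    if new_name != filename:
        # The rename changes cur/'s contents (and its mtime), which is why a
        # message's full name in cur/ is not a stable identifier; only the
        # part before ":2," stays the same.
        os.rename(os.path.join(cur_dir, filename), os.path.join(cur_dir, new_name))
    return new_name
```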
On Tue, 2005-02-15 at 13:49 -0500, John Peacock wrote:
> Leen Besselink wrote:
>> If someone made a wrapper program around the delivery program (like qmail-local, vpopmaildeliver, procmail, or local and virtual from Postfix), and the delivery program printed to standard output the path and filename of each message it saved, the wrapper could send an update message to a Dovecot daemon.
> I don't understand how this would be an improvement. Currently, when Dovecot opens a "folder", the first thing it does is open folder/new and move those files to folder/cur, presumably updating the index as it goes. That is as fast as it could ever get.
Well, with 0.99.x it does more than the client needs, and it might have to read through all the messages twice. So 1.0-tests might halve the time needed to open a mailbox, or even better.
> The files in ./new are guaranteed to be, well, NEW, so they are the ones that dovecot needs to add to its own index, to avoid having to open the ./cur folder and enumerate all files every single time.
Actually Dovecot looks at cur too every time its timestamp has changed. So moving the mail from new/ to cur/ means Dovecot will scan through the cur/ within a couple of seconds. The scan should be pretty fast though.
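The idea of rescanning a directory only when its timestamp changes can be sketched like this. It is a simplification, not Dovecot's actual sync code, and the class name is hypothetical. Creating, removing or renaming an entry in a directory updates that directory's mtime, so an unchanged cur/ costs a single stat() call.

```python
import os


class MaildirWatcher:
    """Report which of new/ and cur/ need a rescan, based on directory mtimes."""

    def __init__(self, maildir):
        self.maildir = maildir
        self.seen = {}  # subdirectory name -> st_mtime_ns at the last scan

    def dirs_to_rescan(self):
        changed = []
        for sub in ("new", "cur"):
            mtime = os.stat(os.path.join(self.maildir, sub)).st_mtime_ns
            if self.seen.get(sub) != mtime:
                self.seen[sub] = mtime
                changed.append(sub)
        return changed
```

A real implementation also has to cope with timestamp granularity: on a filesystem with one-second mtimes, a change that lands in the same second as the previous scan is invisible to this check.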
The optimal solution anyway will be Dovecot's own mail format at some point, delivered through Dovecot LDA which indexes the mail immediately while storing it.
Timo Sirainen wrote:
> Actually Dovecot looks at cur too every time its timestamp has changed. So moving the mail from new/ to cur/ means Dovecot will scan through the cur/ within a couple of seconds. The scan should be pretty fast though.
Well, yes, since Dovecot can't assume it's the only program accessing the mailbox. But my point is that Dovecot already knows that the files in ./new are not in the index, so there is no point in telling it about the messages (which is what the OP was suggesting).
> The optimal solution anyway will be Dovecot's own mail format at some point, delivered through Dovecot LDA which indexes the mail immediately while storing it.
That would be a database, right? ;-)
John
On 15.2.2005, at 21:13, John Peacock wrote:
> Timo Sirainen wrote:
>> Actually Dovecot looks at cur too every time its timestamp has changed. So moving the mail from new/ to cur/ means Dovecot will scan through the cur/ within a couple of seconds. The scan should be pretty fast though.
> Well, yes, since Dovecot can't assume it's the only program accessing the mailbox. But my point is that Dovecot already knows that the files in ./new are not in the index, so there is no point in telling it about the messages (which is what the OP was suggesting).
Not necessarily. :) I don't remember how 0.99.x did it, but 1.0-tests don't move mail from new to cur if:
a) there's not enough disk space/quota to do it, or
b) the mailbox was opened read-only (or via SELECT or EXAMINE), so the \Recent flag is kept.
In both cases the message is indexed, assuming there's space to store indexes.
>> The optimal solution anyway will be Dovecot's own mail format at some point, delivered through Dovecot LDA which indexes the mail immediately while storing it.
> That would be a database, right? ;-)
Basically the current indexes, with message bodies stored in one or more files depending on how it's configured, but without any duplicated metadata in them. Configuration would probably be "try to keep file sizes around 1MB" or "1 message per file".
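To make the two configuration choices concrete, here is a toy in-memory model. It is purely hypothetical and not Dovecot's design: an index maps each UID to a (file, offset, length) location, and each new body goes either into its own file or into the current file until that file would pass a target size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Location:
    file_no: int
    offset: int
    length: int


@dataclass
class BodyStore:
    # None means "1 message per file"; a number means "try to keep file sizes
    # around this many bytes" (a message is never split across files).
    max_file_size: Optional[int] = 1024 * 1024
    files: List[bytearray] = field(default_factory=list)
    index: Dict[int, Location] = field(default_factory=dict)  # UID -> location
    next_uid: int = 1

    def _needs_new_file(self, size):
        if not self.files or self.max_file_size is None:
            return True
        current = len(self.files[-1])
        return current > 0 and current + size > self.max_file_size

    def append(self, body: bytes) -> int:
        if self._needs_new_file(len(body)):
            self.files.append(bytearray())
        file_no = len(self.files) - 1
        data = self.files[file_no]
        uid = self.next_uid
        self.index[uid] = Location(file_no, len(data), len(body))
        data.extend(body)
        self.next_uid += 1
        return uid

    def read(self, uid: int) -> bytes:
        loc = self.index[uid]
        return bytes(self.files[loc.file_no][loc.offset:loc.offset + loc.length])
```

With all per-message metadata in the index, the body files hold nothing but message data, which is the "no duplicated metadata" property described above.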
On Tue, February 15, 2005 2:21 pm, Timo Sirainen said:
> On 15.2.2005, at 21:13, John Peacock wrote:
>> Timo Sirainen wrote:
>>> The optimal solution anyway will be Dovecot's own mail format at some point, delivered through Dovecot LDA which indexes the mail immediately while storing it.
>> That would be a database, right? ;-)
> Basically the current indexes, with message bodies stored in one or more files depending on how it's configured, but without any duplicated metadata in them. Configuration would probably be "try to keep file sizes around 1MB" or "1 message per file".
I'm hoping that you're not intending to go the Cyrus route where only the proprietary format is supported; I don't want to give up the ability to use procmail/formail/other unix utilities on messages...
Jim Trigg
On 15.2.2005, at 21:42, Jim Trigg wrote:
>> Basically the current indexes, with message bodies stored in one or more files depending on how it's configured, but without any duplicated metadata in them. Configuration would probably be "try to keep file sizes around 1MB" or "1 message per file".
> I'm hoping that you're not intending to go the Cyrus route where only the proprietary format is supported; I don't want to give up the ability to use procmail/formail/other unix utilities on messages...
Dovecot is built from the beginning to support multiple mailbox formats. That won't change and it's clearly one of its current major strengths.
Participants (6):
- Jim Trigg
- Johannes Berg
- John Peacock
- John Tobin
- Leen Besselink
- Timo Sirainen