[Dovecot] Possible to Customize File Naming Scheme?
Hello all,
[Sorry if this has been covered already - I searched back a little way in the archive and didn't find anything recent on the topic.]
I'm wondering whether it is possible to customize the way dovecot creates or modifies filenames in the maildir directories.
I'm watching how my mail system works, and I see that procmail creates a
new file in the <folder>/new directory, each time an email is received.
This file is some complex combination of UIDs and things, suffixed by
the server name. So far, the filename has alphanumerics, a couple
underscores, and a dot or two only.
But once dovecot gets its hands on the file and moves it to the
<folder>/cur directory, it starts doing "terrible" things to the file
name. Now, the filename starts to have "evil" things in it, like colons
and commas. Is there a way to change this? I'm asking this primarily
because I use dovecot as a massive long-term email archiving system.
One of the things one needs to be able to do when running a long-term
archive like this is keep things as simple and accessible as possible.
The reason I use maildir is that I totally buy into the "one email, one
file" idea - it means I don't have to store messages in big consolidated
database files that are changeable with each new version of the vendor's
software release (such as exchange DBs or Outlook PST files) or that are
horrible performers (such as mbox).
One of the nice things about the maildir "each email is a separate file" idea is that you are not limited to maildir or dovecot or any other piece of software to handle, read, and process the files. For instance, I would like to back up my maildir by using rsync to synchronize my dovecot-managed maildir to a Windows server running NFS. From there the files are synchronized via Windows DFS (to which no open source solution comes even close) to several other servers around the continent. The only problem: the evil commas and colons in the filenames are anathema to Windows. So instead I tar the maildir folders into .tgz files on the Windows server, and the .tgz files are synchronized to the other DR sites.
If I could do without the need for tar (mandated solely because of the colons and commas in the dovecot filename scheme), I could minimize the time to back up (only synchronizing changes), and suddenly a lot of other benefits would open up. One simple one: if I could configure dovecot to append the .eml extension to the end of every file (technically each file in a maildir is an eml file whether or not the extension is present - eml is just a raw mail file, exactly like what you'd find in a maildir), I would have instant access to them through Search Server Express, which can read eml files but strongly prefers to recognize them by their extension.
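To make the request concrete, here is a minimal C sketch of the kind of name Dave is after: a maildir file name turned into a Windows-friendly name ending in .eml. Dovecot does not do this; the sketch assumes a separate copy step between the rsync target and DFS, and the function name `mirror_name` and the substitute character `+` (`SAFE_SUBST`) are hypothetical choices. The mapping is only reversible if `+` never occurs in the original names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical substitute for ':' and ','. '_' is avoided because the
   base names in this thread already contain underscores; '+' is a legal
   character in Windows file names. */
#define SAFE_SUBST '+'

/* Copy a maildir file name into out, replacing ':' and ',' with
   SAFE_SUBST and appending ".eml". Returns 0 on success, or -1 if out
   is too small to hold the result plus its terminating NUL. */
int mirror_name(const char *maildir_name, char *out, size_t out_size)
{
    static const char ext[] = ".eml";
    size_t len, i;

    assert(maildir_name != NULL && out != NULL);
    len = strlen(maildir_name);
    if (len + sizeof ext > out_size) /* sizeof ext includes the NUL */
        return -1;
    for (i = 0; i < len; i++) {
        char c = maildir_name[i];
        out[i] = (c == ':' || c == ',') ? SAFE_SUBST : c;
    }
    memcpy(out + len, ext, sizeof ext); /* ".eml" plus the NUL */
    return 0;
}
```

For example, `1315844000.M1P2.host,S=1234:2,FS` becomes `1315844000.M1P2.host+S=1234+2+FS.eml`. Because the copy no longer contains `:2,`, maildir-aware tools would not see the flags on the mirrored files.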
To be clear: I'm not requesting that dovecot's file naming convention be changed to match my quirky requirements - I'm just asking whether it could be made configurable, so I could change it to match my needs and others could change it to match theirs. In the interest of REALLY being able to use the elegantly simple idea of each mail being a separate file, I'm trying to get more out of the great pile of folders and files I'm amassing on my mail archive server. The more use I can make of them with software other than dovecot (e.g. data crawling, indexing, easy recovery in a catastrophe), the more valuable this format is.
Is this possible?
On Mon, 12 Sep 2011, Dave Stubbs wrote:
Hello all,
I'm watching how my mail system works, and I see that procmail
creates a new file in the <folder>/new directory, each time an email
is received. This file is some complex combination of UIDs and
things, suffixed by the server name. So far, the filename has
alphanumerics, a couple underscores, and a dot or two only.
But once dovecot gets its hands on the file and moves it to the
<folder>/cur directory, it starts doing "terrible" things to the
file name. Now, the filename starts to have "evil" things in it,
like colons and commas.
The colon and commas are part of the Maildir spec[0], so no, it can't
be changed.
[0] http://cr.yp.to/proto/maildir.html
Eduardo M KALINOWSKI eduardo@kalinowski.com.br
On 12.9.2011, at 19.10, Dave Stubbs wrote:
I'm watching how my mail system works, and I see that procmail creates a new file in the <folder>/new directory, each time an email is received. This file is some complex combination of UIDs and things, suffixed by the server name. So far, the filename has alphanumerics, a couple underscores, and a dot or two only.
But once dovecot gets its hands on the file and moves it to the <folder>/cur directory, it starts doing "terrible" things to the file name. Now, the filename starts to have "evil" things in it, like colons and commas. Is there a way to change this?
That's how Maildir works to store message flags. If you don't like it, use something else.
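Per the spec cited in this thread (http://cr.yp.to/proto/maildir.html), a file in cur/ is named `uniq:info`, and when info starts with `2,` each following character is an independent flag: P (passed), R (replied), S (seen), T (trashed), D (draft), F (flagged). A minimal C sketch of reading those flags back out of a file name; the enum and function names are illustrative, not taken from Dovecot:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One bit per flag letter defined by the maildir spec. */
enum maildir_flag {
    MF_DRAFT   = 1 << 0, /* D */
    MF_FLAGGED = 1 << 1, /* F */
    MF_PASSED  = 1 << 2, /* P */
    MF_REPLIED = 1 << 3, /* R */
    MF_SEEN    = 1 << 4, /* S */
    MF_TRASHED = 1 << 5  /* T */
};

/* Parse a file name of the form "uniq:2,FLAGS". Returns the flag
   bitmask, or -1 if there is no ":2," info part (e.g. a file that is
   still in new/). Letters other than the six above are ignored. */
int maildir_parse_flags(const char *name)
{
    const char *info;
    int flags = 0;

    assert(name != NULL);
    info = strrchr(name, ':');
    if (info == NULL || strncmp(info, ":2,", 3) != 0)
        return -1;
    for (const char *p = info + 3; *p != '\0'; p++) {
        switch (*p) {
        case 'D': flags |= MF_DRAFT;   break;
        case 'F': flags |= MF_FLAGGED; break;
        case 'P': flags |= MF_PASSED;  break;
        case 'R': flags |= MF_REPLIED; break;
        case 'S': flags |= MF_SEEN;    break;
        case 'T': flags |= MF_TRASHED; break;
        default:                       break;
        }
    }
    return flags;
}
```

For instance, `maildir_parse_flags("1315844000.M1P2.host,S=1234:2,FS")` returns `MF_FLAGGED | MF_SEEN`, which is why renaming the files or stripping the `:2,` part would discard the flags.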
I'm asking this primarily because I use dovecot as a massive long-term email archiving system. One of the things one needs to be able to do when running a long-term archive like this is keep things as simple and accessible as possible. The reason I use maildir is that I totally buy into the "one email, one file" idea - it means I don't have to store messages in big consolidated database files that are changeable with each new version of the vendor's software release (such as exchange DBs or Outlook PST files) or that are horrible performers (such as mbox).
Dovecot v2.0's sdbox format could work for you.
One of the nice things about the maildir "each email is a separate file" idea is that you are not limited to maildir or dovecot or any other piece of software to handle, read, and process the files.
Well, sdbox isn't good for that then anymore. Cydir backend could possibly work, although it is missing some features that dbox has and was mainly intended as example code for a super simple mailbox format.
For instance, I would like to back up my maildir by using rsync to synchronize my dovecot-managed maildir to a Windows server running NFS. From there the files are synchronized via Windows DFS (to which no open source solution comes even close) to several other servers around the continent. The only problem: the evil commas and colons in the filenames are anathema to Windows. So instead I tar the maildir folders into .tgz files on the Windows server, and the .tgz files are synchronized to the other DR sites.
You could patch Dovecot's maildir code to use something other than commas and colons in maildir-storage.h:
#define MAILDIR_INFO_SEP ':'
#define MAILDIR_EXTRA_SEP ','
#define MAILDIR_FLAGS_SEP ','

#define MAILDIR_INFO_SEP_S ":"
#define MAILDIR_EXTRA_SEP_S ","
#define MAILDIR_FLAGS_SEP_S ","
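To see what such a patch would change, here is a hypothetical C helper (not Dovecot source) that assembles a cur/ file name from the same three separators. It assumes the `,S=<size>` field Dovecot appends to base names and the `2,` info prefix from the maildir spec; the name `build_maildir_name` is made up. The defaults match the values quoted above and can be overridden at compile time, e.g. `-DMAILDIR_INFO_SEP="'+'"`.

```c
#include <assert.h>
#include <stdio.h>

/* Defaults mirror the separators quoted from maildir-storage.h. */
#ifndef MAILDIR_INFO_SEP
#  define MAILDIR_INFO_SEP ':'
#endif
#ifndef MAILDIR_EXTRA_SEP
#  define MAILDIR_EXTRA_SEP ','
#endif
#ifndef MAILDIR_FLAGS_SEP
#  define MAILDIR_FLAGS_SEP ','
#endif

/* Hypothetical helper: writes
   "<base><EXTRA_SEP>S=<size><INFO_SEP>2<FLAGS_SEP><flags>" into out.
   Returns the length written, or -1 on an encoding error or if the
   result would not fit in out_size bytes. */
int build_maildir_name(char *out, size_t out_size, const char *base,
                       unsigned long size, const char *flags)
{
    int n;

    assert(out != NULL && base != NULL && flags != NULL);
    n = snprintf(out, out_size, "%s%cS=%lu%c2%c%s", base,
                 MAILDIR_EXTRA_SEP, size, MAILDIR_INFO_SEP,
                 MAILDIR_FLAGS_SEP, flags);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

With the defaults, `build_maildir_name(buf, sizeof buf, "1315844000.M1P2.host", 1234, "FS")` yields `1315844000.M1P2.host,S=1234:2,FS`. A build with different separators would produce names that other maildir clients reading the same directories would most likely no longer recognize as carrying flags.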
If I could do without the need for tar (mandated solely because of the colons and commas in the dovecot filename scheme), I could minimize the time to back up (only synchronizing changes), and suddenly a lot of other benefits would open up. One simple one: if I could configure dovecot to append the .eml extension to the end of every file (technically each file in a maildir is an eml file whether or not the extension is present - eml is just a raw mail file, exactly like what you'd find in a maildir), I would have instant access to them through Search Server Express, which can read eml files but strongly prefers to recognize them by their extension.
The message flags still have to be stored somewhere if not in the filename. dbox and cydir store them in Dovecot's index files.
To be clear: I'm not requesting that dovecot's file naming convention be changed to match my quirky requirements - I'm just asking whether it could be made configurable, so I could change it to match my needs and others could change it to match theirs. In the interest of REALLY being able to use the elegantly simple idea of each mail being a separate file, I'm trying to get more out of the great pile of folders and files I'm amassing on my mail archive server. The more use I can make of them with software other than dovecot (e.g. data crawling, indexing, easy recovery in a catastrophe), the more valuable this format is.
Is this possible?
One last possibility is to create your own mailbox format that works exactly like you want.
On 9/12/2011 12:22 PM, Timo Sirainen wrote:
On 12.9.2011, at 19.10, Dave Stubbs wrote:
I'm watching how my mail system works, and I see that procmail creates a new file in the <folder>/new directory, each time an email is received. This file is some complex combination of UIDs and things, suffixed by the server name. So far, the filename has alphanumerics, a couple underscores, and a dot or two only.
But once dovecot gets its hands on the file and moves it to the <folder>/cur directory, it starts doing "terrible" things to the file name. Now, the filename starts to have "evil" things in it, like colons and commas. Is there a way to change this?
That's how Maildir works to store message flags. If you don't like it, use something else.
Fair enough.
I'm asking this primarily because I use dovecot as a massive long-term email archiving system. One of the things one needs to be able to do when running a long-term archive like this is keep things as simple and accessible as possible. The reason I use maildir is that I totally buy into the "one email, one file" idea - it means I don't have to store messages in big consolidated database files that are changeable with each new version of the vendor's software release (such as exchange DBs or Outlook PST files) or that are horrible performers (such as mbox).
Dovecot v2.0's sdbox format could work for you.
One of the nice things about the maildir "each email is a separate file" idea is that you are not limited to maildir or dovecot or any other piece of software to handle, read, and process the files.
Well, sdbox isn't good for that then anymore. Cydir backend could possibly work, although it is missing some features that dbox has and was mainly intended as example code for a super simple mailbox format.
Well, maybe sdbox could still work. Just a quick question - what is the format of the u.* file? Is it still a raw (possibly partially) MIME-encoded file that contains the all-important From: line, just like a mail file in a maildir folder? If so, I could sync the sdbox files elsewhere and index them if I could convince dovecot to use the filename scheme u.*.eml instead of u.*
Possible? Or is the sdbox file format different?
On 12.9.2011, at 21.00, Dave Stubbs wrote:
One of the nice things about the maildir "each email is a separate file" idea is that you are not limited to maildir or dovecot or any other piece of software to handle, read, and process the files.
Well, sdbox isn't good for that then anymore. Cydir backend could possibly work, although it is missing some features that dbox has and was mainly intended as example code for a super simple mailbox format.
Well, maybe sdbox could still work. Just a quick question - what is the format of the u.* file? Is it still a raw (possibly partially) MIME-encoded file that contains the all-important From: line, just like a mail file in a maildir folder? If so, I could sync the sdbox files elsewhere and index them if I could convince dovecot to use the filename scheme u.*.eml instead of u.*
sdbox begins with a small dbox header, followed by the message text and finally a dbox metadata footer. Something like:
2 M1e C4e327f7d
^A^BN 0000000000000906
<message text here>
^A^C
R4e327f7d
V94e
G39670b147d7f324e0e1d000074ccac23
dbox-file.h describes the headers and lists the metadata characters and what they mean. Because of this extra metadata I don't really know if it would be a good idea to name them *.eml.
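For someone who still wants a plain .eml copy of each sdbox message for indexing, a minimal C sketch of locating the raw message text inside an sdbox file held in memory. It assumes the layout in the example above: a file header line, a message header line starting with ^A^B, the message text, then a footer introduced by LF ^A^C LF. The function name `sdbox_find_message` and the exact footer byte sequence are assumptions; dbox-file.h for your Dovecot version is authoritative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker bytes as they appear in the example; assumed to match the
   magic strings defined in dbox-file.h. */
static const char SDBOX_MAGIC_PRE[] = "\001\002";      /* ^A^B */
static const char SDBOX_MAGIC_POST[] = "\n\001\003\n"; /* LF ^A^C LF */

/* On success stores the offset and length of the message text within
   buf and returns 0; returns -1 if buf does not match the assumed
   layout. */
int sdbox_find_message(const char *buf, size_t len,
                       size_t *msg_off, size_t *msg_len)
{
    const size_t pre_len = sizeof SDBOX_MAGIC_PRE - 1;
    const size_t post_len = sizeof SDBOX_MAGIC_POST - 1;
    const char *end, *p, *body;

    assert(buf != NULL && msg_off != NULL && msg_len != NULL);
    end = buf + len;

    /* Line 1 is the file header ("2 M1e C..."): skip it. */
    p = memchr(buf, '\n', len);
    if (p == NULL)
        return -1;
    p++;

    /* Line 2 is the message header and must start with ^A^B. */
    if ((size_t)(end - p) < pre_len || memcmp(p, SDBOX_MAGIC_PRE, pre_len) != 0)
        return -1;
    p = memchr(p, '\n', (size_t)(end - p));
    if (p == NULL)
        return -1;
    body = p + 1;

    /* The message text runs up to the first footer marker. */
    for (p = body; (size_t)(end - p) >= post_len; p++) {
        if (memcmp(p, SDBOX_MAGIC_POST, post_len) == 0) {
            *msg_off = (size_t)(body - buf);
            *msg_len = (size_t)(p - body);
            return 0;
        }
    }
    return -1;
}
```

Scanning for the footer marker keeps the sketch short. A more careful version would use the 16-digit hex message size in the ^A^B header line instead, so that a stray LF ^A^C LF sequence inside a message body cannot end the match early.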
Yes, you could copy specific sdbox files elsewhere and run "doveadm force-resync" on them. All message flags would be lost though, since they're stored only in Dovecot's index files.
participants (3)
- Dave Stubbs
- Eduardo M KALINOWSKI
- Timo Sirainen