[Dovecot] how to best import Evolution/Thunderbird mail into dovecot?
Hi.
I'm migrating my whole mail archive (some 60 GB) from Evolution (which is really a broken piece of software) into dovecot. Now I'm wondering how best to do this...
Evolution (which is still an old 2.32.x version) itself uses mbox files, in a special hierarchical structure to allow subfolders and the like.
It also stores its own status info in the X-Evolution and X-Evolution-Source mail headers.
Unfortunately, much of the mail originally came from a Thunderbird installation, which uses its own status headers (X-Mozilla*) that were not recognised by Evolution.
I have no idea which mbox subformat was used at which point by the different programs and versions...
- Any way to check for that?
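One rough way to check is to count the lines each variant tends to leave behind. The Python sketch below (function names made up for illustration) tallies unquoted From_ lines, ">From " and ">>From " lines, and Content-Length: lines. It is only a heuristic, assuming that many ">>From " lines hint at mboxrd-style quoting and that Content-Length: headers hint at an mboxcl/mboxcl2 file; it cannot prove which program wrote a file, and a Content-Length: line inside a body is counted as well.

```python
import re
from collections import Counter
from typing import Iterable

# Line patterns that hint at the mbox variant; none of them is conclusive on its own.
UNQUOTED_FROM = re.compile(rb"^From ")       # separators (and unquoted body lines in mboxcl2)
QUOTED_ONCE = re.compile(rb"^>From ")        # written by both mboxo and mboxrd quoting
QUOTED_MULTI = re.compile(rb"^>>+From ")     # typical of mboxrd, which quotes quoted lines again
CONTENT_LENGTH = re.compile(rb"^Content-Length:", re.IGNORECASE)  # mboxcl/mboxcl2


def survey_lines(lines: Iterable[bytes]) -> Counter:
    """Tally hint lines from an iterable of raw mbox lines."""
    counts = Counter()
    for line in lines:
        if UNQUOTED_FROM.match(line):
            counts["from_lines"] += 1
        elif QUOTED_ONCE.match(line):
            counts["quoted_once"] += 1
        elif QUOTED_MULTI.match(line):
            counts["quoted_multi"] += 1
        if CONTENT_LENGTH.match(line):
            counts["content_length"] += 1
    return counts


def survey_mbox(path: str) -> Counter:
    """Open an mbox file in binary mode and tally its hint lines."""
    with open(path, "rb") as fh:
        return survey_lines(fh)
```

Running survey_mbox() over each folder file and comparing the counters at least shows which files deserve a closer look.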
To make things worse... Thunder(burden) seems to have used a modified From_ line syntax... "^From -<address> <date>$"
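If the Thunderbird variant really starts with "From -", a separator recognizer has to test for it before the classic form, because "-" would otherwise pass as an ordinary envelope address. A minimal sketch, assuming the classic line carries an asctime-style date (e.g. "Tue Oct 16 12:00:00 2012") and treating anything that begins with "From -" as the Thunderbird variant; real files may need a looser date pattern, and the check only makes sense on lines that sit at a message boundary.

```python
import re
from typing import Optional

# Classic separator: "From <envelope address> <asctime date>", e.g. "From a@b Tue Oct 16 12:00:00 2012".
CLASSIC_FROM = re.compile(r"^From (\S+) (\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})$")
# Thunderbird-style separator as described above: "From -", optionally followed by address/date.
THUNDERBIRD_FROM = re.compile(r"^From -\s?(.*)$")


def classify_from_line(line: str) -> Optional[str]:
    """Return "thunderbird", "classic" or None for one line (without its newline)."""
    if THUNDERBIRD_FROM.match(line):
        return "thunderbird"  # checked first: "-" would otherwise be accepted as an address
    if CLASSIC_FROM.match(line):
        return "classic"
    return None
```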
Ideally I'd like to migrate all mail into dovecot (again as mbox, for storage reasons), retaining the different status flags (read, forwarded, etc.) and getting rid of the proprietary headers (of course only where they actually occur as headers).
First thing I tried was to simply copy mail within Evolution (i.e. dragging&dropping it from the local folders to the IMAP folders from dovecot).
- that preserves the status from Evolution, but doesn't restore that from Thunderbird
- it mangles the information in all From_ lines... "<address> <date>" becomes "<address of my default evolution account> <now>"
- neither does it handle the special Thunderbird From_ lines
- neither does it remove the Thunderbird or the X-Evolution-Source headers
- if Evolution already has corrupted index files (and this is extremely likely... it happens even immediately after recreating all of them), I may lose mail
So my idea was that I need a program that:
- can parse all the different mbox formats (those that use the quoted ">From" style and those that use Content-Length)
- can differentiate message headers from body (so that I can drop the proprietary headers and replace them by what dovecot uses as headers)
- must of course understand multiline message headers
Any idea for a tool like that? The meanings of the X-Evolution and X-Mozilla* headers are easy to find on the web... so I can convert them. So I basically "just" need a tool that parses all kinds of mbox formats, allows me to drop/add headers, and spits out the rest unmodified.
dovecot uses some special headers like X-UID and X-IMAPD... will it create these on its own the first time it processes the new mbox file? I mean, these headers won't be there right after creation.
Should I drop, during conversion, any other mail headers that dovecot uses as its own?
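Because names like X-Mozilla-Status can also occur inside bodies (forwarded or quoted mails, for instance), only the header block may be touched. A minimal Python sketch operating on one message without its From_ line: it drops each listed header field together with its continuation lines and copies the blank separator line and the body byte for byte. The drop list is an assumption: the proprietary X-Mozilla*/X-Evolution* fields, Dovecot's own X-IMAP, X-IMAPbase and X-UID, plus Status, X-Status, X-Keywords and Content-Length on the assumption that those get recomputed afterwards.

```python
# Header fields to drop (an assumed list; adjust to taste).
DROP_PREFIXES = (b"x-mozilla-", b"x-evolution")
DROP_EXACT = {
    b"x-imap", b"x-imapbase", b"x-uid",                        # Dovecot's own mbox metadata
    b"status", b"x-status", b"x-keywords", b"content-length",  # assumed to be recomputed later
}


def strip_headers(raw: bytes) -> bytes:
    """Drop the listed header fields from one message; the body is copied unchanged."""
    lines = raw.splitlines(keepends=True)
    out = []
    dropping = False
    for i, line in enumerate(lines):
        if line in (b"\n", b"\r\n"):
            # The first empty line ends the header block: copy it and the whole body verbatim.
            out.extend(lines[i:])
            break
        if line[:1] in (b" ", b"\t"):
            # Continuation line of a folded header: it shares the fate of its field.
            if not dropping:
                out.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        dropping = name in DROP_EXACT or name.startswith(DROP_PREFIXES)
        if not dropping:
            out.append(line)
    return b"".join(out)
```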
Thanks, Chris.
Hi again...
Things are even much much worse... (oh how I hate Evolution right now).
I found a bug in Evolution where it apparently corrupts all mail by not quoting From_ lines in headers/bodies correctly... It quotes lines matching "^From (.*)$" as ">From \1", but it does not quote already quoted From_ lines, i.e. "^>+From .*$", at all.
AFAICS that means it's not possible to repair that corruption (you'll see my "happiness" about this when reading the offensive bug report). Details here: https://bugzilla.gnome.org/show_bug.cgi?id=686258
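Why the damage cannot be undone is visible in a few lines. The behaviour described above is what is commonly called mboxo quoting: only lines starting with "From " get a ">" prepended, so an original "From ..." line and an original ">From ..." line end up identical. mboxrd quoting instead adds a ">" to every line matching ^>*From , which keeps the mapping reversible. A small Python demonstration (the function names are ad hoc):

```python
import re


def mboxo_quote(body: str) -> str:
    """mboxo: prepend ">" only to lines that start exactly with "From "."""
    return re.sub(r"(?m)^From ", ">From ", body)


def mboxrd_quote(body: str) -> str:
    """mboxrd: prepend ">" to every line matching ^>*From ."""
    return re.sub(r"(?m)^(>*From )", r">\1", body)


def mboxrd_unquote(body: str) -> str:
    """Reverse mboxrd quoting by removing one ">" from each ^>+From  line."""
    return re.sub(r"(?m)^>(>*From )", r"\1", body)


original_a = "From here on, things got worse\n"
original_b = ">From here on, things got worse\n"

print(mboxo_quote(original_a) == mboxo_quote(original_b))      # True: information is lost
print(mboxrd_quote(original_a) == mboxrd_quote(original_b))    # False: still distinguishable
print(mboxrd_unquote(mboxrd_quote(original_b)) == original_b)  # True: round-trips
```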
I'm not sure how this affects any of my migration/conversion plans... any ideas?
Thanks, a desperate Chris.
On Wed, Oct 17, 2012 at 01:21:14AM +0200, Christoph Anton Mitterer wrote:
Hi.
[..]
First thing I tried was to simply copy mail within Evolution (i.e. dragging&dropping it from the local folders to the IMAP folders from dovecot).
This seems to be the smartest idea.
- that preserves the status from Evolution, but doesn't restore that from Thunderbird
Why not use TB to copy the emails from your 'TB mboxes' to Dovecot? This way I moved around 25 GiB of emails from >> 50 mbox files, created with TB 3.6 way down to some 0.x beta, to Dovecot -- without any issues I could recall.
[..]
- neither does it remove the Thunderbird or the X-Evolution-Source headers
If they bug you remove them with sed or awk or perl or python or ...
Dennis
[..]
Your best bet for a clean migration is to use an IMAP migration tool (assuming both of your servers support IMAP). It avoids all of the issues surrounding the underlying databases used to store the mailboxes and messages, since everything is done through IMAP commands.
There are lots of different IMAP tools out there, some free, some not. Using an IMAP migration tool is usually straightforward and simple.
Here is a list of some of them. Full disclosure, imap_tools is mine.
imapsync: http://imapsync.lamiral.info
imap_tools: http://www.athensfbc.com/imap_tools
offlineimap: https://github.com/nicolas33/offlineimap
mbsync: http://isync.sourceforge.net/
mailsync: http://mailsync.sourceforge.net/
mailutil: http://www.washington.edu/imap/ part of the UW IMAP toolkit.
imaprepl: http://www.bl0rg.net/software/ http://freecode.com/projects/imap-repl/
imapcopy: http://home.arcor.de/armin.diehl/imapcopy/imapcopy.html
migrationtool: http://sourceforge.net/projects/migrationtool/
imapmigrate: http://sourceforge.net/projects/cyrus-utils/
larch: https://github.com/rgrove/larch (derived from wonko_imapsync)
wonko_imapsync: http://wonko.com/article/554
pop2imap: http://www.linux-france.org/prj/pop2imap/
exchange-away: http://exchange-away.sourceforge.net/
To copy all of a user's mailboxes from one IMAP server to another using my imapcopy tool is as simple as executing the following command:
imapcopy.pl -S source/username/password -D destination/user/password
Regards, Rick
Hi Rick and Robert.
Thanks for the tools... I'll have a look over them. :)
On Wed, 2012-10-17 at 15:53 +0000, Rick Sanders wrote:
Your best bet for a clean migration is to use an IMAP migration tool (assuming both of your servers support IMAP). It avoids all of the issues surrounding the underlying databases used to store the mailboxes and messages since everything is done through IMAP commands.
Well, the problem is that a) the mboxes are already mixed up (with respect to different formats), which was basically my fault, and b) Evolution is severely broken, amongst other reasons because of https://bugzilla.gnome.org/show_bug.cgi?id=686258.
So I cannot really trust that automatic migration will work.
[..]
For most of them I unfortunately didn't find information on whether they support the different subformats of mbox... what about your MboxtoIMAP.pl?
Right now I tend to create my own converter based on mb2md... just that I don't write out maildir but mbox again.
Timo, when you're reading this: I'm not sure which headers I must/should strip for dovecot. From http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Metadata I'd guess that I have to drop all X-IMAPbase, X-IMAP and X-UID headers. (Will dovecot recreate them when it indexes the mbox file the first time?)
And I have to manually create/calculate Status, X-Status, X-Keywords (based on what either Evolution or Thunderbird set) and also Content-Length... the From_ lines inside the mails must then _not_ be quoted.
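The flag conversion itself is a small table lookup. The sketch below assumes the usual bit values: for X-Mozilla-Status, 0x0001 read, 0x0002 replied, 0x0004 marked, 0x0008 expunged; for X-Evolution, assumed to be written as "<uid>-<flags>" in hex, 0x01 answered, 0x02 deleted, 0x04 draft, 0x08 flagged, 0x10 seen. Please verify these against the versions that actually wrote the mail. The output letters follow Dovecot's mbox metadata as described on the wiki page above (Status: R = \Seen, O = not \Recent; X-Status: A = \Answered, F = \Flagged, T = \Draft, D = \Deleted). Forwarded is not an IMAP system flag and is left out here; Content-Length is taken as the byte length of the body as it will be written.

```python
from typing import Dict, List, Set

# Assumed bit values; verify against the Thunderbird/Evolution versions that wrote the mail.
MOZILLA_BITS = {0x0001: "\\Seen", 0x0002: "\\Answered", 0x0004: "\\Flagged", 0x0008: "\\Deleted"}
EVOLUTION_BITS = {0x01: "\\Answered", 0x02: "\\Deleted", 0x04: "\\Draft", 0x08: "\\Flagged", 0x10: "\\Seen"}
# Dovecot mbox X-Status letters, in the order they are emitted here.
X_STATUS_LETTERS = (("A", "\\Answered"), ("F", "\\Flagged"), ("T", "\\Draft"), ("D", "\\Deleted"))


def flags_from_headers(headers: Dict[str, str]) -> Set[str]:
    """Collect IMAP system flags from X-Mozilla-Status and/or X-Evolution values."""
    flags = set()
    moz = headers.get("X-Mozilla-Status")
    if moz:
        value = int(moz.strip(), 16)
        flags |= {name for bit, name in MOZILLA_BITS.items() if value & bit}
    evo = headers.get("X-Evolution")
    if evo:
        # Assumed format "<uid hex>-<flags hex>", e.g. "0000001a-0018".
        value = int(evo.strip().rsplit("-", 1)[-1], 16)
        flags |= {name for bit, name in EVOLUTION_BITS.items() if value & bit}
    return flags


def dovecot_mbox_headers(flags: Set[str], body: bytes) -> List[str]:
    """Build Status, X-Status and Content-Length lines for one message."""
    status = ("R" if "\\Seen" in flags else "") + "O"  # "O": no longer \Recent
    x_status = "".join(letter for letter, flag in X_STATUS_LETTERS if flag in flags)
    lines = [f"Status: {status}"]
    if x_status:
        lines.append(f"X-Status: {x_status}")
    lines.append(f"Content-Length: {len(body)}")
    return lines
```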
Thanks, Chris.
Hi again :)
In the meantime I made some checks[0] on how much storage one loses by using maildir (compared to mbox)... and concluded that it's quite a lot, but I can live with it.
Of course this doesn't solve my problem that I possibly have a mix of different mbox subformats, a mix of different mail status formats (Thunderbird and Evolution)... and some 17k mails that suffered from From_ line corruption (due to Evolution, getmail and postfix either incorrectly quoting them or even intentionally using mboxo)... so I'll still need some scripting in the end.
Which I'll base upon mb2md[1], or rather its Dovecot-ized version[2]. I diffed the two, and it seems the only differences are that the latter additionally handles the following:
- keywords (via X-IMAP, X-IMAPbase and X-Keywords)
- UIDs, UIDVALIDITYs and UIDLASTs (via the X-IMAP, X-IMAPbase and X-UID mail headers of the mboxes)
- ,S= and ,W= tags
(I guess that's it, right?)
Now I have some questions: to 1) I never used keywords on mails myself so far... so if any X-Keywords headers exist, these were sent from remote. So I guess I _really want_ to ignore them (and not let remote people set my local keywords), right?
to 2) I haven't had time yet to read the IMAP4 RFC (though I'll need to do so soon)... but AFAIU the UIDs, UIDVALIDITYs and UIDLASTs are used by the server/clients to identify which message they talk about, to avoid unnecessary reloading, to ensure statuses are set on the right message, etc.
All mails that I migrate were only used locally by one client. So I guess I can fully ignore any UID/UIDVALIDITY/UIDLAST preservation, right?
So in principle I can use plain mb2md (without the dovecot mods), simply convert all my mboxes to maildir, put them into the dovecot mail location (with the mails in the new/ dirs) and start dovecot, right?
Will dovecot itself then assign fresh consecutive UIDs to all maildir files? Or will I get into trouble?
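For the maildir route, note that per the maildir convention flags live in the ":2," info suffix of files in cur/; files placed in new/ carry no flags, so they show up as unread. If the read/replied status should survive, the converted files belong in cur/ with a suffix. A minimal Python sketch of building such a name; the "<seconds>.<sequence>.<host>" unique part is a simplified, hypothetical scheme (Dovecot's own names look different), and only the five IMAP system flags are mapped.

```python
from typing import Set

# Maildir info letters for the IMAP system flags (letters must appear in ASCII order).
FLAG_LETTERS = {"\\Draft": "D", "\\Flagged": "F", "\\Answered": "R", "\\Seen": "S", "\\Deleted": "T"}


def maildir_filename(flags: Set[str], seq: int, when: int, host: str) -> str:
    """Build a cur/ filename "<when>.<seq>.<host>:2,<letters>" (simplified unique part)."""
    # The maildir convention escapes "/" and ":" in the host part.
    safe_host = host.replace("/", "\\057").replace(":", "\\072")
    letters = "".join(sorted(FLAG_LETTERS[f] for f in flags if f in FLAG_LETTERS))
    return f"{when}.{seq}.{safe_host}:2,{letters}"
```

For example, a read and answered message numbered 7 becomes "1350000000.7.box:2,RS" when converted at time 1350000000 on host "box".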
to 3) If dovecot can make use of these... I'm happy with having them set, but analogous to (2): if I use plain mb2md (without the dovecot mods), simply convert all my mboxes to maildir, put them into the dovecot mail location (with the mails in the new/ dirs) and start dovecot...
Can I make dovecot calculate these fields by itself when it loads?
Thanks, Chris.
[0] http://dovecot.org/pipermail/dovecot/2012-October/069130.html
[1] http://batleth.sapienti-sat.org/projects/mb2md/
[2] http://dovecot.org/tools/mb2md.pl
On 30.10.2012, at 2.42, Christoph Anton Mitterer wrote:
Which I'll base upon mb2md[1], or rather its Dovecot-ized version[2]. I diffed the two, and it seems the only differences are that the latter additionally handles the following:
- keywords (via X-IMAP, X-IMAPbase and X-Keywords)
- UIDs, UIDVALIDITYs and UIDLASTs (via the X-IMAP, X-IMAPbase and X-UID mail headers of the mboxes)
- ,S= and ,W= tags
(I guess that's it, right?)
Now I have some questions: to 1) I never used keywords on mails myself so far... so if any X-Keywords headers exist, these were sent from remote. So I guess I _really want_ to ignore them (and not let remote people set my local keywords), right?
Yes.
to 2) I haven't had time yet to read the IMAP4 RFC (though I'll need to do so soon)... but AFAIU the UIDs, UIDVALIDITYs and UIDLASTs are used by the server/clients to identify which message they talk about, to avoid unnecessary reloading, to ensure statuses are set on the right message, etc.
All mails that I migrate were only used locally by one client. So I guess I can fully ignore any UID/UIDVALIDITY/UIDLAST preservation, right?
Yeah, they're not that important if you don't care about clients redownloading cached messages.
So in principle I can use plain mb2md (without the dovecot mods), simply convert all my mboxes to maildir, put them into the dovecot mail location (with the mails in the new/ dirs) and start dovecot, right?
Will dovecot itself then assign fresh consecutive UIDs to all maildir files? Or will I get into trouble?
Dovecot will generate new UIDs.
to 3) If dovecot can make use of these... I'm happy with having them set, but analogous to (2): if I use plain mb2md (without the dovecot mods), simply convert all my mboxes to maildir, put them into the dovecot mail location (with the mails in the new/ dirs) and start dovecot...
Can I make dovecot calculate these fields by itself when it loads?
Dovecot doesn't add them to the filenames, but adds them to dovecot-uidlist and/or dovecot.index.cache. If you're using Maildir++ quota then this isn't good enough, but when using Dovecot LDA there's no reason to use Maildir++ quota anyway, so it doesn't matter.
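Per the answer above, this only matters with Maildir++ quota. If one still wants the tags in the filenames, they are easy to compute: S= is the file size in bytes, and W= the "virtual" size with every line ending counted as CRLF. A Python sketch, assuming the stored files use LF or CRLF line endings:

```python
import re


def maildir_size_tags(raw: bytes) -> str:
    """Return the ",S=<size>,W=<virtual size>" part of a maildir filename."""
    physical = len(raw)
    # The virtual size counts each line ending as CRLF, so add one byte per bare LF.
    bare_lf = len(re.findall(rb"(?<!\r)\n", raw))
    return f",S={physical},W={physical + bare_lf}"
```

The tags go between the unique part and the info suffix, e.g. "1350000000.7.box" + maildir_size_tags(data) + ":2,S".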
On Wed, 2012-10-17 at 16:51 +0200, Dennis Guhl wrote:
First thing I tried was to simply copy mail within Evolution (i.e. dragging&dropping it from the local folders to the IMAP folders from dovecot).
This seems to be the smartest idea.
Well, as I've mentioned... one loses the info in the From_ lines (that is, the RCPT TO address and the date of arrival) because Evolution does not migrate them correctly (actually I'm not sure whether IMAP would even allow that).
- that preserves the status from Evolution, but doesn't restore that from Thunderbird
Why not use TB to copy the emails from your 'TB mboxes' to Dovecot? This way I moved around 25 GiB of emails from >> 50 mbox files, created with TB 3.6 way down to some 0.x beta, to Dovecot -- without any issues I could recall.
Sorry... too late for that... because back in the "old" days when I moved away from TB I didn't notice that they used other mail headers for their statuses... so now everything is already mixed together.
If they bug you remove them with sed or awk or perl or python or ...
Yeah... but sed alone is not enough... because such lines may also appear in the body... and I mustn't remove them... So in principle I'm looking for a smart mbox parser which gives me headers and body separately, so that I can modify either.
Cheers, Chris.
On Wed, Oct 17, 2012 at 07:57:38PM +0200, Christoph Anton Mitterer wrote:
On Wed, 2012-10-17 at 16:51 +0200, Dennis Guhl wrote:
First thing I tried was to simply copy mail within Evolution (i.e. dragging&dropping it from the local folders to the IMAP folders from dovecot).
This seems to be the smartest idea.
Well, as I've mentioned... one loses the info in the From_ lines (that is, the RCPT TO address and the date of arrival) because Evolution does not migrate them correctly (actually I'm not sure whether IMAP would even allow that).
Perhaps you mean the "^From " mbox delimiter line. You do not need mbox delimiters in maildir files. Did you mention whether or not you're using maildir?
http://rob0.nodns4.us/ -- system administration and consulting Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
On Wed, 2012-10-17 at 13:12 -0500, /dev/rob0 wrote:
Well, as I've mentioned... one loses the info in the From_ lines (that is, the RCPT TO address and the date of arrival) because Evolution does not migrate them correctly (actually I'm not sure whether IMAP would even allow that).
Perhaps you mean the "^From " mbox delimiter line.
Yes, I meant them (the _ was meant to denote the space).
You do not need mbox delimiters in maildir files.
I know...
Did you mention whether or not you're using maildir?
The reason is mainly that I have gazillions of mails in a ~60 GB archive... even with an fs optimised for small files I'd lose far more space per mail than I want to afford.
Also, AFAIK full text search becomes much slower with maildir (as you need to open/close endless files). In the long term I want to have a look into things like dbmail/archiveopteryx for the giant local archive... and keep dovecot "only" as the internet mail server.
Ideally dovecot would have such an SQL backend...or incorporate that part from Archiveopteryx.
Cheers, Chris.
On Wed, Oct 17, 2012 at 08:21:47PM +0200, Christoph Anton Mitterer wrote:
On Wed, 2012-10-17 at 13:12 -0500, /dev/rob0 wrote:
Did you mention whether or not you're using maildir?
The reason is mainly that I have gazillions of mails in a ~60 GB archive... even with an fs optimised for small files I'd lose far more space per mail than I want to afford.
Fine, maildir is not the perfect solution for everyone. But I'm confused about why copying from Evolution/Thunderbird local folders to IMAP folders does not work. That should be the best approach.
If it does not work, you're going to have some perl/python/ruby scripting to do.
On 17.10.2012 20:21, Christoph Anton Mitterer wrote:
[..]
This may help too:
http://www.stchman.com/export_evolution.html
http://www.ubuntugeek.com/how-to-export-your-mails-from-evolution-to-thunder...
http://ubuntuforums.org/showthread.php?t=1760469
http://ubuntuforums.org/showthread.php?t=1870445
http://jaisejames.wordpress.com/2012/03/15/to-activate-maildir-in-thunderbir...
http://realtechtalk.com/ThunderbirdMBOX_to_IMAPMaildir_migration_done_easy_w...
-- Best Regards, Robert Schetterer
sys4 AG, Franziskanerstraße 15, 81669 München
Phone +49 89 3090 4664, Fax +49 89 3090 4666
Registered office: München; register court: Amtsgericht München, HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Joerg Heidrich
On Wed, Oct 17, 2012 at 07:57:38PM +0200, Christoph Anton Mitterer wrote:
On Wed, 2012-10-17 at 16:51 +0200, Dennis Guhl wrote:
[move through Evolution to IMAP]
Well, as I've mentioned... one loses the info in the From_ lines (that is, the RCPT TO address and the date of arrival) because Evolution does not
The date and time of arrival can be inferred from the last Received: header. The RCPT TO needs to be converted to an X-Original-To: header.
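Both suggestions are easy to script. A Python sketch using only the standard library's email package: arrival_date() takes the timestamp after the final ';' of the most recently added Received: field (the topmost one, added by the final hop) and parses it; with_original_to() is a hypothetical helper that prepends an X-Original-To: field carrying whatever address was recovered from the old From_ line. Both assume LF line endings; an unparsable date yields None.

```python
from datetime import datetime
from email import message_from_bytes
from email.utils import parsedate_to_datetime
from typing import Optional


def arrival_date(raw: bytes) -> Optional[datetime]:
    """Return the timestamp of the topmost Received: field, or None."""
    received = message_from_bytes(raw).get_all("Received") or []
    if not received:
        return None
    # The topmost Received: field was added by the final hop; its date follows the last ';'.
    stamp = str(received[0]).rsplit(";", 1)[-1].strip()
    try:
        return parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        return None


def with_original_to(raw: bytes, address: str) -> bytes:
    """Prepend an X-Original-To: header to one message (hypothetical helper)."""
    return b"X-Original-To: " + address.encode("ascii") + b"\n" + raw
```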
[..]
If they bug you remove them with sed or awk or perl or python or ...
Yeah... but sed alone is not enough... because such lines may also appear in the body... and I mustn't remove them... So in principle I'm looking for a smart mbox parser which gives me headers and body separately, so that I can modify either.
I think, like Rob suggested, you are in need of some serious scripting.
Dennis
On Thu, 2012-10-18 at 14:34 +0200, Dennis Guhl wrote:
[move through Evolution to IMAP]
Seriously... I can only advise anyone never to trust this piece of crap ;) Don't know which daemons led me to using it...
I think, like Rob suggested, you are in need of some serious scripting.
Yeah... guess that's what it will end up with.
Cheers, Chris.
participants (6)
- /dev/rob0
- Christoph Anton Mitterer
- Dennis Guhl
- Rick Sanders
- Robert Schetterer
- Timo Sirainen