[Dovecot] Importing emails from PST Archives
Andre Rodier
andre.rodier at gmail.com
Wed Mar 27 08:56:01 EET 2013
On Wednesday, 27.03.13 at 03:38, Ben Morrow wrote:
> At 10PM +0000 on 26/03/13 you (Andre Rodier) wrote:
> >
> > The perl script to transform mbox files into maildirs in the dovecot
> > distribution is old, and crashed many times in the middle of the
> > process.
> > I had a look at the script, and gave up trying to fix it.
> >
> > I found a Python script that was supposed to crawl this folder
> > structure and replicate it using IMAP commands, but it crashed as
> > well, and restarting the process would import the same messages
> > twice. The script is here:
> > http://costela.net/2011/06/importing-an-outlook-pst-into-imap/
> >
> > I found another python script that was working better, and seemed to
> > be well written, but with one mbox to one IMAP folder only.
> > It can be found here:
> > http://imap-upload.svn.sourceforge.net/viewvc/imap-upload/trunk/
> > I have modified it and added some minor fixes:
> > - Recursively traverse a folder structure, and replicate it using IMAP
> > commands on the server.
> > - Properly manage folder names with special characters. (dovecot can
> > manage these characters using the listescape plugin).
> > - Avoid taking all the resources of the server (a quick and dirty
> > hack that may change).
>
> If I'm reading this right, it's reading a tree of mboxes? You should be
> able to convert this to any format Dovecot understands (maildir, dbox)
> with dsync, without having to go through IMAP. You would need to
> configure dsync to read the mboxes just as you would have configured
> Dovecot; for a sync from temporary mboxes you probably want to use
> INDEX=MEMORY to avoid having to mess about creating index files.
I have not tried dsync, but the script had the advantage of being quick to modify
to my needs. One very bad thing about PST archives is that the sender information is
rewritten using the common name from Active Directory. I will modify the script to
revert to the original email address when needed.
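
A minimal sketch of the recursive traversal the script performs, mapping a tree of mbox files to hierarchical IMAP folder names. This is not the code from the script itself: the function name, the ".mbox" suffix and the "/" delimiter are assumptions made for illustration.

```python
import os


def mbox_to_folder_names(root, delimiter="/", suffix=".mbox"):
    """Map each mbox file under root to a hierarchical IMAP folder name.

    Returns a sorted list of (path, folder_name) pairs. The suffix and
    delimiter are assumptions; adjust them to the export layout and to
    the hierarchy separator the server reports.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue  # ignore anything that is not an mbox file
            path = os.path.join(dirpath, filename)
            # Path relative to the root, without the file suffix.
            relative = os.path.relpath(path, root)[: -len(suffix)]
            # Turn directory separators into the IMAP hierarchy delimiter.
            folder = delimiter.join(relative.split(os.sep))
            pairs.append((path, folder))
    return sorted(pairs)
```

Each folder name could then be passed to imaplib's create() and append() calls. Names with special characters would still need whatever escaping or encoding the server expects, which this sketch does not attempt.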
Also, since I am dealing with 10-year-old emails, my users want to delete or deduplicate
some of them that have no legal value. A good example is distribution lists in AD.
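
One possible way to deduplicate before importing, sketched with Python's standard mailbox module. It assumes that two messages with the same Message-ID header are duplicates, which may not hold for every archive; the function name is hypothetical and this is not what the script currently does.

```python
import mailbox


def deduplicate_mbox(source_path, target_path):
    """Copy messages from one mbox to another, skipping repeated Message-IDs.

    Messages without a Message-ID are always kept, since they cannot be
    compared safely. Returns a (kept, skipped) tuple of counts.
    """
    seen = set()
    kept = skipped = 0
    source = mailbox.mbox(source_path, create=False)
    target = mailbox.mbox(target_path)
    try:
        for message in source:
            raw_id = message.get("Message-ID")
            message_id = str(raw_id).strip() if raw_id is not None else ""
            if message_id and message_id in seen:
                skipped += 1
                continue
            if message_id:
                seen.add(message_id)
            target.add(message)
            kept += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return kept, skipped
```

Writing to a separate target file leaves the original mbox untouched, so the result can be checked before anything is uploaded.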
>
> > I am not an expert in Python, and the script has been quickly written
> > to fit my needs. However I think it can be modified easily to any
> > configuration.
> > In the future, maybe this script can use the libpst python bindings to
> > import the emails directly.
> > The latest version of the modified script is here:
> > https://github.com/arodier/EmailTools/tree/master/Migration
> > Do not hesitate to help me make the script as generic as possible,
> > particularly if you are a Python expert.
>
> Well, on my quick look, I don't much like this line:
>
> ad = float(open("/proc/loadavg").readline().split(" ")[:3][0])
>
> I would be surprised if Python didn't provide a portable way to get at
> that information... let's see (I don't really speak Python)... oh yes,
> os.getloadavg().
Thanks for this, I will modify the script.
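
A minimal sketch of how the throttling could look with os.getloadavg() in place of reading /proc/loadavg. The function name, the threshold and the polling interval are assumptions for illustration, not values taken from the script; os.getloadavg() is only available on Unix-like systems.

```python
import os
import time


def wait_for_low_load(max_load=2.0, poll_seconds=5.0,
                      get_load=os.getloadavg, sleep=time.sleep):
    """Block until the 1-minute load average is at or below max_load.

    get_load and sleep are injectable so the loop can be exercised
    without a loaded machine. Returns how many times the function slept.
    """
    waits = 0
    # os.getloadavg() returns the 1, 5 and 15 minute averages as a tuple.
    while get_load()[0] > max_load:
        sleep(poll_seconds)
        waits += 1
    return waits
```

Calling wait_for_low_load() before each upload would pause the import whenever the server is busy.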
>
> > I am posting this on this list because I think you may be interested
> > if you are in the same situation as me. The license is not specified,
> > but I will probably use GPLv3.
>
> Without wishing to get into a licence war, there are a lot of people who
> object to the GPLv3, for good reasons. Do you have a good reason for
> changing it from the MIT licence used by the original?
>
I am not an expert in software licensing, but I am happy with this license so far.
> Ben
Thanks for your suggestions