[Dovecot] Importing emails from PST Archives
Hello everyone,
I am currently migrating a small company from Exchange 2003 to Dovecot.
So far, the most painful part has been the PST archives (apart from the die-hard Outlook users).
I have been able to create the directory structure using the latest version of readpst, with the -r flag. But once this was done, I could not find a single tool that worked correctly. Maybe my configuration is complex.
The Perl script in the Dovecot distribution that transforms mbox files into maildirs is old, and crashed many times in the middle of the process. I had a look at the script, and gave up trying to fix it.
I found a Python script that was supposed to crawl this folder structure and replicate it using IMAP commands, but it crashed as well, and restarting the process would import the same messages twice. The script is here: http://costela.net/2011/06/importing-an-outlook-pst-into-imap/
I found another Python script that worked better and seemed to be well written, but it only handles one mbox to one IMAP folder. It can be found here: http://imap-upload.svn.sourceforge.net/viewvc/imap-upload/trunk/ I have modified it and added some minor fixes:
- Recursively traverse a folder structure, and replicate it using IMAP commands on the server.
- Properly manage folder names with special characters. (Dovecot can manage these characters using the listescape plugin).
- Avoid taking all the resources of the server (a quick and dirty hack that can change).
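The recursive traversal in the first item could be sketched roughly as follows. This is not the code from the script; it assumes that readpst -r leaves one file named "mbox" in each folder directory, and iter_mbox_folders, mbox_name and delimiter are hypothetical names chosen for illustration.

```python
import os


def iter_mbox_folders(root, mbox_name="mbox", delimiter="/"):
    """Yield (mbox_path, imap_folder) for every mbox file found under root.

    Assumption: each exported folder is a directory that contains a single
    file called mbox_name holding that folder's messages.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a stable, repeatable order
        if mbox_name not in filenames:
            continue  # directory holds only sub-folders, no messages
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue  # an mbox directly in root has no folder name; skipped
        # Join the path components with the server's hierarchy delimiter.
        imap_folder = delimiter.join(rel.split(os.sep))
        yield os.path.join(dirpath, mbox_name), imap_folder
```

Each pair could then be handed to imaplib (IMAP4.create for the folder, IMAP4.append for each message). The sketch does not escape the delimiter inside folder names and does not encode non-ASCII names, so both would still need handling before talking to the server.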
I am not an expert in Python, and the script was written quickly to fit my needs. However, I think it can easily be adapted to any configuration. In the future, maybe this script can use the libpst Python bindings to import the emails directly. The latest version of the modified script is here: https://github.com/arodier/EmailTools/tree/master/Migration. Do not hesitate to help me make the script as generic as possible, particularly if you are a Python expert.
I am posting this on this list because I think you may be interested if you are in the same situation as me. The license is not specified, but I will probably use GPLv3.
Regards, André
At 10PM +0000 on 26/03/13 you (Andre Rodier) wrote:
> The Perl script in the Dovecot distribution that transforms mbox files into maildirs is old, and crashed many times in the middle of the process. I had a look at the script, and gave up trying to fix it.
> I found a Python script that was supposed to crawl this folder structure and replicate it using IMAP commands, but it crashed as well, and restarting the process would import the same messages twice. The script is here: http://costela.net/2011/06/importing-an-outlook-pst-into-imap/
> I found another Python script that worked better and seemed to be well written, but it only handles one mbox to one IMAP folder. It can be found here: http://imap-upload.svn.sourceforge.net/viewvc/imap-upload/trunk/ I have modified it and added some minor fixes:
> - Recursively traverse a folder structure, and replicate it using IMAP commands on the server.
> - Properly manage folder names with special characters. (Dovecot can manage these characters using the listescape plugin).
> - Avoid taking all the resources of the server (a quick and dirty hack that can change).
If I'm reading this right, it's reading a tree of mboxes? You should be able to convert this to any format Dovecot understands (maildir, dbox) with dsync, without having to go through IMAP. You would need to configure dsync to read the mboxes just as you would have configured Dovecot; for a sync from temporary mboxes you probably want to use INDEX=MEMORY to avoid having to mess about creating index files.
> I am not an expert in Python, and the script was written quickly to fit my needs. However, I think it can easily be adapted to any configuration. In the future, maybe this script can use the libpst Python bindings to import the emails directly. The latest version of the modified script is here: https://github.com/arodier/EmailTools/tree/master/Migration. Do not hesitate to help me make the script as generic as possible, particularly if you are a Python expert.
Well, on my quick look, I don't much like this line:
ad = float(open("/proc/loadavg").readline().split(" ")[:3][0])
I would be surprised if Python didn't provide a portable way to get at that information... let's see (I don't really speak Python)... oh yes, os.getloadavg().
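A minimal sketch of how the throttling could use os.getloadavg() rather than parsing /proc/loadavg. It is not taken from the script: wait_for_low_load, max_load and poll_seconds are hypothetical names and values, and the getloadavg and sleep parameters exist only so the loop can be exercised without a loaded machine. os.getloadavg() is only available on Unix-like systems.

```python
import os
import time


def wait_for_low_load(max_load=2.0, poll_seconds=5.0, getloadavg=None, sleep=time.sleep):
    """Block until the 1-minute load average is at or below max_load.

    Returns the load value that allowed the caller to continue.
    """
    if getloadavg is None:
        getloadavg = os.getloadavg  # Unix only; raises OSError if unobtainable
    while True:
        # getloadavg() returns the 1, 5 and 15 minute averages; the first one
        # is the same number as the first field of /proc/loadavg.
        one_minute = getloadavg()[0]
        if one_minute <= max_load:
            return one_minute
        sleep(poll_seconds)
```

Calling wait_for_low_load() before each upload would pause the import while the server is busy, which seems to be what the original line was for.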
> I am posting this on this list because I think you may be interested if you are in the same situation as me. The license is not specified, but I will probably use GPLv3.
Without wishing to get into a licence war, there are a lot of people who object to the GPLv3, for good reasons. Do you have a good reason for changing it from the MIT licence used by the original?
Ben
On Wednesday, 27.03.13 at 03:38, Ben Morrow wrote:
> At 10PM +0000 on 26/03/13 you (Andre Rodier) wrote:
>> The Perl script in the Dovecot distribution that transforms mbox files into maildirs is old, and crashed many times in the middle of the process. I had a look at the script, and gave up trying to fix it.
>> I found a Python script that was supposed to crawl this folder structure and replicate it using IMAP commands, but it crashed as well, and restarting the process would import the same messages twice. The script is here: http://costela.net/2011/06/importing-an-outlook-pst-into-imap/
>> I found another Python script that worked better and seemed to be well written, but it only handles one mbox to one IMAP folder. It can be found here: http://imap-upload.svn.sourceforge.net/viewvc/imap-upload/trunk/ I have modified it and added some minor fixes:
>> - Recursively traverse a folder structure, and replicate it using IMAP commands on the server.
>> - Properly manage folder names with special characters. (Dovecot can manage these characters using the listescape plugin).
>> - Avoid taking all the resources of the server (a quick and dirty hack that can change).
> If I'm reading this right, it's reading a tree of mboxes? You should be able to convert this to any format Dovecot understands (maildir, dbox) with dsync, without having to go through IMAP. You would need to configure dsync to read the mboxes just as you would have configured Dovecot; for a sync from temporary mboxes you probably want to use INDEX=MEMORY to avoid having to mess about creating index files.
I have not tried dsync, but the script had the advantage of being quick to adapt to my needs. One very bad thing about the PST archives is that the sender information is replaced with the common name from Active Directory. I will modify the script to revert to the original email address where needed. Also, since I am dealing with 10-year-old emails, my users want to delete or deduplicate some of them, which have no legal value. A good example is the distribution lists in AD.
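One possible way to do the deduplication step is sketched below with Python's standard mailbox module. It assumes that two messages with the same Message-ID header are duplicates, which is a guess at what would suit these archives rather than something the script is known to do; unique_messages is a hypothetical name.

```python
import mailbox


def unique_messages(mbox_path):
    """Yield messages from an mbox file, skipping repeated Message-IDs.

    Messages without a Message-ID header are always kept, since there is
    nothing reliable to compare them on.
    """
    seen = set()
    box = mailbox.mbox(mbox_path, create=False)  # do not create a missing file
    try:
        for message in box:
            msg_id = str(message.get("Message-ID", "")).strip()
            if msg_id:
                if msg_id in seen:
                    continue  # already yielded a message with this ID
                seen.add(msg_id)
            yield message
    finally:
        box.close()
```

The set only lives for one mbox file here; sharing it across all folders of a user would also catch copies of the same distribution-list mail filed in different folders.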
>> I am not an expert in Python, and the script was written quickly to fit my needs. However, I think it can easily be adapted to any configuration. In the future, maybe this script can use the libpst Python bindings to import the emails directly. The latest version of the modified script is here: https://github.com/arodier/EmailTools/tree/master/Migration. Do not hesitate to help me make the script as generic as possible, particularly if you are a Python expert.
> Well, on my quick look, I don't much like this line:
> ad = float(open("/proc/loadavg").readline().split(" ")[:3][0])
> I would be surprised if Python didn't provide a portable way to get at that information... let's see (I don't really speak Python)... oh yes, os.getloadavg().
Thanks for this, I will modify the script.
>> I am posting this on this list because I think you may be interested if you are in the same situation as me. The license is not specified, but I will probably use GPLv3.
> Without wishing to get into a licence war, there are a lot of people who object to the GPLv3, for good reasons. Do you have a good reason for changing it from the MIT licence used by the original?
I am not an expert in software licensing, but I am happy with this license so far.
> Ben
Thanks for your suggestions.
participants (2): Andre Rodier, Ben Morrow