How to move/reorganise existing e-mails to yearly subfolders
Hi all:
I am new to e-mail servers and I am evaluating Dovecot. Not really the best combination. 8-)
I am trying to find a balance between legal data retention requirements and online mailbox size. I do not want huge online mailboxes, as doing offline, rotating data backups could then take forever (among other reasons). I would rather avoid online (cloud) backups (data protection etc.).
If I set a mailbox size limit, users will have to delete old mails by themselves. Or I could somehow script the deletion of attachments from old e-mails, as attachments are usually the main cause of huge mailboxes. Incidentally, can anyone point me to an easy way to achieve this? Preferably over IMAP, otherwise with Dovecot tools.
With regards to legal data retention (which I am no expert about either), I thought I could use some Postfix BCC feature I heard about in order to copy all incoming and outgoing e-mails to a single "data retention" mailbox. Or maybe several of them. I could then archive e-mails from that mailbox on a yearly basis.
I would like to automatically organise e-mails inside that mailbox into subfolders like this:
    2019/alice
    2019/bob
    2020/alice
    2020/bob
That is: [year]/[username]
With such a folder structure, it is easier to see what is going on.
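A minimal sketch of the [year]/[username] mapping described above, in Python with only the standard library. The function name `archive_folder`, the `local_domain` parameter, and the rule "the first From/To/Cc address in the local domain decides the username" are assumptions made for illustration; the message itself does not say how the username should be chosen.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses


def archive_folder(raw_message: bytes, local_domain: str = "example.com") -> str:
    """Return a destination folder of the form '<year>/<username>'."""
    msg = message_from_bytes(raw_message, policy=policy.default)

    # Year from the Date header; 'unknown' if the header is missing.
    date_header = msg["Date"]
    when = getattr(date_header, "datetime", None)
    year = str(when.year) if when else "unknown"

    # Assumption: the first From/To/Cc address in the local domain
    # identifies the user the copy belongs to.
    headers = []
    for name in ("From", "To", "Cc"):
        headers.extend(str(h) for h in msg.get_all(name, []))
    for _display, addr in getaddresses(headers):
        local, _, domain = addr.rpartition("@")
        if local and domain.lower() == local_domain:
            return f"{year}/{local.lower()}"
    return f"{year}/unknown"
```

The year here comes from the Date header, which the sender controls. A server-side search by internal (delivery) date may give different results for the same message.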
Is there a tool that can reorganise existing e-mails into such a folder structure?
I found some tools on the Internet to backup and export mails from IMAP to IMAP or maildir destinations. But I could not find a tool that just reorganises (moves) e-mails in such a manner inside an existing mailbox, maybe with a user-defined pattern for the destination folders.
I guess moving e-mails around on the same mailbox would be much faster than exporting and reimporting them in some clever way.
I could always write a Perl script, but that takes time. Such a tool may already exist. Or perhaps somebody could mention a similar, well-written script I can use as a starting point. I am sure there are many small gotchas to avoid. At the moment, I am only confident with Perl and Java. Maybe JavaScript.
It would be best to reorganise the e-mails over IMAP. This way, I am independent of the e-mail server. But a Dovecot-specific solution would also be helpful.
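As a rough, server-independent illustration of moving messages over IMAP, here is a sketch built on Python's standard `imaplib`. The function names `imap_date` and `move_year` are hypothetical. It assumes "/" is the server's hierarchy separator and that the folder names contain no spaces or special characters (those would need quoting). It uses the classic COPY, flag as \Deleted, EXPUNGE sequence rather than the optional MOVE extension.

```python
import imaplib
from datetime import date

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d: date) -> str:
    """Format a date for IMAP SEARCH, independent of the system locale."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def move_year(conn: imaplib.IMAP4, source: str, dest: str, year: int) -> int:
    """Move messages whose internal date falls in `year` from source to dest."""
    conn.create(dest)  # the server answers NO if it already exists; ignored here
    typ, _ = conn.select(source)
    if typ != "OK":
        raise RuntimeError(f"cannot select {source}")

    # SINCE/BEFORE match the server's internal date, not the Date header.
    typ, data = conn.uid("SEARCH",
                         "SINCE", imap_date(date(year, 1, 1)),
                         "BEFORE", imap_date(date(year + 1, 1, 1)))
    uids = data[0].split() if typ == "OK" and data and data[0] else []
    if not uids:
        return 0

    uid_set = b",".join(uids).decode("ascii")
    typ, _ = conn.uid("COPY", uid_set, dest)
    if typ != "OK":
        raise RuntimeError("COPY failed; nothing was deleted")

    # Only after a successful copy: flag the originals and expunge them.
    conn.uid("STORE", uid_set, "+FLAGS", r"(\Deleted)")
    conn.expunge()
    return len(uids)
```

Against a real server, `conn` would be an `imaplib.IMAP4_SSL` object after a successful `login`. Two gotchas: `expunge()` removes every message flagged \Deleted in the selected folder, not only the ones moved here, and a very large folder would need the UID set split into batches.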
I could use such a reorganisation tool not just for archiving or data retention purposes, but to reorganise other mailboxes too, like my personal mailbox.
I would rather have a script. Clicking around in Thunderbird does not scale.
I have seen that you can use a "sieve" in Dovecot to achieve this. But I guess that would only apply to new e-mails. And that would probably apply just to incoming mails, and not to outgoing ones. If I migrate from the existing e-mail servers, the e-mails will probably be mass-imported without going through any "sieve", right?
Besides, I already have a few huge mailboxes with many years of e-mails, stored in a different e-mail server, and it would be nice to be able to reorganise them as they are. That way, I could archive the existing older e-mails before migrating to Dovecot, which would reduce the disk size and the export/import time for the migration.
Many thanks in advance, rdiez
What is the problem with having huge online mailboxes? Just choose a good European provider that has encryption all the way through to their storage platform.
I had exactly the same idea about migrating. You have to think twice about moving users' emails around. They do not like it ;) I have created an 'archive' environment on a distributed filesystem, and it takes quite a lot of persuading to get people to move (or to allow me to move) messages from the common Sent and Inbox mailboxes to the Archive namespace (I was not able to use the alternative storage option that dbox allows). As a side note, if you do archive these emails, most users do not even notice that you have done so.
I ended up creating a script and a webmail plugin for users to enable autoarchiving, which creates something like this.
    [@~]# mailbox-ls.sh testtest size
    listing mailboxes of testtest:
    Archive messages=0
    Archive/2011 messages=0
    Archive/2012 messages=0
    Archive/2013 messages=0
    Archive/2014 messages=0
    Archive/2015 messages=0
    Archive/2016 messages=0
    Archive/2017 messages=0
    Archive/2018 messages=0
    Archive/2019 messages=3500
    Archive/Archive messages=1
    Deleted Messages messages=16
    Drafts messages=2
    INBOX messages=1286
    INBOX/test2 messages=11
    Junk messages=2
    Sent messages=0
    Trash messages=132
A cron job then checks whether the script has already run for the user this year. If not, it starts archiving during off-peak hours; otherwise it runs again in the second quarter of next year. Whatever you choose, move messages with "doveadm move". I would not trust anyone else's programming with my users' email, and neither should you. Read the man pages of any tools that work via IMAP: if they change headers, users are going to download all their messages again. I was thinking of splitting up folders, e.g. inbox/sales into Archive/2016/sales and Archive/2017/sales, but once I migrate to mdbox this is no longer necessary.
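For illustration only, this is roughly how a yearly "doveadm move" call could be assembled from a script. It is not the script described above, which was not posted. The helper name `doveadm_move_argv` is hypothetical; the argument order (destination first, then a search query using `mailbox`, `SINCE` and `BEFORE`) follows the doveadm-move and doveadm-search-query man pages, which should be checked against the installed version.

```python
import shlex


def doveadm_move_argv(user: str, source: str, dest: str, year: int) -> list[str]:
    """Build the argv for moving one calendar year of mail from source to dest."""
    return [
        "doveadm", "move", "-u", user, dest,
        "mailbox", source,
        "SINCE", f"{year}-01-01",       # internal date on or after 1 January
        "BEFORE", f"{year + 1}-01-01",  # and before 1 January of the next year
    ]


argv = doveadm_move_argv("testtest", "INBOX", "Archive/2019", 2019)
print(shlex.join(argv))
```

The list can be passed unchanged to `subprocess.run(argv, check=True)` on the mail server. The destination mailbox may have to exist first (for example via "doveadm mailbox create"); that is an assumption to verify before running it on real mailboxes.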
I do not like the sound of the "Postfix BCC feature". I use sendmail and I can duplicate messages with that, without altering anything in them. You do not want anything that changes your data. If your provider uses the mdbox format (maybe others support this as well), then messages a user deletes are not even removed from the server until "doveadm purge" is run ;)
-----Original Message----- From: R. Diez [mailto:rdiezmail-2006@yahoo.de] Sent: Monday, October 19, 2020 3:49 PM To: dovecot@dovecot.org Subject: How to move/reorganise existing e-mails to yearly subfolders
[...]
First of all, thanks for your answer.
> What is the problem with having huge online mailboxes? Just choose a good european provider that has encryption all the way through to their storage platform.
We already have a European ISP with a standard e-mail server. I wanted to keep our own mail server on premises, so that it is not exposed on the Internet. The current server fetches (and removes) all e-mails from the ISP. That has many advantages, for example, internal e-mail still works in case of an Internet outage.
If I wanted to change the setup, I would have to start evaluating such an "encryption all the way through to their storage platform" feature. I am not sure that it is worth the effort. In any case, that sounds like a limiting factor when choosing another ISP, in case the current one starts making trouble.
> I had exactly the same idea about migrating. You have to think twice about moving emails around of users. They do not like it ;) [...]
I don't really want to do that, that's why I wrote "If I set a mailbox size limit, users will have to delete old mails by themselves".
I do not know much about the legal aspects, but in case we need to keep all e-mails for legal data retention requirements, I would like to store those e-mails separately, so that if a user deletes an e-mail, the original is still archived somewhere else.
That is why I mentioned the Postfix BCC feature. The idea is that you have a separate mailbox where a separate copy of every e-mail to and from every user lands. That is the mailbox in which I wanted to reorganise e-mails by date, in order to archive them in smaller chunks on a yearly basis. Those e-mails do not need to be online after all. Chances are, they will never be needed anyway.
> I have created an 'archive' environment on a distributed filesystem, and it takes me quite a lot of persuading to have people (or allow me) to move messages from common Sent and Inbox mailboxes to the Archive namespace [...]
I am actually a newbie in mail service matters, but my guess is that there is no amount of persuasion that could possibly help. You have to set a hard limit per mailbox and let the users deal with it, don't you? Otherwise, sooner or later the server will overload. Or I would need to become a full-time e-mail server admin, which is not an option either!
I am actually a fan of having 2 backup disks that rotate, where one is always physically off premises, and offline. But I wonder how I could keep the backups encrypted and synchronised with 2 rotating disks. Maybe VeraCrypt + rsync.
I am hoping that the amount of big attachments in all incoming and outgoing mail still fits in normal external USB 3.0 disks. Or at least a few years' worth of it per disk. But I still would not want to have say 1 TB of mail data online. That would make the VM unmanageable for part-time sysadmins like me.
> [@~]# mailbox-ls.sh testtest size
> [...]
> I would not trust anyone else's programming with my users email, you should also not.
I am not sure that I would trust my own e-mail server programming abilities either. 8-)
If you have written such scripts, perhaps you could point me to some example scripts that I could use as a starting point for such e-mail reorganisation tasks?
> [...] But when I migrate to mdbox this is not necessary anymore.
I am not sure that I would trust a file format where the indexes cannot be rebuilt if they become corrupt. If I need an advanced format for search performance reasons, I would probably consider an SQL-based backend then.
> I do not like the sound of "Postfix BCC feature", I use sendmail and I can duplicate messages with that, without altering anything in them. [...]
I am actually not sure yet how to achieve the copying. I am still a little confused anyway. 8-)
On the incoming side, I may not use Postfix at all, because Dovecot actually needs to download the e-mails from the ISP mail server. I am hoping that I can use a single "catch all" mailbox on the ISP. So I would need to copy the incoming e-mails in another way.
On the outgoing side, anything sent (actually per SMTP relay) through our internal mail server could be copied somehow with some BCC feature. But if the user connects to the external ISP's SMTP servers directly, then I cannot get a copy so easily. Maybe I need to force the users to always use the internal mail server for sending.
In any case, let's say that the duplicate mails, stored somewhere else for data retention purposes, get altered in some way, like some header is added or changed in a predictable way. I am thinking of a header like "BCC: dataretention@example.com". Why would that be a problem?
Best regards, rdiez
> First of all, thanks for your answer.
>> What is the problem with having huge online mailboxes? Just choose a good european provider that has encryption all the way through to their storage platform.
> We already have a European ISP with a standard e-mail server. I wanted to keep our own mail server on premises, so that it is not exposed on the Internet. The current server fetches (and removes) all e-mails from the ISP. That has many advantages, for example, internal e-mail still works in case of an Internet outage.
Use batched SMTP: you will never miss an email, and you can keep your on-premises solution. You can do a lot with SMTP configuration. You can even have email delivered to a second location.
> If I wanted to change the setup, I would have to start evaluating such an "encryption all the way through to their storage platform" feature. I am not sure that it is worth the effort. In any case, that sounds like a limiting factor when choosing another ISP, in case the current one starts making trouble.
>> I had exactly the same idea about migrating. You have to think twice about moving emails around of users. They do not like it ;) [...]
> I don't really want to do that, that's why I wrote "If I set a mailbox size limit, users will have to delete old mails by themselves".
> I do not know much about the legal aspects, but in case we need to keep all e-mails for legal data retention requirements, I would like to store those e-mails separately, so that if a user deletes it, the original e-mail is still archived somewhere else.
> That is why I mentioned the Postfix's BCC feature. The idea is that you have a separate mailbox where a separate copy of all e-mails to and from all users land. That is the separate mailbox where I wanted to reorganise e-mails by date, in order to archive the e-mails in smaller chunks on a yearly basis. Those e-mails do not need to be online after all. Chances are, they will never be needed anyway.
That should be simple to realize, just to folders to archive! You do not need BCC to have copies delivered to a second account. You also have to think about outgoing mail: duplicate those messages as well. And enforce SPF, so users cannot send messages via any other outgoing mail servers.
>> I have created an 'archive' environment on a distributed filesystem, and it takes me quite a lot of persuading to have people (or allow me) to move messages from common Sent and Inbox mailboxes to the Archive namespace [...]
> I am actually a newbie in mail service matters, but my guess is that there is no amount of persuasion that could possibly help. You have to set a hard limit per mailbox and let the users deal with it, don't you? Otherwise, sooner or later the server will overload. Or I would need to become a full-time e-mail server admin, which is not an option either!
You have to explain the advantages to people, e.g. when adding a phone, it does not have to download and sync a huge Inbox or Sent folder. We do not have any limits. With current-day solutions, I would say there is no need for them. You can also outsource the work on your on-premises VM ;)
> I am actually a friend of having 2 backup disks that rotate, where one is always physically off premises, and offline. But I wonder how I could keep the backups encrypted and synchronised with 2 rotating disks. Maybe Veracrypt + rsync.
Sounds sufficient; LUKS encryption is also fine.
> I am hoping that the amount of big attachments in all incoming and outgoing mail still fits in normal external USB 3.0 disks. Or at least a few years' worth of it per disk. But I still would not want to have say 1 TB of mail data online. That would make the VM unmanageable for part-time sysadmins like me.
At this point I do not see why you need to have any data online. The online servers just need to be properly configured for your on-premises servers.
>> [@~]# mailbox-ls.sh testtest size [...] I would not trust anyone else's programming with my users email, you should also not.
> I am not sure that I would trust my own e-mail server programming abilities either. 8-)
> If you have written such scripts, perhaps you could point me to some example scripts that I could use as a starting point for such e-mail reorganisation tasks?
>> [...] But when I migrate to mdbox this is not necessary anymore.
> I am not sure that I would trust a file format where the indexes cannot be rebuilt if they become corrupt. If I need an advanced format for search performance reasons, I would probably consider an SQL-based backend then.
Currently I have many inboxes and other mbox files of more than 25 GB, which is not sustainable. Maildir with lots of files is also not an option.
>> I do not like the sound of "Postfix BCC feature", I use sendmail and I can duplicate messages with that, without altering anything in them. [...]
> I am actually not sure yet how to achieve the copying. I am still a little confused anyway. 8-)
> On the incoming side, I may not use Postfix at all, because Dovecot actually needs to download the e-mails from the ISP mail server. I am hoping that I can use a single "catch all" mailbox on the ISP. So I would need to copy the incoming e-mails in another way.
Just have them deliver via SMTP to your local server. If your server is down, the messages will be kept on their server until yours is up again.
> On the outgoing side, anything sent (actually per SMTP relay) through our internal mail server could be copied somehow with some BCC feature. But if the user connects to the external ISP's SMTP servers directly, then I cannot get a copy so easily. Maybe I need to force the users to always use the internal mail server for sending.
Yes, enforce SPF; that should be sufficient for most cases.
> In any case, let's say that the duplicate mails, stored somewhere else for data retention purposes, get altered in some way, like some header is added or changed in a predictable way. I am thinking of a header like "BCC: dataretention@example.com". Why would that be a problem?
It is probably not, but it is just not nice having to change user data to solve your problem.
On 20.10.20 at 12:15, R. Diez wrote:
[...]
Why not use an archive solution like
https://blog.sys4.de/mailarchiv-mit-dovecot-und-postfix-sortiert-nach-datum-...
Sort with Sieve. It is not exactly what you are looking for, but it should be enough to go on. You might also consider a double fetch with e.g. getmail:
https://blog.sys4.de/abholdienst-fur-mail-de.html
-- [*] sys4 AG
R. Diez wrote:
> I found some tools on the Internet to backup and export mails from IMAP to IMAP or maildir destinations. But I could not find a tool that just reorganises (moves) e-mails in such a manner inside an existing mailbox, maybe with a user-defined pattern for the destination folders.
You could try imapfilter (configured in Lua); it can move mails between mailboxes and do other things.
A short example:
    -- select messages in INBOX older than 365 days, then move them to "Old"
    oldmails = account1["INBOX"]:is_older(365)
    oldmails:move_messages(account1["Old"])
-- Victor Sudakov, VAS4-RIPE, VAS47-RIPN 2:5005/49@fidonet http://vas.tomsk.ru/
participants (4)
- Marc Roos
- R. Diez
- Robert Schetterer
- Victor Sudakov