How to move/reorganise existing e-mails to yearly subfolders

Marc Roos M.Roos at f1-outsourcing.eu
Tue Oct 20 14:35:28 EEST 2020


 >
 >First of all, thanks for your answer.
 >
 >> What is the problem with having huge online mailboxes? Just choose a
 >> good european provider that has encryption all the way through to
 >> their storage platform.
 >
 >We already have a European ISP with a standard e-mail server. I wanted
 >to keep our own mail server on premises, so that it is not exposed on
 >the Internet. The current server fetches (and removes) all e-mails
 >from the ISP. That has many advantages, for example, internal e-mail
 >still works in case of an Internet outage.

Use batched SMTP: you will never miss an e-mail, and you can keep your
on-premises solution. You can do a lot with SMTP configuration. You can
even have e-mail delivered to a second location.
 
 >If I wanted to change the setup, I would have to start evaluating such
 >an "encryption all the way through to their storage platform"
 >feature. I am not sure that it is worth the effort. In any case, that
 >sounds like a limiting factor when choosing another ISP, in case the
 >current one starts making trouble.
 >
 >
 >> I had exactly the same idea about migrating. You have to think twice
 >> about moving emails around of users. They do not like it ;)
 > > [...]
 >
 >I don't really want to do that, that's why I wrote "If I set a mailbox
 >size limit, users will have to delete old mails by themselves".
 >
 >I do not know much about the legal aspects, but in case we need to
 >keep all e-mails for legal data retention requirements, I would like to
 >store those e-mails separately, so that if a user deletes it, the
 >original e-mail is still archived somewhere else.
 >
 >That is why I mentioned the Postfix's BCC feature. The idea is that
 >you have a separate mailbox where a separate copy of all e-mails to and
 >from all users land. That is the separate mailbox where I wanted to
 >reorganise e-mails by date, in order to archive the e-mails in smaller
 >chunks on a yearly basis. Those e-mails do not need to be online after
 >all. Chances are, they will never be needed anyway.

That should be simple to realize, just move the folders to the archive!
You do not need BCC to have copies delivered to a second account. You
also have to think about outgoing mail: duplicate that as well. And
enforce SPF, so users cannot send messages via any other outgoing mail
servers.
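
For the yearly reorganisation itself, here is a minimal sketch in Python,
assuming the archive mailbox is stored as a Maildir and that nothing else
(Dovecot, a fetch job) touches it while the script runs. The function names,
the "Archive" prefix and the "Archive.<year>" folder naming are assumptions
for illustration only. In a Maildir++ layout such a name is typically shown
as a year subfolder under Archive, but check that against your own
namespace configuration, and try it on a copy first.

```python
import mailbox
from email.utils import parsedate_to_datetime


def message_year(msg, fallback="unknown"):
    """Return the year of the Date header as a string, or `fallback`."""
    raw = msg.get("Date")
    if raw is None:
        return fallback
    try:
        return str(parsedate_to_datetime(raw).year)
    except (TypeError, ValueError):
        # Missing or unparsable Date header: park the message separately.
        return fallback


def archive_by_year(maildir_path, prefix="Archive"):
    """Move all top-level messages into '<prefix>.<year>' folders.

    Returns a dict mapping folder name to the number of messages moved.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    moved = {}
    for key in list(box.keys()):
        msg = box[key]
        name = f"{prefix}.{message_year(msg)}"
        # add_folder() creates the folder if needed, otherwise reopens it.
        folder = box.add_folder(name)
        folder.add(msg)   # copy first (flags are carried over) ...
        box.remove(key)   # ... then delete the original
        moved[name] = moved.get(name, 0) + 1
    return moved
```

Calling archive_by_year("/path/to/archive/Maildir") would return the number
of messages moved per folder. Sorting on the Date header is a design choice:
it is simple, but the header is set by the sender, so a delivery timestamp
may be the better key if you need the archive to reflect when mail arrived.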

 >
 >> I have created an 'archive' environment on a distributed filesystem, and it
 >> takes me quite a lot of persuading to have people (or allow me) to move
 >> messages from common Sent and Inbox mailboxes to the Archive namespace
 > > [...]
 >
 >I am actually a newbie in mail service matters, but my guess is that
 >there is no amount of persuasion that could possibly help. You have to
 >set a hard limit per mailbox and let the users deal with it, don't
 >you? Otherwise, sooner or later the server will overload. Or I would
 >need to become a full-time e-mail server admin, which is not an option
 >either!

You have to explain the advantages to people, e.g. when adding a phone, it
does not have to download and sync a huge Inbox or Sent folder. We do not
have any limits. With current-day solutions, I would say there is no need
for them. You can also outsource the work on your on-premises VM ;)
  
 >I am actually a friend of having 2 backup disks that rotate, where one
 >is always physically off premises, and offline. But I wonder how I
 >could keep the backups encrypted and synchronised with 2 rotating
 >disks. Maybe Veracrypt + rsync.

Sounds sufficient; LUKS encryption is also fine.
 
 >I am hoping that the amount of big attachments in all incoming and
 >outgoing mail still fits in normal external USB 3.0 disks. Or at least a
 >few years' worth of it per disk. But I still would not want to have
 >say 1 TB of mail data online. That would make the VM unmanageable for
 >part-time sysadmins like me.

At this point I do not see why you need to have any data online.
The online servers just need to be properly configured for your
on-premises servers.

 >
 >> [@~]# mailbox-ls.sh testtest size
 > > [...]
 >> I would not trust anyone else's programming with my
 >> users email, you should also not.
 >
 >I am not sure that I would trust my own e-mail server programming
 >abilities either. 8-)
 >
 >If you have written such scripts, perhaps you could point me to some
 >example scripts that I could use as a starting point for such e-mail
 >reorganisation tasks?
 >
 >
 > > [...]
 >> But when I migrate to mdbox this is not necessary anymore.
 >
 >I am not sure that I would trust a file format where the indexes
 >cannot be rebuilt if they become corrupt. If I need an advanced format
 >for search performance reasons, I would probably consider an SQL-based
 >backend then.

Currently I have many inboxes and other mbox files of >25GB, which is not
sustainable. Maildir with lots of files is also not an option.
 
 >
 >> I do not like the sound of "Postfix BCC feature", I use sendmail and I
 >> can duplicate messages with that, without altering anything in them.
 > > [...]
 >
 >I am actually not sure yet how to achieve the copying. I am still a
 >little confused anyway. 8-)
 >
 >On the incoming side, I may not use Postfix at all, because Dovecot
 >actually needs to download the e-mails from the ISP mail server. I am
 >hoping that I can use a single "catch all" mailbox on the ISP. So I
 >would need to copy the incoming e-mails in another way.

Just have them deliver via SMTP to your local server. If your server is
down, the messages will be kept on their server until yours is up
again.
 
 >On the outgoing side, anything sent (actually per SMTP relay) through
 >our internal mail server could be copied somehow with some BCC
 >feature. But if the user connects to the external ISP's SMTP servers
 >directly, then I cannot get a copy so easily. Maybe I need to force the
 >users to always use the internal mail server for sending.

Yes, enforce SPF; that should be sufficient for most cases.
 
 >In any case, let's say that the duplicate mails, stored somewhere else
 >for data retention purposes, get altered in some way, like some
 >header is added or changed in a predictable way. I am thinking of a
 >header like "BCC: dataretention at example.com". Why would that be a
 >problem?

It is probably not, but it is just not nice having to change user data 
to solve your problem. 
 

