On 10/8/2012 4:37 PM, Christoph Anton Mitterer wrote:
Here is the proper way to accomplish your goals, or at least the big ones.
- I generally want to have _all_ mail (which is not sorted out because of being spam) to be archived at the local server.
http://www.postfix.org/postconf.5.html#always_bcc
- But(!) I want to selectively keep (in addition) mail at the internet server. For example I may want to select the folder that contains all mail from some friend to be kept online completely.
See above.
But I may want to decide that mailinglists keep only the last 10 days and/or 1000 messages of mail.
http://wiki2.dovecot.org/Plugins/Expire
The expire plugin does age-based deletion, but not deletion based on folder message count. You must use your MUA, TBird, for the latter. It's far easier to configure this in TBird than in Dovecot config files. You seem like the type who wants flexibility so you can change things often, so use TBird to be happy here.
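To make the difference between the two retention rules concrete, here is a minimal Python sketch of the selection logic only. It is not how the expire plugin or TBird implements retention. The function name, the (uid, date) input shape, and the reading of "10 days and/or 1000 messages" as "delete anything older than the age limit or outside the newest N" are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def select_for_deletion(messages, now, max_age_days=10, max_count=1000):
    """Return the UIDs to delete from one folder (hypothetical helper).

    messages: list of (uid, received_datetime) tuples.
    A message is selected if it is older than max_age_days OR if it is
    not among the max_count newest messages.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Newest first, so every position >= max_count exceeds the count limit.
    newest_first = sorted(messages, key=lambda m: m[1], reverse=True)
    doomed = set()
    for position, (uid, received) in enumerate(newest_first):
        if received < cutoff or position >= max_count:
            doomed.add(uid)
    return sorted(doomed)
```

The age rule looks at each message on its own, while the count rule needs the whole folder sorted by date, which is why the two are typically handled in different places.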
- The idea is, that the local server regularly (when it is online/running) catches new mail from the internet server... and stores it in the archive.
This is not an option. The system must be up and connected to the internet 24x7x365. It must have an MX record associated and a valid domain, or a VPN tunnel and entries in both systems hosts files, along with a Postfix transport table, and other tweaks.
http://www.postfix.org/transport.5.html
If you refuse to run this "local server" 24x7x365 then you will have to use a fetchmail based solution, which will not work well, and whose configuration will prompt you to kill yourself. I cannot help you with any of that.
- So apart from new mail that has not yet been read, that local archive contains always all mails that are also on the internet server... the latter may contain (for specific directories) the same, or just parts of.
No. Mail arriving at the colo/VPS host is immediately sent to the always_bcc address, an address and mailbox on your home server. You will create a duplicate IMAP folder structure on the home server by hand in your MUA. Once this is completed you will write individual user sieve scripts that sort the mail into folders just as it is sorted on the colo/VPS server. Basically, the home server Dovecot IMAP config is identical in structure to the colo/VPS setup; only the mailbox account names differ. Folder tree, folders, and sieve scripts are identical; the retention policy is different.
- The MUAs will then have two imap accounts, one to the internet server and one to the local archive,... each one being usable, depending on where I am.
Yep.
- This is where my first problem arises: How can I implement that mail flow, especially:
- How do I secure that all mail is read from the internet server (i.e. that nothing is "forgotten")?
Done: always_bcc
- How do I make sure that no mails are retrieved twice (or more)? A problem which I often had with pop, when the mail client crashed during sync?
Done: always_bcc
- Further it must be secured, that when I delete something on the internet server, it is NOT deleted on the local server (on the next mail-fetching).... this is why I don't use the word "sync".
Done: always_bcc
a) One stupid solution would be, that I duplicate all mail on the online server,... one part is for staying online, one part is for being fetched to the local archive.
Done: always_bcc
And yes that is stupid.
As soon as it was fetched... that copy gets removed (always). That solution would give a clean and secured separation of both?
b) I don't think offlineimap or any other caching-like solution is the right thing... especially as one must always fear that such a cache may be accidentally wiped.
Are there better solutions than (a)?
Yes. Already done: always_bcc
- Problem would be already a refinement of a working solution for (1) (but obviously not when using (1).(a) ). When e.g. reply to or forward a mail using the online server,... and that mail had already been fetched,... can I make the flag synced?
No. Your stated goal is that the local server is a mail archive put into service due to limited space on your colo/VPS server. An archive is an archive, not a secondary online server. It should only be accessed, read only, when you want to search and read an old message. And in fact, since this is an archive, you should implement the zlib plugin with dbox so all this archived mail is compressed in real time.
Make up your mind. You can't have it both ways. I hear the iPhone5 can do anything automatically, no setup. Get one of those, problem solved. ;)
- Is dovecot suitable for the local server?
Yes. Probably more than any other IMAP server.
- I couldn't use maildir locally, because I lose just too much space to the block fragmentation.
Maildir causes the least filesystem fragmentation. You must be thinking of mbox, which causes heavy fragmentation due to constant appends past EOF. As I said you need dbox. One email per file, similar to maildir, but better integration and performance with Dovecot.
- I'd prefer not to use dbox (the thing that the indices are crucial scares me a bit off).
Are you designing/building this home server to be unreliable? Does it crash often? If so, fix that problem and dbox is fine. If you can't make it reliable, use maildir, which has expendable indexes.
a) When using mbox... is dovecot able to manage a really big folder hierarchy that basically ever keeps growing... with easily several 100k mails per folder... and that is in total already over 100GB?
You have 100K emails in a single Dovecot mbox file? Or are you talking about an IMAP folder in TB that has no email in it, but many more IMAP folders whose combined email total is 100K?
If you're worried about dbox index corruption, then you should be far more worried about mbox file corruption. With mbox files that large I'm surprised you've not hit it already. This would suggest that the system is pretty stable.
- I would prefer to have fast full text search. Does dovecot provide this?
Yes. The problem with speed is twofold:
You must run FTS searches often to keep the search indexes up to date. Wait a week between searches, after many new emails have been added to the IMAP folder, and your search crawls, as the file contents must be reindexed before the search starts. So you need to have a cron'd script that searches daily to keep the indexes up to date.
The mailbox file formats that best avoid fragmentation also have the slowest FTS times, as the OS must open every file, 100K of them. If you use mbox or mdbox, you have far fewer files to open. mbox has the fastest FTS times of any format when indexes aren't fully up to date. It's also the fastest when updating the indexes. Your home server probably has a single SATA disk. mbox wins hands down for FTS due to very low IOPS load on the disk. The downside here is lack of good compression support: once you compress an mbox file you can't add new mail to it. This is where mdbox with compression comes in handy. Given your statement about 100K emails, I think you're best served by mdbox with zlib compression.
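The daily cron'd search mentioned above could be sketched roughly like this in Python, using the standard imaplib module. Treat it as an illustration under assumptions, not a tested recipe: the function name, the probe string, and the host, account, and folder names in the comments are hypothetical, and whether a search actually brings the index up to date depends on how FTS is configured on your server.

```python
import imaplib

def warm_fts_indexes(conn, folders, probe="fts-warmup"):
    """Run one text search per folder so the server can refresh its FTS index.

    conn: a logged-in imaplib.IMAP4 / IMAP4_SSL connection (or compatible).
    Returns {folder: status}, where status is the SEARCH response code,
    or "SELECT failed" if the folder could not be opened.
    """
    results = {}
    for folder in folders:
        # Quote the name so folders with spaces work; read-only leaves flags alone.
        status, _ = conn.select('"%s"' % folder, readonly=True)
        if status != "OK":
            results[folder] = "SELECT failed"
            continue
        # SEARCH TEXT covers headers and bodies; the hits themselves are ignored.
        status, _ = conn.search(None, "TEXT", '"%s"' % probe)
        results[folder] = status
    return results

# Example use from a daily cron job (hypothetical host, account, folders):
#   conn = imaplib.IMAP4_SSL("archive.example.org")
#   conn.login("archive-user", "secret")
#   print(warm_fts_indexes(conn, ["INBOX", "lists/dovecot"]))
#   conn.logout()
```

Run once a day, each search only has to index the mail that arrived since the previous run.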
I was looking into database backed mail systems (again,... just for the local archive)... namely dbmail and archiveopteryx (are there other open source solutions?)... Not sure which of the two... or whether it's a good idea at all. I remember some dovecot wiki page that showed a comparison which said that both do not perfectly implement imap.
Any suggestions with respect to that?
If you're worried about fragmentation, or performance, I'd steer clear of a database driven mail store.
Please, please, do not reply to each of my points here, and do not make this thread 100 replies. I'm not here to hold your hand. I don't have the time (nor patience) to engage in these lengthy emails. I gave you the architectural overview to build the correct solution to your problem. It's up to you to choose to use it or not, and if so, to do your own homework and self education, asking here only if something is unclear to you.
In closing, you need real time bcc delivery which solves a ton of your mentioned problems. I'm not open to debating the merits of this. If you're not willing to meet the requirements for always_bcc, and you're determined to power the home server down most of the time, then you need assistance from someone else, as I simply have never used fetchmail, period, and have no idea if it can meet your needs. My guess is no, simply because, AFAIK, it doesn't work with LDA, which means you can't use sieve scripts and Dovecot's automatic sorting and indexing.
Good luck.
-- Stan