questions for an unusual master/slave-like mail-server setup
Hey folks.
I've been planning a more advanced mail server setup for a long time (described below), which basically has some weird ;) master/slave ideas behind it. It's probably a bit more complicated (and maybe also a bit more insane) than the usual mail server setup, so I'd be happy about any advice and to have some questions answered:
There are two mailservers (dovecot, as you may have guessed ;) ):
- IMAP shall be used; using POP3 is not planned.
- A format in which each mail is stored in a separate file shall be used. That leaves either maildir or sdbox, where I'd prefer maildir, as I like to be able to lose the indexes.
- Each (per user) mailbox shall have many subfolders and will receive loads of mail (including from several high traffic mailing lists).
- Mail filtering shall take place *at least* on the "online" server, probably either with sieve or maildrop.
- One of the mailservers is always "online", directly connected to the internet (and globally available).
- The other mailserver is "offline", not directly connected to the internet (and only locally available).
The "online" mailserver:
- Is in a not fully trusted location (i.e. at some ISP).
- Has somewhat limited storage.
- Makes the mails on it available to MUAs connecting from anywhere.
- Receives mail from various sources, e.g. via postfix/LDA or something like fetchmail.
- Has only a part of *all* mails, for example:
  - *All* mail from some personal contacts.
  - Only the last year for low volume mailing lists.
  - Only the last 4 months for high volume mailing lists.
The "offline" mailserver:
- Is in a trusted location (beneath my pillow ;) ).
- Has enough storage for the purposes of mail.
- Makes all mails available to MUAs connecting from the LAN.
- Receives mails from the "online" mailserver, when manually "moved" to it via some MUA over IMAP.
- Has *all* mails.
Client MUAs may connect to both, but the "offline" server will only be reachable at home.
As you can imagine, the idea is basically the following: I get large amounts of mail, from all kinds of mailing lists as well as personal mail. All of that is filtered into a tree hierarchy of folders. There's a not fully secure mailserver, where mail arrives and which I can use when I'm travelling. It contains either just recent mails (the amount depending on the traffic of the list) or, for selected senders/lists, everything, and I control basically on a per-folder basis how much should be retained (all, last 10 years, last week, etc.). There's another server, where *all* mail should eventually end up, for archival and security reasons. This obviously implies that some of the mail is on both servers and some only on the "offline" server, and that this may vary from folder to folder.
Further, on both servers, I may want to add filters which basically split up a folder, e.g. of high traffic lists, so that mail of previous years or months is moved to subfolder structures, e.g.:
linux-kernel
+--2014
\--2013
and so on.
So far, all that wouldn't be too difficult,... but as you can imagine, I'd like to have a bit more, which actually makes things quite tricky.
How the two mailservers shall synchronise: I don't want to "simply" rsync - the whole system needs to be a bit smarter:
1) It's okay to assume that synchronisation happens only at discrete times and not continuously (the "offline" server doesn't always run).
2) It would be OK if the synchronisation requires a downtime of the "online" mailserver (i.e. the LDA). Since btrfs is used, I could also easily do snapshots.
3) One can also assume that, when synchronisation happens, the "offline" server contains at least all those folders that the "online" server has (and possibly more).
4) Obviously, only maildir's cur/ directories should be synced (stuff in tmp/ and new/ may still be arriving).
5) As mentioned previously, not all mail may stay on the "online" server; depending on the folder, it will either stay permanently or only for some period.
6) However, on the "offline" server, all mail stays, unless I manually delete it.
7) One must assume that mails on *both* servers:
   - change their IMAP status (as in (S)een, (A)nswered, (F)lagged, etc.)
   - are moved to other mail folders
   - may be deleted via MUAs (not clear yet where this can happen)
   "Fresh" mail, however, only appears on the "online" server.
8) Mails, once on the "offline" server, shouldn't be rewritten, in the sense that the mail is deleted and recreated (hopefully with the same content), which could happen either when the status of the mail changes or when it's moved to another folder (on the other server). The reason is simply that I wouldn't fully trust the online server, so once the mail is local, it should be immutable (except for the status and folder location).
The last two points make things pretty difficult; at least I don't see a straightforward solution.
(4) would be quite easy: if rsync were used, I'd just rsync all the cur/ directories. The rsync would run without --delete, so that mail which is no longer on the "online" server doesn't get deleted from the "offline" server.
(5) could be done by "simply" removing, on a per-folder basis, mails older than X from the "online" server after they have been copied to the "offline" server. An already existing tool for doing that would be nice, though.
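A minimal sketch of the per-folder expiry described in (5), assuming the part of the maildir filename before the first "," or ":" is used as the mail ID and that the file mtime is a good enough stand-in for the arrival time. The function names, the offline_ids set and the dry_run switch are made up for illustration; this is not an existing tool.

```python
import time
from pathlib import Path
from typing import Iterable


def maildir_id(filename: str) -> str:
    """Return the part of a maildir filename before ':' and before any ',S=' style fields."""
    return filename.split(":", 1)[0].split(",", 1)[0]


def select_expired(entries: Iterable[tuple[str, float]], offline_ids: set[str],
                   max_age_days: float, now: float) -> list[str]:
    """Pick filenames older than max_age_days whose ID is already present offline."""
    cutoff = now - max_age_days * 86400
    return [name for name, mtime in entries
            if mtime < cutoff and maildir_id(name) in offline_ids]


def expire_folder(online_cur: Path, offline_ids: set[str],
                  max_age_days: float, dry_run: bool = True) -> list[str]:
    """Apply select_expired() to one cur/ directory; deletes only if dry_run is False."""
    entries = [(p.name, p.stat().st_mtime)
               for p in online_cur.iterdir() if p.is_file()]
    expired = select_expired(entries, offline_ids, max_age_days, time.time())
    if not dry_run:
        for name in expired:
            (online_cur / name).unlink()
    return expired
```

The check against offline_ids is the important part: a mail is only ever removed from the "online" server once the "offline" server is known to hold it.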
(7) and (8) are IMHO the most difficult: The only idea I have so far assumes that each mail file has an ID which is unique for the *whole* mailbox (not just for each folder, as the mails may be moved between folders).
The idea would be roughly the following:
When I synchronise, I either shut down the LDA or make a snapshot of the "online" server's mailbox and work with that.
Then, I make a list (ID, status and pathname) of *all* mails on the online server and another one of *all* mails of the offline server.
Then, I probably scan for duplicate IDs in each list... just to be safe.
Then I process the mail lists as follows:
a) If an ID is on *both* servers with the same status and pathname, then that mail has been synced previously and has neither moved nor changed its status since then. Nothing to do.
b) If an ID is on *both* servers but with different pathnames or status, then the mail has either been moved or its status has changed. Consequently, I would move/rename the mail to match the new location and/or IMAP status. I would move/rename the copy with the older ctime to match the one with the newer ctime (ctime gets set at move or rename and thus at status change). Even if an attacker had fiddled with the "online" server's time, or my system times were bogus for some other reason, the worst that would happen is that status changes are lost or mail ends up in the wrong folder.
c) If an ID is on the "offline" server but not the "online" server, it could be either a mail that has already been garbage collected on the "online" server (so nothing would need to be done),... or it could be mail that was removed (via a MUA) on the "online" server and that should be removed on the "offline" server as well, either automatically, or manually after checking. How to find out which of these applies!?
d) If an ID is on the "online" server but not the "offline" server, it could be either fresh mail that would need to be synced,... or it could be mail that was removed on the "offline" server and should be deleted on the "online" server as well, either automatically, or manually after checking.
(a) and (b) are again rather simple, (c) and (d) not so much...
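The case analysis (a) to (d) can be sketched roughly as follows. This assumes that both mail lists have already been loaded into dictionaries keyed by the mailbox-wide ID; MailEntry, classify and winner are hypothetical names, and letting the "offline" side win on equal ctimes is my own choice, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailEntry:
    folder: str   # folder the mail currently lives in, e.g. "linux-kernel"
    flags: str    # maildir flag letters after ":2,", e.g. "RS"
    ctime: float  # inode change time of the mail file


def classify(online: dict[str, MailEntry],
             offline: dict[str, MailEntry]) -> dict[str, list[str]]:
    """Sort every known ID into one of the cases a, b, c or d."""
    cases: dict[str, list[str]] = {"a": [], "b": [], "c": [], "d": []}
    for mail_id in sorted(online.keys() | offline.keys()):
        on = online.get(mail_id)
        off = offline.get(mail_id)
        if on is not None and off is not None:
            unchanged = (on.folder, on.flags) == (off.folder, off.flags)
            cases["a" if unchanged else "b"].append(mail_id)
        elif off is not None:
            cases["c"].append(mail_id)  # only offline: expired or deleted online?
        else:
            cases["d"].append(mail_id)  # only online: fresh or deleted offline?
    return cases


def winner(on: MailEntry, off: MailEntry) -> str:
    """For case b: the side with the newer ctime dictates folder and flags.

    On a tie the (trusted) offline side wins.
    """
    return "online" if on.ctime > off.ctime else "offline"
```

Only file names and locations are compared here; mail contents are never touched, which fits the requirement that mail on the "offline" server stays immutable.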
Possible solutions that I could think of:
- Either simply don't sync the removal of mails from the "online" server to the "offline" server and vice-versa. It will anyway happen rather rarely.
- Remember a list of all IDs, that have been garbage collected on the "online" server.
- Remember a list of all IDs that have been removed via MUAs. => Not sure if that's easily doable, probably not. Is there any direct way one could do this with dovecot?
- Something to find out which mails have disappeared on each of the "offline" and the "online" server since the last synchronisation, via tricks using the snapshot features of btrfs.
- Somehow using the date of now and the last synchronisation and the mtimes (which are here basically like creation times, as the mail file contents never change). But I guess that would only work for (d) and not for (c),... and I'd rather not depend on times too much.
- In any case, I could give the user a list of mails that would be deleted, for double checking. I actually think this would scale, as I don't expect many mails to be manually deleted through the MUA *once they had been synced*.
After that, I think the two servers should be in sync - in the sense described in the beginning.
Some further parts of the idea: As said, snapshots and the reflink feature of btrfs could be quite helpful. So I could basically keep:
- the "online" mailserver's mailbox
  - as of the last sync
  - as of now
- the "offline" mailserver's mailbox
  - as of the last sync
  - as of now
on the disk of the "offline" server without much additional cost. They'd all share reflinks and wouldn't use (much) additional space (except for fs metadata), and I could use the snapshots to find out which files were deleted on either of the two servers since last time.
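Comparing the "as of the last sync" and "as of now" trees could look roughly like this. It is a sketch under the assumption that each snapshot is simply a directory tree of maildirs and that the expiry job keeps its own list of garbage-collected IDs; collect_ids and explain_disappeared are invented names.

```python
import os
from pathlib import Path


def collect_ids(mailbox_root: Path) -> set[str]:
    """Gather mail IDs from every cur/ directory below a mailbox (or snapshot) root."""
    ids: set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(mailbox_root):
        if os.path.basename(dirpath) == "cur":
            for name in filenames:
                # ID = filename without the ":2,<flags>" part and without ",S=" fields
                ids.add(name.split(":", 1)[0].split(",", 1)[0])
    return ids


def explain_disappeared(last_sync_ids: set[str], now_ids: set[str],
                        garbage_collected: set[str]) -> dict[str, set[str]]:
    """Split IDs that vanished since the last sync into expired vs. manually removed."""
    gone = last_sync_ids - now_ids
    return {
        "expired": gone & garbage_collected,
        "removed_via_mua": gone - garbage_collected,
    }
```

The "removed_via_mua" set would then be the list presented to the user for double checking before anything is deleted on the other server.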
Actually copying new mail from the "online" server to the offline server's disk (in the "as of now" area), could then happen via rsync'ing every cur/ from the "online" server to the "offline" server, with --delete and --ignore-existing (as I don't trust the "online" server and mail file contents should be immutable).
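The rsync call could be wrapped like this. It is only a sketch: the host and paths in the usage note are placeholders, and whether --delete is appropriate depends on which copy is being updated, so it is a parameter that defaults to off.

```python
import subprocess


def build_rsync_cmd(remote_cur: str, local_cur: str, delete: bool = False) -> list[str]:
    """Build an rsync command line that never overwrites files already present locally."""
    cmd = ["rsync", "-a", "--ignore-existing"]
    if delete:
        cmd.append("--delete")
    # Trailing slashes: copy the contents of cur/, not the directory itself.
    cmd += [remote_cur.rstrip("/") + "/", local_cur.rstrip("/") + "/"]
    return cmd


def pull_folder(remote_cur: str, local_cur: str, delete: bool = False) -> None:
    """Run the transfer for one folder and raise if rsync reports an error."""
    subprocess.run(build_rsync_cmd(remote_cur, local_cur, delete), check=True)
```

For example, build_rsync_cmd("online.example:Maildir/cur", "/srv/mail/now/cur") yields the argument list for one folder; a loop over all folders would call pull_folder() once per cur/ directory.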
So... questions... :D
1) Obviously, does anyone know a better solution, or have some improvements or other ideas? Or are there any problems with the above which I haven't realised, or do you think it would work?
2) Does dovecot *ever* change the contents (i.e. the file contents) of a maildir mail file?
3) As ID I'd have thought that the maildir mail filename might work, e.g.
   1234567890.M20046P2137.mailserver,S=4542,W=4642:2,Sb
   I mean especially the first part:
   1234567890.M20046P2137.mailserver
   Is that unique across *all* folders of a mailbox? AFAIU, that name would be determined by whatever delivers the mail, right? So can I somehow configure that to make it unique?
4) Does dovecot ever change anything in the maildir mail filenames, except what comes after the ":" above? I.e. would it ever change:
   1234567890.M20046P2137.mailserver,S=4542,W=4642 ?
5) Would dovecot ever copy, truncate+copy or in some other way rewrite files, instead of just moving them (thus possibly changing inodes, reflinks or the like)?
6) AFAIU, when using maildir, dovecot's index files can get completely lost,... and one can fully "recover" all mails and their status. The only "bad" thing that might happen is that MUAs re-load all mail, right?
7) How gracefully does dovecot handle the situation when mail is added to or removed from cur/ without going through dovecot's IMAP,... especially with respect to index files? In the above solution I'd do a lot of adding/moving/deleting files in maildirs, which would be done purely at the (file) system level, so the index would quite often get invalidated. How does it recognise that (if at all)? And is this generally stable, or should I rather remove any indexes after syncing and have them rebuilt?
8) All the above doesn't yet handle the thing of having yearly/monthly/etc. archive subfolders as described above, like:
   linux-kernel
   +--2014
   \--2013
   possibly on *both* "online" and "offline" server and possibly even in a different layout. I'm not yet sure whether this would be easily doable with the above solution,... my basic idea would be to simply consider any such "archive" subfolders as the same as the "base" folder during syncing... and afterwards re-order the mails in the "offline" and "online" server's folders as necessary. For example, if the "online" server has:
   linux-kernel
   +--01
   +--02
   +--03
   ...
   \--12
   in other words, one archive folder for each of the last 12 months, while the "offline" server has:
   linux-kernel
   +--2014
   \--2013
   The script would then e.g. move the files to their new folders.
   => Does anyone know whether such a thing exists already, i.e. a tool that sorts the mails into maildir subfolders, based on years, months, etc.?
9) As said before, I'd like to be able to lose any indexes, but AFAICS, my solution wouldn't work with sdbox anyway. But if there's an sdbox based solution that makes everything much easier, do not hesitate to tell me :)
10) In the long term I hope to be able to do something that makes server-side search (i.e. through the big fat mail archive) much faster (or is that already supported in dovecot?). I haven't really looked into that topic at all (I just know about notmuch[0]). Anyway, if anyone sees problems in my above solution when it is used with something that gives me better server-side search (i.e. some indexing solution),... I'd be happy to hear about that as well.
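To make the filename-as-ID idea and the year-based archive folders a bit more concrete, here is a small sketch. It assumes the filename layout from the example above (unique part, then ",S=" style fields, then ":2," and flag letters) and that the leading number of the unique part is the delivery time as a Unix timestamp, which is the usual maildir convention but not guaranteed for every delivery agent. The "base/year" result is only a logical folder name; how that maps to directories depends on the configured maildir layout.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class MaildirName(NamedTuple):
    unique: str             # part hoped to stay stable for the life of the mail
    fields: dict[str, str]  # extras such as S= and W= from the example filename
    flags: str              # flag letters after ":2,"


def parse_maildir_name(filename: str) -> MaildirName:
    """Split a maildir filename into unique part, extra fields and flags."""
    base, _, info = filename.partition(":")
    unique, *extras = base.split(",")
    fields = dict(e.split("=", 1) for e in extras if "=" in e)
    flags = info[2:] if info.startswith("2,") else ""
    return MaildirName(unique, fields, flags)


def year_subfolder(base_folder: str, filename: str) -> str:
    """Name an archive subfolder from the leading Unix timestamp of the filename."""
    timestamp = int(parse_maildir_name(filename).unique.split(".", 1)[0])
    year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
    return f"{base_folder}/{year}"
```

Note that this sorts by delivery time, not by the Date: header of the mail; sorting by header would require reading the mail file itself.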
Thanks a lot so far, Chris.
participants (1)
-
Christoph Anton Mitterer