Timo Sirainen wrote:
> And before designing it I'd need to look into how the backup softwares usually work.. If anyone has any ideas about this, I'd like to hear.
Simple or even moderately efficient backup programs like rsync copy all the files. Of course, if the program copies directories A, B, C in that order, then while B is being backed up, a file A/x could be created and a file C/index modified to reflect that A/x exists. C/index would be backed up, but A/x would not be.
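A deterministic sketch of that race (the A/B/C layout and the index file are made up for illustration; the "writer" just runs between two copy steps instead of concurrently):

```python
import os
import shutil
import tempfile

def copy_dir(src, dst):
    """Copy one directory, as a backup walking the tree would."""
    shutil.copytree(src, dst)

# Tiny mail store: directories A, B, C, where C/index records
# which files exist (hypothetical layout, for illustration only).
store = tempfile.mkdtemp()
backup = tempfile.mkdtemp()
for d in "ABC":
    os.mkdir(os.path.join(store, d))
with open(os.path.join(store, "C", "index"), "w") as f:
    f.write("")  # initially no files listed

# The backup copies A first...
copy_dir(os.path.join(store, "A"), os.path.join(backup, "A"))

# ...then, while B is "being copied", a delivery creates A/x and
# updates C/index to mention it.
open(os.path.join(store, "A", "x"), "w").close()
with open(os.path.join(store, "C", "index"), "w") as f:
    f.write("A/x\n")

copy_dir(os.path.join(store, "B"), os.path.join(backup, "B"))
copy_dir(os.path.join(store, "C"), os.path.join(backup, "C"))

# The backup now holds an index naming A/x, but no A/x itself.
print(os.path.exists(os.path.join(backup, "A", "x")))  # False
```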
The way around that is with snapshots. Linux supports this with LVM, FreeBSD with the UFS and ZFS file systems, Windows with VSS, and all self-contained disk arrays I know of implement snapshots. The application quiesces its disk writes, flushes buffers to disk, triggers a snapshot, and resumes work as usual.
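That quiesce/flush/snapshot/resume cycle can be sketched like this; the LVM command in the docstring is only an example, and `lock_writes`/`unlock_writes` stand in for whatever mechanism the application uses to pause deliveries (all assumptions, not real Dovecot APIs):

```python
import os
import subprocess

def snapshot_backup(snapshot_cmd, lock_writes, unlock_writes):
    """Quiesce writers, flush dirty buffers, take a snapshot, resume.

    snapshot_cmd: the command that creates the snapshot, e.g. for LVM
    something like ["lvcreate", "-s", "-L", "1G", "-n", "mailsnap",
    "/dev/vg0/mail"] (illustrative; adjust to the actual volume).
    """
    lock_writes()      # application stops writing
    os.sync()          # flush OS buffers to disk
    try:
        return subprocess.run(snapshot_cmd, check=True).returncode
    finally:
        unlock_writes()  # resume normal operation immediately

# The backup program then copies files out of the snapshot at leisure,
# seeing a single consistent point in time.
```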
Backing up the attachment links could also be problematic if the backup system doesn't support hard links. Each attachment always has at least 2 links, so if the backup doesn't recognize that, it at minimum duplicates the space used by attachments.
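A backup tool can detect that duplication itself by keying files on their (device, inode) pair; a minimal sketch (function and variable names are mine):

```python
import os

def files_to_copy(root):
    """Walk root and return the paths to copy, storing each
    hard-linked file only once; the extra links are recorded so
    they can be recreated on restore."""
    seen = {}            # (st_dev, st_ino) -> first path seen
    copies, links = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                links.append((path, seen[key]))  # re-link on restore
            else:
                seen[key] = path
                copies.append(path)
    return copies, links
```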
rsync recognizes hard links with the -H option, but at a very noticeable performance cost when dealing with millions of files. If the aa/bb/aabccddeeff-etc path is unique across the whole mail store, it would be easy to replace the hard link with a symlink, as you said:
> maybe not storing the attachments directly to backups, but add symlinks to them so they can be used to figure out what to restore. Or maybe the backing up wouldn't need a special tool, but the restoring tool could just read through the dbox files to see what attachments are also needed and write a list of them somewhere so they can be taken from backups as well.
In the second scheme, would you have a separate hierarchy for multiple-recipient attachments, or would the attachment be "really" stored in the mailbox of a recipient chosen at random?
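The restore-tool idea could look roughly like this; the `X-Attachment-Path:` marker is purely hypothetical, standing in for however dbox actually records an external attachment reference:

```python
import os

MARKER = "X-Attachment-Path: "   # hypothetical, not the real dbox field

def attachments_needed(mailstore_root):
    """Read through the message files and collect the attachment
    paths they reference, so those can be restored from backup too."""
    needed = set()
    for dirpath, _dirs, files in os.walk(mailstore_root):
        for name in files:
            with open(os.path.join(dirpath, name), errors="ignore") as f:
                for line in f:
                    if line.startswith(MARKER):
                        needed.add(line[len(MARKER):].strip())
    return sorted(needed)
```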
Just some random thoughts: professionally, I use Zimbra, where messages are stored in Maildir equivalents. The time it takes to back up is a quite severe constraint on the backup technique. For example, compressing the backup files takes too long, so the zip files are not compressed; instead, the individual mails are stored compressed on disk, and each backup zips up the mails into a few big backup files.

An improvement could be to sort mails into backup zip files so that once a zip file is made, it stays the same. After all, if a mail is not deleted within a month after it is read, it will probably stay in the same state forever, or at least until the user starts a keep-me-under-quota cleaning-up spree. During that time, backing up the big zip file can be reduced to checking that it is already OK in the backup, which is much quicker. I have no idea whether this could be applied to Dovecot, but who knows.
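The "zip files that never change again" idea can be sketched as: bucket mails by some stable criterion, digest each bucket's manifest, and on later runs skip any bucket whose digest already matches the previous backup. All names here are made up for illustration, and the zips are left uncompressed (ZIP_STORED), matching the text above:

```python
import hashlib
import os
import zipfile

def manifest_digest(paths):
    """Digest of the sorted file list plus sizes: if it matches the
    previous run, the bucket's zip cannot have changed."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(p.encode())
        h.update(str(os.stat(p).st_size).encode())
    return h.hexdigest()

def backup_bucket(paths, zip_path, old_digest=None):
    """Write one zip per bucket, skipping the work when unchanged."""
    digest = manifest_digest(paths)
    if digest == old_digest and os.path.exists(zip_path):
        return digest, False  # already OK in the backup, just verified
    with zipfile.ZipFile(zip_path, "w") as z:  # ZIP_STORED by default
        for p in sorted(paths):
            z.write(p, arcname=os.path.basename(p))
    return digest, True
```

A second run with the stored digest then degenerates into a cheap check instead of a rewrite.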