[Dovecot] Questions about single instance storage

Lorens Kockum dovecot.fdop at tagged.lorens.org
Mon Dec 5 00:07:05 EET 2011


Timo Sirainen wrote:
> And before designing it I'd need to look into how the backup
> softwares usually work.. If anyone has any ideas about this,
> I'd like to hear.

Simple or even moderately efficient backup programs like rsync
just copy all the files, one directory after another. If the
program copies directories A B C in that order, then while B is
being backed up, a file A/x could be created, and a file C/index
could be modified to reflect that A/x exists. C/index would be
backed up, but A/x would not be, so the backup would contain an
index that refers to a file it does not have.

The way around that is with snapshots. Linux supports this
with LVM, FreeBSD with UFS and ZFS file systems, Windows with
VSS, and all self-contained disk arrays I know of implement
snapshots. The application quiesces its disk writes, flushes
buffers to disk, triggers a snapshot, and resumes work as usual.
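
The ordering of those steps can be sketched as follows. This is
only an illustration: the five callables are hypothetical hooks,
not anything Dovecot or a snapshot tool provides. On LVM,
take_snapshot might wrap something like lvcreate --snapshot. The
point is that writes are paused only for the short time the
snapshot takes, and the slow copy reads from the frozen snapshot
afterwards.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(pause_writes, flush_buffers, resume_writes):
    """Pause the application, flush, run the body, and always resume."""
    pause_writes()
    try:
        flush_buffers()
        yield
    finally:
        # Resume even if the snapshot step raised an exception.
        resume_writes()


def snapshot_backup(pause_writes, flush_buffers, take_snapshot,
                    resume_writes, copy_from_snapshot):
    """Hypothetical driver: all five arguments are caller-supplied hooks."""
    with quiesced(pause_writes, flush_buffers, resume_writes):
        snapshot = take_snapshot()
    # Writes have resumed here; the copy works on the frozen snapshot.
    return copy_from_snapshot(snapshot)
```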

> Also backing up the attachment links could be problematic if
> the backup system doesn't support hard links. Each attachment
> always has at least 2 links, so if the backup doesn't realize
> that it at minimum duplicates the space used by attachments.

rsync recognizes hard links with option -H, but at a very
noticeable performance cost when dealing with millions of
files. If the aa/bb/aabccddeeff-etc path is unique across the whole
mailstore, it would be easy to replace the hard link with a
symlink, as you said:

> maybe not storing the attachments directly to backups, but add
> symlinks to them so they can be used to figure out what to
> restore. Or maybe the backing up wouldn't need a special tool,
> but the restoring tool could just read through the dbox files
> to see what attachments are also needed and write a list of
> them somewhere so they can be taken from backups as well.

With the second approach, would you have a separate hierarchy
for multiple-recipient attachments, or would the attachment be
"really" stored in the box of a recipient chosen at random?

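On the hard link point, here is a minimal sketch of the kind of
bookkeeping a backup tool needs in order to notice that two paths
are the same file: it remembers the (device, inode) pair of every
file that has more than one link. The function name is made up
for illustration, and this is not a description of how rsync
implements -H. It does show why the cost grows with the number
of files: the table has one entry per multiply-linked inode.

```python
import os
import stat
from collections import defaultdict


def find_hard_link_groups(root):
    """Return sorted lists of paths under root that share one inode."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            # Only regular files with a link count above 1 can be
            # hard-linked to some other path.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes seen under more than one path inside root.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

A backup that stores each such group once, plus the list of
names, avoids duplicating the space used by attachments.
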
Just some random thoughts: professionally, I use
Zimbra. Messages are stored in Maildir-equivalents. The time
it takes to back up is quite a severe constraint on the backup
technique. For example, compressing the backup files takes
too long, so the zip files are not compressed. Instead, the
individual mails are stored compressed on disk. Each backup
zips up the mails into a few big backup files. An improvement
could be to sort mails into backup zip files so that once a
zip file is made, it stays the same. After all, if a mail is not
deleted a month after it is read, then it will probably stay
in the same state forever, or at least until the user starts a
keep-me-under-quota cleaning-up spree. During this time, backing
up that big zip file can just be a check to see if it is already
OK in the backup, which is much quicker. I have no idea if this
could be applied to Dovecot, but who knows.
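
A rough sketch of that idea, under assumptions of my own: mails
are bucketed by the month of their mtime, each bucket becomes one
uncompressed zip, and a bucket is rewritten only when a
fingerprint of its file names, sizes and mtimes has changed. The
function names and the seen dictionary are hypothetical, and this
is not how Zimbra or Dovecot does it.

```python
import hashlib
import time
import zipfile
from collections import defaultdict
from pathlib import Path


def bucket_by_month(mail_dir):
    """Group the files in mail_dir by the YYYY-MM of their mtime (UTC)."""
    buckets = defaultdict(list)
    for path in sorted(Path(mail_dir).iterdir()):
        if path.is_file():
            month = time.strftime("%Y-%m", time.gmtime(path.stat().st_mtime))
            buckets[month].append(path)
    return buckets


def fingerprint(paths):
    """Cheap change detector: hash of name, size and mtime of each file."""
    digest = hashlib.sha256()
    for path in paths:
        st = path.stat()
        digest.update(f"{path.name}\0{st.st_size}\0{st.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def backup_changed_months(mail_dir, backup_dir, seen):
    """Rewrite only the monthly zips whose fingerprint differs from seen.

    seen maps month -> fingerprint from the previous run and is updated
    in place. Returns the list of months that were rewritten.
    """
    rewritten = []
    for month, paths in sorted(bucket_by_month(mail_dir).items()):
        current = fingerprint(paths)
        if seen.get(month) == current:
            continue  # bucket unchanged, nothing to copy
        target = Path(backup_dir) / f"{month}.zip"
        # ZIP_STORED: no compression, since the mails are already
        # stored compressed on disk.
        with zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as archive:
            for path in paths:
                archive.write(path, arcname=path.name)
        seen[month] = current
        rewritten.append(month)
    return rewritten
```

On a second run with nothing changed, no zip file is touched, so
the backup of an old month reduces to the fingerprint check.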


