[Dovecot] (Single instance) attachment storage with single-dbox
Just a note to myself and whoever else cares, should be added to wiki once it has its own page about this:
With single-dbox messages can be copied with hard linking. This means that there can be multiple files that point to the same attachment file. The attachment is now deleted only once the mail file's link count drops to zero, so this works fine..
..until someone goes and starts manually copying files or maybe restoring from backups or whatever, causing the hard links to be replaced with separate files having link count=1, even when there are other files pointing to the same attachment file. Now deleting this one mail will delete the attachment too early.
I can't think of any other reasonable way to handle this though, so unless someone has some great ideas, I think the solution is to simply add enough warnings that message store shouldn't be accessed directly. Maybe add some import/export commands to doveadm which can be used to add a bunch of mails to storage without doing it directly on filesystem.
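A minimal sketch of the link-count problem described above, assuming a POSIX filesystem that supports hard links. The file names (`u.1`, `u.2`, `u.2.restored`) and the function name are made up for illustration and are not Dovecot's actual layout or code.

```python
import os
import shutil
import tempfile

def demo_link_counts():
    """Return (link count after hard-link copy, link count of a restored copy)."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "u.1")
        with open(original, "wb") as f:
            f.write(b"message body\n")

        # Copying a message inside the store: a second name for the same inode.
        linked = os.path.join(root, "u.2")
        os.link(original, linked)
        count_after_link = os.stat(original).st_nlink

        # A manual restore (cp-style copy) creates a brand new inode instead.
        restored = os.path.join(root, "u.2.restored")
        shutil.copy2(linked, restored)
        count_of_restored_copy = os.stat(restored).st_nlink

        return count_after_link, count_of_restored_copy

if __name__ == "__main__":
    print(demo_link_counts())
```

The restored file reports a link count of 1 even though other mail files still refer to the same attachment, which is exactly the state in which deleting that one mail would remove the attachment too early.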
On 2010-09-24 2:57 PM, Timo Sirainen wrote:
> Maybe add some import/export commands to doveadm which can be used to add a bunch of mails to storage without doing it directly on filesystem.
+1
I've always just used cp -rp to restore mail, so yeah, if that would break SiS, I agree there should be a doveadm command for restoring mails.
--
Best regards,
Charles
On 9/24/2010 11:57 AM, Timo Sirainen wrote:
> Just a note to myself and whoever else cares, should be added to wiki once it has its own page about this:
> With single-dbox messages can be copied with hard linking. This means that there can be multiple files that point to the same attachment file. The attachment is now deleted only once the mail file's link count drops to zero, so this works fine..
Why is this specific to sdbox? Does mdbox SIS work differently?
Daniel
On 25.9.2010, at 9.49, Daniel L. Miller wrote:
> On 9/24/2010 11:57 AM, Timo Sirainen wrote:
>> Just a note to myself and whoever else cares, should be added to wiki once it has its own page about this:
>> With single-dbox messages can be copied with hard linking. This means that there can be multiple files that point to the same attachment file. The attachment is now deleted only once the mail file's link count drops to zero, so this works fine..
> Why is this specific to sdbox? Does mdbox SIS work differently?
mdbox internally keeps a reference count of how many copies of each message exist, and the attachment isn't deleted until the last reference is gone. So in mdbox, even without attachments, there is a somewhat similar problem: if you manually add/copy mailbox directories, the reference counts get messed up.
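A toy model of that reference counting, to show why a copy made behind the store's back is dangerous. This is only a sketch: the class and method names are hypothetical and say nothing about mdbox's real on-disk format.

```python
class RefCountedStore:
    """Toy reference-counted storage (not Dovecot's real data structures)."""

    def __init__(self):
        self.refcounts = {}

    def save(self, key):
        # First reference to newly saved data.
        self.refcounts[key] = 1

    def copy(self, key):
        # A copy made through the store bumps the count.
        self.refcounts[key] += 1

    def expunge(self, key):
        """Drop one reference; return True if the data itself was deleted."""
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            del self.refcounts[key]
            return True
        return False

def demo():
    store = RefCountedStore()
    store.save("att")
    store.copy("att")             # proper copy: the count is now 2
    first = store.expunge("att")  # one reference remains, data is kept
    # A directory copied by hand would add a reader the store never counted,
    # so this expunge deletes data that the uncounted copy still needs.
    second = store.expunge("att")
    return first, second
```

The counts are only correct as long as every copy goes through `copy()`; anything added directly on the filesystem is invisible to them.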
On 24/09/2010 19:57, Timo Sirainen wrote:
> Just a note to myself and whoever else cares, should be added to wiki once it has its own page about this:
I have been meaning to say we should have a wiki page about this.
> With single-dbox messages can be copied with hard linking. This means that there can be multiple files that point to the same attachment file. The attachment is now deleted only once the mail file's link count drops to zero, so this works fine..
> ..until someone goes and starts manually copying files or maybe restoring from backups or whatever, causing the hard links to be replaced with separate files having link count=1, even when there are other files pointing to the same attachment file. Now deleting this one mail will delete the attachment too early.
> I can't think of any other reasonable way to handle this though, so unless someone has some great ideas, I think the solution is to simply add enough warnings that message store shouldn't be accessed directly. Maybe add some import/export commands to doveadm which can be used to add a bunch of mails to storage without doing it directly on filesystem.
Conceptually there is an attachment table with a primary key, and a message table which has foreign keys referring into the attachment table. In database theory, you could set up a foreign key constraint, and then it would not be possible to remove attachments which were still referenced by messages.
I'm not saying all this data should be under *SQL; just thinking aloud.
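To make the database analogy concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are invented for the example; nothing here reflects how Dovecot stores mail.

```python
import sqlite3

def try_delete_referenced_attachment():
    """Return (delete_was_blocked, attachments_left_after_cleanup)."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE attachment (id INTEGER PRIMARY KEY, hash TEXT)")
    conn.execute(
        "CREATE TABLE message ("
        " uid INTEGER PRIMARY KEY,"
        " attachment_id INTEGER REFERENCES attachment(id))"
    )
    conn.execute("INSERT INTO attachment (id, hash) VALUES (1, 'abc')")
    conn.execute("INSERT INTO message (uid, attachment_id) VALUES (10, 1)")
    conn.execute("INSERT INTO message (uid, attachment_id) VALUES (11, 1)")

    # Two messages still reference the attachment, so the delete is refused.
    try:
        conn.execute("DELETE FROM attachment WHERE id = 1")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True

    # Once no message refers to it, the attachment can be removed.
    conn.execute("DELETE FROM message")
    conn.execute("DELETE FROM attachment WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM attachment").fetchone()[0]
    conn.close()
    return blocked, remaining
```

The constraint moves the "is anyone still using this?" check into the storage layer itself, which is the property being asked for from the filesystem.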
Can we do something /like/ that but only using a filesystem?
Perhaps each message could have its own hard links to the attachment file? That way a message's attachments would be kept in existence by the filesystem itself.
When you store a message with attachments, you could store the message file:
u.123
and have its attachments stored as hard links under names which appear adjacent to the message file's name when the directory listing is sorted.
u.123.a.1 u.123.a.2
Because the attachment files are adjacently-named to the message file, people accessing the sdbox mailstore as a filesystem should realise that they must backup/restore the message file and its associated attachment files.
Bill
On 25.9.2010, at 11.50, William Blunn wrote:
>> I can't think of any other reasonable way to handle this though, so unless someone has some great ideas, I think the solution is to simply add enough warnings that message store shouldn't be accessed directly. Maybe add some import/export commands to doveadm which can be used to add a bunch of mails to storage without doing it directly on filesystem.
> Conceptually there is an attachment table with a primary key, and a message table which has foreign keys referring into the attachment table. In database theory, you could set up a foreign key constraint, and then it would not be possible to remove attachments which were still referenced by messages.
> I'm not saying all this data should be under *SQL; just thinking aloud.
> Can we do something /like/ that but only using a filesystem?
> Perhaps each message could have its own hard links to the attachment file? That way a message's attachments would be kept in existence by the filesystem itself.
That's basically how it works for most messages. The attachment ID is stored in the dbox metadata and for each message it's a unique ID which is a hard link to the original message. The problem is that when copying a message with hard linking you can't give a new attachment ID, because the message contents can't be updated.
But .. I suppose it could be possible to use a combination of attachment-id + mailbox GUID + message UID number. I'll have to think about it..
> When you store a message with attachments, you could store the message file:
> u.123
> and have its attachments stored as hard links under names which appear adjacent to the message file's name when the directory listing is sorted.
> u.123.a.1 u.123.a.2
> Because the attachment files are adjacently-named to the message file, people accessing the sdbox mailstore as a filesystem should realise that they must backup/restore the message file and its associated attachment files.
The attachments aren't stored in the same directory as the message files. That would make it work badly when multiple mount points exist.
On Sat, 2010-09-25 at 13:34 +0100, Timo Sirainen wrote:
> But .. I suppose it could be possible to use a combination of attachment-id + mailbox GUID + message UID number. I'll have to think about it..
Implemented, although saving messages now requires an extra rename(). It should be possible to get rid of this at some point, but for now it works like this:
1. Attachment saving is first started to a temp file.
2. Attachment is written to temp file.
3. Attachment writing is finished and file is renamed to <hash>-<guid>
4. <possibly more attachments and messages are saved>
5. Index is locked, so new messages can be assigned IMAP UIDs
6. Attachments are renamed from <hash>-<guid> to <hash>-<guid>-<mailbox_guid>-<imap_uid>
7. Index is unlocked
(Getting rid of step 3 would require some FS API changes.)
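The save sequence above can be sketched roughly as follows. This is an illustration only, not Dovecot code: the function names, the `temp.` prefix, the in-process lock standing in for the index lock, and the simplification of one attachment per message are all assumptions.

```python
import os
import tempfile
import threading

index_lock = threading.Lock()  # hypothetical stand-in for the mailbox index lock

def save_attachment(attach_dir, data, att_hash, guid):
    """Write to a temp file, then rename it to <hash>-<guid>."""
    fd, tmp_path = tempfile.mkstemp(dir=attach_dir, prefix="temp.")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    staged = os.path.join(attach_dir, f"{att_hash}-{guid}")
    os.rename(tmp_path, staged)  # the extra rename mentioned above
    return staged

def commit_attachments(staged_paths, mailbox_guid, next_uid):
    """With the index locked, assign UIDs and rename to the final names."""
    final_paths = []
    with index_lock:
        # Assumes one attachment per message, so each path gets the next UID.
        for uid, staged in enumerate(staged_paths, start=next_uid):
            final = f"{staged}-{mailbox_guid}-{uid}"
            os.rename(staged, final)
            final_paths.append(final)
    return final_paths
```

The second rename has to wait for the lock because the IMAP UID, which is part of the final name, is not known until then.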
Copying works basically the same:
- Source file's attachments are linked to a temp file in destination.
- <possibly more attachments and messages are copied>
- Index is locked, so new messages can be assigned IMAP UIDs
- Attachments are renamed from temp files to <hash>-<guid>-<mailbox_guid>-<imap_uid>
- Index is unlocked
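A rough sketch of the copy path, again with hypothetical names and with the index locking left out for brevity. It assumes the source and destination are on the same filesystem, since hard links cannot cross mount points.

```python
import os
import tempfile
import uuid

def copy_attachment(src_path, dest_dir, base_name, mailbox_guid, imap_uid):
    """Hard-link src_path into dest_dir under a temp name, then rename it."""
    tmp = os.path.join(dest_dir, "temp." + uuid.uuid4().hex)
    os.link(src_path, tmp)  # no data is copied; both names share one inode
    # In the flow described above, the index lock is held while the UID is assigned.
    final = os.path.join(dest_dir, f"{base_name}-{mailbox_guid}-{imap_uid}")
    os.rename(tmp, final)
    return final

def demo():
    """Copy one attachment and report its new name and link state."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "abc-g1-mboxA-7")
        with open(src, "wb") as f:
            f.write(b"attachment bytes")
        dst = copy_attachment(src, root, "abc-g1", "mboxB", 3)
        return os.path.basename(dst), os.path.samefile(src, dst), os.stat(src).st_nlink
```

Each copy ends up with its own name for the shared attachment data, so the filesystem's link count tracks how many messages still need it.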
The attachment paths in dbox metadata contain only the <hash>-<guid> part, so when accessing any attachments the <mailbox_guid>-<imap_uid> must be manually appended to the path.
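In other words, resolving an attachment looks something like the helper below. The function name and the example values are hypothetical; only the `<hash>-<guid>-<mailbox_guid>-<imap_uid>` layout comes from the description above.

```python
def attachment_path(metadata_path, mailbox_guid, imap_uid):
    """Build the on-disk name from the <hash>-<guid> path stored in metadata."""
    return f"{metadata_path}-{mailbox_guid}-{imap_uid}"

if __name__ == "__main__":
    # Made-up values, purely to show the shape of the result.
    print(attachment_path("ab/cd/abc-g1", "mboxB", 3))
```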
On Tue, 2010-10-19 at 17:28 +0100, Timo Sirainen wrote:
>> But .. I suppose it could be possible to use a combination of attachment-id + mailbox GUID + message UID number. I'll have to think about it..
> Implemented.
Oh, forgot to say: Now attachments won't get lost if you replace hard linked message files with copies of the files. So it's now safe to e.g. move users' directories across mount points or backup/restore them without going through Dovecot tools.
participants (4)
- Charles Marcus
- Daniel L. Miller
- Timo Sirainen
- William Blunn