[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

Timo Sirainen tss at iki.fi
Wed Aug 12 20:51:41 EEST 2009


On Wed, 2009-08-12 at 18:42 +0100, Ed W wrote:
> > Something like that. In dbox you have one storage directory containing
> > all mailboxes' mails (so that copying can be done by simple index
> > updates). Then you have a bunch of files, each about n MB (configurable,
> > 2 MB by default). Expunging initially only marks the message as expunged
> > in the index. Then later (or immediately, configurable) you run a cronjob
> > that goes through all dboxes and actually frees the used space by
> > recreating those dbox files.
> >
> 
> Yeah, sounds good.
> 
> You might consider some kind of "head optimisation", where we can
> already assume that the latest chunk of mails will be noisy and have a
> mixture of deletes/appends, etc.  Typically mail arrives, gets
> responded to, gets deleted quickly, but I would *guess* that if a mail
> survives for XX hours in a mailbox then likely it's going to continue
> to stay there for quite a long time until some kind of purge event
> happens (user goes on a purge, archive task, etc)

If disk space usage isn't a big concern, I think the nightly purges
solve this issue too. During the day a user may receive mails and
delete them, and at night the deleted mails are purged. It might help
a bit if new mails were all stored in separate file(s) and then
appended to some larger existing file at night, but that optimization
can be left until later. :)
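
To make the expunge-then-purge idea concrete, here is a minimal toy
sketch in Python. It is not Dovecot code and not the real dbox format:
the names ToyStore, IndexEntry, append, expunge and purge are made up
for illustration, files are plain in-memory buffers, and the only
figure taken from the discussion is the 2 MB default file size.

```python
from dataclasses import dataclass

# Assumption: 2 MB per storage file, the default quoted in the thread.
DEFAULT_MAX_FILE_SIZE = 2 * 1024 * 1024


@dataclass
class IndexEntry:
    """Hypothetical index record: where a message lives and its state."""
    file_id: int
    offset: int
    size: int
    expunged: bool = False


class ToyStore:
    """In-memory stand-in for a directory of fixed-size storage files."""

    def __init__(self, max_file_size=DEFAULT_MAX_FILE_SIZE):
        self.max_file_size = max_file_size
        self.files = {}   # file_id -> bytearray
        self.index = {}   # msg_id -> IndexEntry
        self._next_file_id = 0
        self._next_msg_id = 1

    def append(self, data):
        # Write to the newest file, or start a new one if it would overflow.
        fid = max(self.files, default=None)
        if fid is None or len(self.files[fid]) + len(data) > self.max_file_size:
            fid = self._next_file_id
            self._next_file_id += 1
            self.files[fid] = bytearray()
        buf = self.files[fid]
        msg_id = self._next_msg_id
        self._next_msg_id += 1
        self.index[msg_id] = IndexEntry(fid, len(buf), len(data))
        buf.extend(data)
        return msg_id

    def expunge(self, msg_id):
        # Cheap: only the index changes, the file is left untouched.
        self.index[msg_id].expunged = True

    def read(self, msg_id):
        entry = self.index[msg_id]
        if entry.expunged:
            raise KeyError(msg_id)
        return bytes(self.files[entry.file_id][entry.offset:entry.offset + entry.size])

    def used_bytes(self):
        return sum(len(buf) for buf in self.files.values())

    def purge(self):
        """Recreate every file that holds expunged mails; return bytes freed."""
        dirty = {e.file_id for e in self.index.values() if e.expunged}
        reclaimed = 0
        for fid in dirty:
            old, new = self.files[fid], bytearray()
            entries = sorted(
                ((m, e) for m, e in self.index.items() if e.file_id == fid),
                key=lambda pair: pair[1].offset,
            )
            for msg_id, entry in entries:
                if entry.expunged:
                    reclaimed += entry.size
                    del self.index[msg_id]
                else:
                    data = old[entry.offset:entry.offset + entry.size]
                    entry.offset = len(new)   # message moves in the new file
                    new.extend(data)
            if new:
                self.files[fid] = new
            else:
                del self.files[fid]
        return reclaimed
```

In this sketch expunge() is the cheap daytime operation and purge() is
what the nightly cronjob would call. Files without expunged mails are
never rewritten, which is why mails that survive for a long time cost
nothing further.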

> Oh, have you considered some "optional" api calls in the storage API?
> The logic might be to assume that someone wanted to do something
> clever and split the message up in some way, eg store headers
> separately to bodies or bodies carved up into mime parts.  The
> motivation would be if there was a certain access pattern to optimise.
> Eg for an SQL database it may well be sensible to split headers and
> the message body in order to optimise searching - the current API may
> not take advantage of that?  

Well, files have paths. I think the storage backend can determine from
the path what type of data it is. So if you're writing to
mails/foo/bar/123, it means you're storing a message with ID 123 in
mailbox "foo/bar". The backend could then internally parse the message
and store its header, body and MIME parts separately.
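
As a rough illustration of that idea, a backend could look something
like the Python sketch below. Everything here is hypothetical: the
"mails/<mailbox>/<id>" layout is just the example path from above, and
parse_mail_path, split_message and SplittingBackend are invented names,
not part of any existing Dovecot API. It only splits header from body
and leaves MIME parts out.

```python
def parse_mail_path(path):
    """Return (mailbox, msg_id) for 'mails/<mailbox>/<id>', else None."""
    parts = path.split("/")
    last = parts[-1]
    if len(parts) < 3 or parts[0] != "mails":
        return None
    if not (last.isascii() and last.isdigit()):
        return None
    return "/".join(parts[1:-1]), int(last)


def split_message(raw):
    """Split raw message bytes at the first blank line into (header, body)."""
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        return raw, b""          # no blank line: treat it all as header
    pos, sep = min(hits)         # earliest separator wins
    return raw[:pos], raw[pos + len(sep):]


class SplittingBackend:
    """Toy backend: mails are stored split, everything else as opaque blobs."""

    def __init__(self):
        self.headers = {}   # (mailbox, msg_id) -> header bytes
        self.bodies = {}    # (mailbox, msg_id) -> body bytes
        self.blobs = {}     # path -> bytes, for non-mail files

    def write(self, path, data):
        key = parse_mail_path(path)
        if key is None:
            self.blobs[path] = data
            return
        self.headers[key], self.bodies[key] = split_message(data)
```

The caller still only sees a plain "write this file" call; whether the
backend splits the data is its own business, so no optional API calls
are needed for it.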