[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem
Ed W
lists at wildgooses.com
Wed Aug 12 20:42:02 EEST 2009
>>> CouchDB seems like it would still be more difficult than necessary to
>>> scale. I'd really just want something that distributes the load and
>>> disk usage evenly across all servers and allows easily plugging in
>>> more servers and it automatically rebalances the load. CouchDB seems
>>> like much of that would have to be done manually (or building scripts
>>> to do it).
>>>
>> Ahh fair enough - I thought it being massively multi-master would allow
>> simply querying different machines for different users. Not a perfect
>> scale-out, but good enough for a whole class of requirements...
>>
>
> If each user's mails are all stuck on a particular cluster of servers,
> it's possible that several users on those servers suddenly start
> increasing their disk load or disk usage and start killing the
> performance / available space for others. If a user's mails were spread
> across 100 servers, this would be much less likely.
>
Sure - I'm not a couchdb expert, but I think the point is that we would
need to check the replication options because you would simply balance
the requests across all the servers holding those users' data. I'm kind
of assuming that data would be replicated across more than one server
and there would be some way of choosing which server to use for a given user.
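Just to make that concrete, the sort of per-user balancing I'm imagining
looks roughly like this (purely an illustrative Python sketch - the
hashing choice and server names are mine, nothing CouchDB-specific):

    import hashlib

    def pick_server(user, replica_servers):
        # Deterministically map a user onto one of the servers known to
        # hold a replica of their data, so the same user keeps hitting the
        # same place while different users spread across the replicas.
        digest = hashlib.md5(user.encode("utf-8")).hexdigest()
        return replica_servers[int(digest, 16) % len(replica_servers)]

    # e.g. pick_server("someuser@example.com", ["couch1", "couch2", "couch3"])

The part I'd want to check is whether the replication keeps those copies
fresh enough for that to be safe.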
I only know couchdb to the extent of having glanced at the website some
time back, but I liked the way it looks and thinks like Lotus Notes (I
did love building things with that tool about 15 years ago - the
replication was just years ahead of its time. The robustness was
extraordinary: I remember that when the IRA blew up a chunk of Manchester
(including one of our servers), everyone just went home, started using
the Edinburgh or London office servers and carried on as though nothing
had happened...)
Actually its materialised views are rather clever also...
>
>>> Hmm. I don't really see how it looks like log structured storage.. But
>>> you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
>>>
>> Well, the access is largely append-only, with some deletes and noise at
>> the writing end, but the older storage mostly stays static, with much
>> longer gaps between deletes (and extremely infrequent edits)
>>
>
> Ah, right. I guess if you think about it from a "single user's mails"
> point of view.
>
Well, a single folder really.
>> So maildir is really optimised for deletes, but improves random access
>> for a subset of operations. Mbox is optimised for writes and seems
>> generally fast for most operations except deletes (people do worry
>> about having a lot of eggs in one basket, but I think this is really a
>> symptom of other problems at work). Mbox also has improved packing for
>> small messages and probably improved cache locality on certain read
>> patterns
>>
>
> Yes, this is why I'm also using mbox on dovecot.org for mailing list
> archives.
>
Actually I use maildir, but apart from delete performance - and deletes
are usually rare - mbox seems better for nearly all use patterns.
Seems like if it were possible to "solve" delete performance then mbox
becomes the preferred choice for many requirements (also, let's solve
the backup problem where the whole file changes every day).
>> So one obvious hybrid would be an mbox-type structure which perhaps
>> splits messages up into variable-sized sub-mailboxes based on various
>> criteria, perhaps including message age, type of message or message
>> size...? The rapid write/delete activity would happen at the head,
>> perhaps even as a maildir layout, and gradually the storage would
>> become larger and ever more compressed mailboxes as the age/frequency
>> of access/etc. declines.
>>
>> Perhaps this is exactly dbox?
>>
>
> Something like that. In dbox you have one storage directory containing
> all mailboxes' mails (so that copying can be done by simple index
> updates). Then you have a bunch of files, each about n MB (configurable,
> 2 MB by default). Expunging initially only marks the message as expunged
> in the index. Then later (or immediately, configurable) you run a cronjob
> that goes through all dboxes and actually reclaims the used space by
> recreating those dbox files.
>
Yeah, sounds good.
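Just to check I've understood the compaction step, this is roughly how I
picture it (a rough Python sketch; the data structures are invented by
me, not the real dbox code):

    def expunge(index, uid):
        # Expunging only flags the message in the index; the dbox file on
        # disk is left untouched at this point.
        index[uid]["expunged"] = True

    def compact_dbox_file(messages, index):
        # The later cronjob pass: rewrite the dbox file keeping only the
        # messages that are still wanted, which is what actually reclaims
        # the disk space.
        return [m for m in messages if not index[m["uid"]]["expunged"]]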
You might consider some kind of "head optimisation", where we can
already assume that the latest chunk of mails will be noisy, with a
mixture of deletes/appends, etc. Typically mail arrives, gets responded
to and gets deleted quickly, but I would *guess* that if a mail survives
for XX hours in a mailbox then it's likely to stay there for quite a long
time, until some kind of purge event happens (the user goes on a purge,
an archive task runs, etc.).
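As a sketch of the sort of policy I mean (the threshold is plucked out of
the air, it's just to make the guess concrete):

    import time

    AGE_THRESHOLD = 48 * 3600  # seconds; an arbitrary stand-in for "XX hours"

    def split_head_and_tail(messages, now=None):
        # Keep recent, churny mail in a separate "head" area and only
        # migrate messages into the packed/compressed storage once they
        # have survived past the age threshold.
        now = time.time() if now is None else now
        head = [m for m in messages if now - m["received"] < AGE_THRESHOLD]
        tail = [m for m in messages if now - m["received"] >= AGE_THRESHOLD]
        return head, tail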
Sounds good anyway
Oh, have you considered some "optional" API calls in the storage API?
The logic would be to assume that someone wanted to do something clever
and split the message up in some way, e.g. store headers separately from
bodies, or bodies carved up into MIME parts. The motivation would be a
certain access pattern to optimise: e.g. for an SQL database it may well
be sensible to split the headers from the message body in order to
optimise searching - the current API may not be able to take advantage
of that?
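For example, something vaguely along these lines - none of these names
exist in the real Dovecot storage API, it's purely to make the idea
concrete:

    class SqlBackend:
        # Optional capability: this backend would rather receive headers
        # and MIME parts separately so it can index and search them
        # independently.
        supports_split_message = True

        def save_split(self, headers, body_parts):
            pass  # e.g. headers into one table, MIME parts into another

        def save_raw(self, raw_message):
            pass  # fallback used by backends without the optional call

    def save_message(backend, headers, body_parts, raw_message):
        # The core hands over the split form only if the backend opts in;
        # otherwise it falls back to the existing whole-message path.
        if getattr(backend, "supports_split_message", False):
            backend.save_split(headers, body_parts)
        else:
            backend.save_raw(raw_message)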
Ed W