CouchDB seems like it would still be more difficult than necessary to scale. I'd really just want something that distributes the load and disk usage evenly across all servers, allows easily plugging in more servers, and automatically rebalances the load. With CouchDB it seems much of that would have to be done manually (or by building scripts to do it).
Ahh fair enough - I thought it being massively multi-master would allow simply querying different machines for different users. Not a perfect scale-out, but good enough for a whole class of requirements...
If all of a user's mails are stuck on a particular cluster of servers, it's possible that several users on those servers suddenly start increasing their disk load or disk usage and start killing the performance / available space for others. If a user's mails were spread across 100 servers, this would be much less likely.
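A minimal sketch of that spreading idea, assuming a hypothetical hash-based placement scheme (nothing CouchDB-specific — the function name and chunking are invented for illustration):

```python
import hashlib

def server_for(user: str, chunk: int, servers: list[str]) -> str:
    """Place one chunk of a user's mail by hashing (user, chunk),
    so no single server holds all of one user's data."""
    digest = hashlib.sha256(f"{user}:{chunk}".encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = [f"mail{i:02d}" for i in range(100)]
# Successive chunks of one user's mail land on many different servers,
# so one heavy user can't fill up (or slow down) a single box.
placements = {server_for("alice", c, servers) for c in range(50)}
```

The same idea with consistent hashing would also keep most placements stable when servers are added or removed.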
Sure - I'm not a couchdb expert, but I think the point is that we would need to check the replication options, because you would simply balance the requests across all the servers holding those users' data. I'm kind of assuming that data would be replicated across more than one server and that there would be some way of choosing which server to use for a given user.
I only know couchdb to the extent of having glanced at the website some time back, but I liked the way it looks and thinks like Lotus Notes (I did love building things with that tool about 15 years ago - the replication was just years ahead of its time. The robustness was extraordinary, and I remember when the IRA blew up a chunk of Manchester (including one of our servers), everyone just went home, started using the Edinburgh or London office servers, and carried on as though nothing had happened...)
Actually its materialised views are rather clever also...
Hmm. I don't really see how it looks like log structured storage.. But you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
Well, the access is largely append-only, with some deletes and noise at the writing end, but the older storage largely stays static, with much longer gaps between deletes (and extremely infrequent edits).
Ah, right. I guess if you think about it from a "single user's mails" point of view.
Well, single folder really
So maildir is really optimised for deletes, but improves random access for a subset of operations. mbox is optimised for writes and seems generally fast for most operations except deletes (people do worry about having a lot of eggs in one basket, but I think this is really a symptom of other problems at work). mbox also packs small messages better and probably has improved cache locality on certain read patterns.
Yes, this is why I'm also using mbox on dovecot.org for mailing list archives.
Actually I use maildir, but apart from delete performance, which matters rarely, mbox seems better for nearly all use patterns.
Seems like if it were possible to "solve" delete performance, then mbox becomes the preferred choice for many requirements (also, let's solve the backup problem where the whole file changes every day).
So one obvious hybrid would be an mbox-type structure which splits messages up into variable-sized sub-mailboxes based on various criteria - perhaps message age, type of message, or message size? The rapid write/delete activity would happen at the head, perhaps even in a maildir layout, and the storage would gradually become larger and ever more compressed mailboxes as the age grows and frequency of access declines.
Perhaps this is exactly dbox?
Something like that. In dbox you have one storage directory containing all mailboxes' mails (so that copying can be done by simple index updates). Then you have a bunch of files, each about n MB in size (configurable; 2 MB by default). Expunging initially only marks the message as expunged in the index. Then later (or immediately, configurable) you run a cronjob that goes through all dboxes and actually reclaims the used space by recreating those dbox files.
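The expunge-then-purge cycle described there could be sketched like this (the one-message-per-line "uid:body" layout and the file names are invented toys for illustration, not Dovecot's actual dbox format):

```python
import os
import tempfile

def expunge(index: dict[int, bool], uid: int) -> None:
    index[uid] = True          # cheap: just flag the uid in the index

def purge(storage_dir: str, index: dict[int, bool]) -> None:
    """Recreate each storage file without the flagged messages,
    reclaiming their space; os.replace makes the swap atomic."""
    for fname in os.listdir(storage_dir):
        path = os.path.join(storage_dir, fname)
        with open(path) as f:
            keep = [ln for ln in f if not index.get(int(ln.split(":", 1)[0]))]
        with open(path + ".tmp", "w") as out:
            out.writelines(keep)
        os.replace(path + ".tmp", path)

# Demo: expunge message 1, then reclaim its space on the next purge run.
d = tempfile.mkdtemp()
with open(os.path.join(d, "m.1"), "w") as f:
    f.write("1:hello\n2:world\n")
index: dict[int, bool] = {}
expunge(index, 1)
purge(d, index)
```

The point of the split is that the expensive rewrite is batched and deferred, while the user-visible expunge stays an O(1) index update.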
Yeah, sounds good.
You might consider some kind of "head optimisation", where we can already assume that the latest chunk of mails will be noisy and have a mixture of deletes/appends, etc. Typically mail arrives, gets responded to, and gets deleted quickly, but I would *guess* that if a mail survives for XX hours in a mailbox, then it's likely going to stay there for quite a long time, until some kind of purge event happens (user goes on a purge, archive task, etc.)
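That survival heuristic could be expressed as a simple tiering policy — a sketch only, with the threshold value and tier names invented here:

```python
# Hypothetical policy: young mail lives in a noisy maildir-style head;
# anything that survives SURVIVAL_HOURS migrates to packed storage
# that is rarely rewritten.
SURVIVAL_HOURS = 48.0

def tier_for(age_hours: float) -> str:
    """Which storage tier a message of this age belongs in."""
    return "head-maildir" if age_hours < SURVIVAL_HOURS else "packed-mbox"

def sweep(mail_ages: dict[str, float]) -> list[str]:
    """Uids old enough to migrate out of the head on the next cron run."""
    return [uid for uid, age in mail_ages.items() if age >= SURVIVAL_HOURS]
```

A real implementation would presumably also consider size and access frequency, as suggested earlier in the thread, rather than age alone.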
Sounds good anyway
Oh, have you considered some "optional" API calls in the storage API?
The logic might be to assume that someone wanted to do something clever and split the message up in some way, e.g. store headers separately from bodies, or bodies carved up into MIME parts. The motivation would be if there was a certain access pattern to optimise. E.g. for an SQL database it may well be sensible to split headers and the message body in order to optimise searching - the current API may not take advantage of that?
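One shape such an optional call could take — all names here are hypothetical; the point is just that callers probe for the capability and fall back gracefully when a back end doesn't offer it:

```python
class SqlBackend:
    """Hypothetical back end that implements the optional split call,
    keeping headers in their own (cheaply searchable) table."""
    def __init__(self) -> None:
        self.headers: dict[int, bytes] = {}
        self.bodies: dict[int, bytes] = {}
    def save_split(self, uid: int, headers: bytes, body: bytes) -> None:
        self.headers[uid] = headers
        self.bodies[uid] = body

class PlainBackend:
    """Back end offering only the mandatory whole-message call."""
    def __init__(self) -> None:
        self.raw: dict[int, bytes] = {}
    def save(self, uid: int, raw: bytes) -> None:
        self.raw[uid] = raw

def store_message(backend, uid: int, raw: bytes) -> None:
    # Probe for the optional API; use the plain path otherwise.
    split = getattr(backend, "save_split", None)
    if split is not None:
        headers, _, body = raw.partition(b"\r\n\r\n")
        split(uid, headers, body)
    else:
        backend.save(uid, raw)

sql, plain = SqlBackend(), PlainBackend()
store_message(sql, 1, b"Subject: hi\r\n\r\nbody")
store_message(plain, 1, b"Subject: hi\r\n\r\nbody")
```

Back ends that can't exploit the split simply never see it, so the core API stays unchanged.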
Ed W