Timo Sirainen wrote:
On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
My current thinking is a key/value store, as you've proposed. Something like Hadoop components or Project Voldemort. Voldemort might be a better fit from what I've read.
My understanding of Hadoop is that it's more about distributed computing rather than storage.
I believe it's possible to use it to ask lots of machines each to parse a bit of a database and then collect the answers back from all of them, e.g. some people are said to be using it to parse huge log files in a sensible time by splitting the log files across lots of machines and asking each of them to do a bit of the filtering...
I'm out of my depth at this point - only read the executive summary...
My current thinking is to have the local delivery break messages up into their component pieces (headers, from address, to address, spam scores, body, etc.) as various key:value relationships.
I was planning on basically just storing key=username/message-guid, value=message pairs instead of splitting it up. Or perhaps splitting header and body, but I think splitting it any finer than those just makes performance worse. To get different headers quickly there would still be dovecot.index.cache (which would live in some quick in-memory storage but also be stored in the database).
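To make the shape of that concrete, here is a minimal sketch of the key=username/message-guid layout, using a plain dict as a stand-in for whatever key/value store sits underneath (all names here are illustrative, not Dovecot code):

```python
import uuid

# Stand-in for a real key/value store (Voldemort etc.); a dict here.
store = {}

def deliver(username, raw_message):
    """Store the whole message, unsplit, under 'username/message-guid'."""
    guid = uuid.uuid4().hex
    store[f"{username}/{guid}"] = raw_message
    return guid

guid = deliver("paul", b"From: a@example.com\r\n\r\nhello")
assert store[f"paul/{guid}"].startswith(b"From:")
```

A header/body split would just mean two keys per message (e.g. `.../hdr` and `.../body`) instead of one.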
This can presumably be rephrased as:
- access times are say 10ms
- linear read times are say 60MB/sec
- Therefore don't break a message into chunks smaller than 0.010s * 60MB/s = 600KB (ish), or your seek times dominate over simply doing linear reads and throwing away what you don't need...
- Obviously insert whatever timings you like and re-run the numbers, e.g. if you have some fancy-pants flash drive then insert shorter seek times
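The arithmetic is trivial, but for anyone who wants to re-run it with their own timings, a tiny sketch (the figures are the examples above, not measurements):

```python
def break_even_chunk(seek_time_s, linear_read_bps):
    """Chunk size at which one seek costs the same as the linear
    read it would have avoided; below this, just read past the data."""
    return seek_time_s * linear_read_bps

# Spinning disk: 10 ms access time, 60 MB/s linear reads -> ~600 KB.
print(break_even_chunk(0.010, 60e6))    # 600000.0 bytes

# A flash drive with ~0.1 ms access times moves the threshold way down.
print(break_even_chunk(0.0001, 200e6))  # 20000.0 bytes, i.e. ~20 KB
```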
However, these numbers, plus some very limited knowledge of how a small bunch of email clients seem to behave, would suggest that the following are also worth optimising to varying degrees (please don't rule out someone wanting to implement some backend to try these ideas):
Theory: Attachments larger than K are worth breaking out according to
the formula above
Justification: Above a fairly small attachment size it's actually
cheaper to do a seek than to linearly scan past the attachment to the
next mail message. For some storage designs this might be helpful
(mbox-type packing).
Additionally, some users have suggested that they want to try to
single-instance popular attachments, so "K" might be customisable, or
better yet some design might choose to keep a cache of attachment
fingerprints and de-dup an attachment when a duplicate is next seen...
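A minimal sketch of that fingerprint-cache idea, assuming a content hash (SHA-256 here) as the fingerprint and the ~600KB figure from above as "K"; everything below is illustrative, not any existing backend:

```python
import hashlib

K = 600 * 1024          # threshold from the seek/read argument above
attachment_store = {}   # fingerprint -> bytes, stored once

def store_attachment(data):
    """Return a storage reference; attachments >= K are de-duplicated
    by content hash, smaller ones stay inline with the message."""
    if len(data) < K:
        return ("inline", data)            # too small to be worth a seek
    fp = hashlib.sha256(data).hexdigest()  # content fingerprint
    attachment_store.setdefault(fp, data)  # keep only the first copy
    return ("ref", fp)

a = store_attachment(b"x" * (1024 * 1024))
b = store_attachment(b"x" * (1024 * 1024))
assert a == b and len(attachment_store) == 1  # second copy de-duped
```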
Theory: break out (all) headers from bodies
Justification: scanning headers seems a popular task, and Dovecot keeps a local database to optimise the common case. Re-seeks would be slow though, and some storage designs might be able to optimise and get fast linear scans across all headers (e.g. pack them as per mbox and compress them?)
Theory: break out individual headers
Justification: err... not got a good case for this one, but some of these fancy key/value databases are optimised for creating views on certain headers across certain messages. I imagine this won't fly in practice, but it seems a shame not to try it... Definitely anyone implementing an SQL database option will want to try it, though... (bet it's slow though...)
Theory: pack message bodies together as per mbox
Justification: mbox seems faster, compresses better, and all round seems
better than maildir for access speed, except in certain circumstances
such as deletes. Dovecot already seems to optimise some corner cases by
just marking messages dead without deleting them, so clearly there is
tremendous scope for improvement here (dbox going down this route?).
Some bright spark might design a backend which uses multiple mbox
files to overcome the huge hit when "defragging", and it may well be
that by incorporating e.g. splitting out larger attachments and light
compression, some workloads might see some really good performance!
(Could be really interesting for archive mailboxes, etc.?)
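For what it's worth, a toy illustration of mbox-style packing with an offset index and delete-by-marking; this is purely a sketch of the idea and reflects nothing about Dovecot's actual on-disk formats:

```python
# Toy mbox-style packing: bodies appended to one blob, an index of
# (offset, length) per message, deletes just marked dead until a
# later "defrag" rewrites the blob. All names are illustrative.
class PackedBox:
    def __init__(self):
        self.blob = bytearray()
        self.index = {}          # guid -> (offset, length)
        self.dead = set()

    def append(self, guid, body):
        self.index[guid] = (len(self.blob), len(body))
        self.blob += body        # one linear write, mbox-style

    def delete(self, guid):
        self.dead.add(guid)      # cheap: no rewrite yet

    def defrag(self):
        """Rewrite the blob without dead messages (the expensive part)."""
        new = bytearray()
        for guid, (off, ln) in list(self.index.items()):
            if guid in self.dead:
                del self.index[guid]
                continue
            self.index[guid] = (len(new), ln)
            new += self.blob[off:off + ln]
        self.blob, self.dead = new, set()

box = PackedBox()
box.append("a", b"msg one")
box.append("b", b"msg two")
box.delete("a")          # mark dead, no I/O cost
box.defrag()             # the "huge hit", paid once, later
assert bytes(box.blob) == b"msg two"
```

Using multiple such files would bound the cost of any single defrag, which is roughly the multi-mbox idea above.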
Just my 2p...
Ed