Timo Sirainen wrote:
On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
My current thinking is a key/value store, as you've proposed. Something like Hadoop components or Project Voldemort. Voldemort might be a better fit from what I've read.
My understanding of Hadoop is that it's more about distributed computing rather than storage.
I believe it's possible to use it to ask lots of machines each to parse a bit of a database and then collect the answers back from all of them, e.g. some people are said to be using it to parse huge log files in a sensible time by splitting the log files across lots of machines and asking each of them to do a bit of the filtering...
I'm out of my depth at this point - only read the executive summary...
My current thinking is to have the local delivery break messages up into their component pieces (headers, from address, to address, spam scores, body, etc.) as various key:value relationships.
I was planning on basically just storing key=username/message-guid, value=message pairs instead of splitting it up. Or perhaps splitting header and body, but I think splitting it any finer than those just makes performance worse. To get different headers quickly there would still be dovecot.index.cache (which would live in some quick in-memory storage but also be stored in the database).
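To make the shape of that concrete, here is a minimal sketch of the key=username/message-guid layout, using a plain dict as a stand-in for whatever key/value store sits underneath (all names here are illustrative, not Dovecot code):

```python
import uuid

# Stand-in for a real key/value store (Voldemort etc.); a dict here.
store = {}

def deliver(username, raw_message):
    """Store the whole message, unsplit, under 'username/message-guid'."""
    guid = uuid.uuid4().hex
    store[f"{username}/{guid}"] = raw_message
    return guid

guid = deliver("paul", b"From: a@example.com\r\n\r\nhello")
assert store[f"paul/{guid}"].startswith(b"From:")
```

A header/body split would just mean two keys per message (e.g. `.../hdr` and `.../body`) instead of one.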
This can presumably be rephrased as:
- access times are say 10ms
- linear read times are say 60MB/sec
- Therefore don't break a message into chunks smaller than 0.010s * 60MB/s = 600KB (ish), or your seek times dominate over simply doing linear reads and throwing away what you don't need...
- Obviously insert whatever timings you like and re-run the numbers, e.g. if you have some fancy-pants flash drive then insert shorter seek times
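The arithmetic is trivial, but for anyone who wants to re-run it with their own timings, a tiny sketch (the figures are the examples above, not measurements):

```python
def break_even_chunk(seek_time_s, linear_read_bps):
    """Chunk size at which one seek costs the same as the linear
    read it would have avoided; below this, just read past the data."""
    return seek_time_s * linear_read_bps

# Spinning disk: 10 ms access time, 60 MB/s linear reads -> ~600 KB.
print(break_even_chunk(0.010, 60e6))    # 600000.0 bytes

# A flash drive with ~0.1 ms access times moves the threshold way down.
print(break_even_chunk(0.0001, 200e6))  # 20000.0 bytes, i.e. ~20 KB
```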
However, these numbers, plus some very limited knowledge of how a small bunch of email clients seem to behave, would suggest that the following are also worth optimising to varying degrees (please don't rule out someone wanting to implement some backend to try these ideas):
Theory: Attachments larger than K are worth breaking out according to
the formula above
Justification: Above a fairly small attachment size it's actually
cheaper to do a seek than to linearly scan past the attachment to the
next mail message. For some storage designs this might be helpful
(mbox-type packing).
Additionally, some users have suggested that they want to try to
single-instance popular attachments, so "K" might be customisable, or
better yet some design might choose to keep a cache of attachment
fingerprints and de-dup an attachment when a duplicate is next seen...
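A minimal sketch of that fingerprint-cache idea, assuming a content hash (SHA-256 here) as the fingerprint and the ~600KB figure from above as "K"; everything below is illustrative, not any existing backend:

```python
import hashlib

K = 600 * 1024          # threshold from the seek/read argument above
attachment_store = {}   # fingerprint -> bytes, stored once

def store_attachment(data):
    """Return a storage reference; attachments >= K are de-duplicated
    by content hash, smaller ones stay inline with the message."""
    if len(data) < K:
        return ("inline", data)            # too small to be worth a seek
    fp = hashlib.sha256(data).hexdigest()  # content fingerprint
    attachment_store.setdefault(fp, data)  # keep only the first copy
    return ("ref", fp)

a = store_attachment(b"x" * (1024 * 1024))
b = store_attachment(b"x" * (1024 * 1024))
assert a == b and len(attachment_store) == 1  # second copy de-duped
```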
Theory: break out (all) headers from bodies
Justification: scanning headers seems a popular task, and Dovecot keeps a local database to optimise the common case. Re-seeks would be slow though, and some storage designs might be able to optimise and get fast linear scans across all headers (e.g. pack them as per mbox and compress them?)
Theory: break out individual headers
Justification: err... not got a good case for this one, but some of these fancy key/value databases are optimised for creating views on certain headers across certain messages. I imagine this won't fly in practice, but it seems a shame not to try it... Definitely anyone implementing an SQL database option will want to try it, though... (bet it's slow though...)
Theory: pack message bodies together as per mbox
Justification: mbox seems faster, compresses better, and all round seems
better than maildir for access speed, except in certain circumstances
such as deletes. Dovecot already seems to optimise some corner cases by
just marking messages dead without deleting them, so clearly there is
tremendous scope for improvement here (dbox going down this route?).
Some bright spark might design a backend which uses multiple mbox
files to overcome the huge hit when "defragging", and it may well be
that by incorporating e.g. splitting out larger attachments and light
compression, some workloads might see some really good performance!
(Could be really interesting for archive mailboxes, etc.?)
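For what it's worth, a toy illustration of mbox-style packing with an offset index and delete-by-marking; this is purely a sketch of the idea and reflects nothing about Dovecot's actual on-disk formats:

```python
# Toy mbox-style packing: bodies appended to one blob, an index of
# (offset, length) per message, deletes just marked dead until a
# later "defrag" rewrites the blob. All names are illustrative.
class PackedBox:
    def __init__(self):
        self.blob = bytearray()
        self.index = {}          # guid -> (offset, length)
        self.dead = set()

    def append(self, guid, body):
        self.index[guid] = (len(self.blob), len(body))
        self.blob += body        # one linear write, mbox-style

    def delete(self, guid):
        self.dead.add(guid)      # cheap: no rewrite yet

    def defrag(self):
        """Rewrite the blob without dead messages (the expensive part)."""
        new = bytearray()
        for guid, (off, ln) in list(self.index.items()):
            if guid in self.dead:
                del self.index[guid]
                continue
            self.index[guid] = (len(new), ln)
            new += self.blob[off:off + ln]
        self.blob, self.dead = new, set()

box = PackedBox()
box.append("a", b"msg one")
box.append("b", b"msg two")
box.delete("a")          # mark dead, no I/O cost
box.defrag()             # the "huge hit", paid once, later
assert bytes(box.blob) == b"msg two"
```

Using multiple such files would bound the cost of any single defrag, which is roughly the multi-mbox idea above.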
Just my 2p...
Ed