Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

28 Sep 2009


      paulmon wrote:
...
My current thinking is to have local delivery break messages up into
their component pieces (headers, from address, to address, spam scores, body,
etc.), held as various key:value relationships.
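To make the quoted idea concrete, here is a minimal sketch of what "breaking a message up" at delivery time could look like, using only Python's standard `email` package. The key layout (`<msg_id>/header/<name>`, `<msg_id>/part/<n>`) and the function name `explode_message` are hypothetical, chosen purely for illustration; nothing here reflects an existing Dovecot storage format.

```python
from __future__ import annotations

from email import policy
from email.parser import BytesParser


def explode_message(msg_id: str, raw: bytes) -> dict[str, str | bytes]:
    """Split one RFC 5322 message into flat key:value records.

    The key layout "<msg_id>/<kind>/<name>" is hypothetical.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    records: dict[str, str | bytes] = {}

    # One record per header field (From, To, X-Spam-Score, ...), keyed by
    # lower-cased name; repeated fields such as Received are newline-joined.
    for name, value in msg.items():
        key = f"{msg_id}/header/{name.lower()}"
        if key in records:
            records[key] += "\n" + str(value)
        else:
            records[key] = str(value)

    # One record per leaf MIME part, numbered in walk() order. get_content()
    # undoes transfer encodings: str for text parts, bytes for the rest.
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    for i, part in enumerate(leaves):
        records[f"{msg_id}/part/{i}"] = part.get_content()
    return records
```

Whether this pays off depends entirely on how the records are later read back, which is the point of the reply below.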
Whilst this looks appealing on the surface, I think the details are going
to need some benchmarking to see if they stack up. Certainly I hope this
new abstraction works out, because I suspect we will then see a bunch of
interesting ideas get implemented, such as the one you describe!!
Just to knock your theoretical idea around a bit, though: my guess would
be that you need to look at the access patterns for this data to make
sure you don't over-normalise it. E.g. if it's "normal" to simply open
a mailbox and then ask it for each of some set of X fields for
every message, then over-normalising the header fields will lead to
response time being dominated by the access time for each individual field
(especially if each of those accesses costs a disk seek, etc.).
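A back-of-envelope model of that argument, assuming (hypothetically) that every separately stored record costs one random access and ignoring caching, read-ahead and transfer time. The function name and the numbers in the example are made up for illustration only.

```python
def fetch_cost_ms(messages: int, fields: int, seek_ms: float,
                  per_field_records: bool) -> float:
    """Rough latency of "fetch these `fields` for every message".

    Assumes each separately stored record costs one random access of
    `seek_ms`; caching, read-ahead and transfer time are ignored.
    """
    # Over-normalised: one lookup per (message, field) pair.
    # Grouped: all requested fields for a message live in one record.
    lookups = messages * fields if per_field_records else messages
    return lookups * seek_ms


# Hypothetical numbers: 10,000 messages, 6 header fields, 8 ms per seek.
print(fetch_cost_ms(10_000, 6, 8.0, per_field_records=True) / 1000)   # 480.0 (seconds)
print(fetch_cost_ms(10_000, 6, 8.0, per_field_records=False) / 1000)  # 80.0 (seconds)
```

The ratio is simply X, the number of fields requested, which is why the client's access pattern matters more than how elegant the normalised layout looks.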
At present I think Dovecot's architecture roughly assumes that random
access dominates for individual email messages, and it then optimises for
one particular case, header access, by caching headers in a local
"database"-type structure that holds just a certain set of
recently requested header fields. Access times then seem to be
bounded by the time to scan the inbox for new, unseen messages and update
this index with maildir (I'm not sure what bounds mailbox scanning times in
general use?). I.e. it's optimising for either returning field X from
every message in a folder, or returning parts of a single message?
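A toy model of the caching behaviour described above, to show why "field X from every message" becomes cheap after the first pass. This is a hypothetical simplification, not how Dovecot's cache file actually works: here a bounded LRU of field names decides which fields are worth keeping, and `parse_field` stands in for the slow path of opening and parsing the message.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable


class HeaderFieldCache:
    """Toy per-mailbox cache of header fields (hypothetical design).

    Fields that clients asked for recently are cached for every message;
    the least recently requested field is evicted when max_fields is hit.
    """

    def __init__(self, parse_field: Callable[[int, str], str], max_fields: int = 8):
        if max_fields < 1:
            raise ValueError("max_fields must be at least 1")
        self._parse_field = parse_field  # slow path: open + parse the message
        self._max_fields = max_fields
        self._fields: OrderedDict[str, dict[int, str]] = OrderedDict()
        self.misses = 0

    def fetch(self, uid: int, field: str) -> str:
        per_uid = self._fields.get(field)
        if per_uid is None:
            per_uid = {}
            self._fields[field] = per_uid
            if len(self._fields) > self._max_fields:
                self._fields.popitem(last=False)  # evict least recently requested field
        self._fields.move_to_end(field)  # mark this field as recently requested
        if uid not in per_uid:
            self.misses += 1
            per_uid[uid] = self._parse_field(uid, field)
        return per_uid[uid]
```

A client that keeps asking for the same handful of fields pays the parse cost once per message; one that asks for ever-changing fields keeps evicting and re-parsing.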
I would imagine that this architecture is near optimal for
the general case, and the main improvement is just in speeding up the
index updates after emails are added or deleted... (this happens
automatically at present if you use deliver, but you take a speed hit
if you update the maildir yourself)
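The difference between the two update paths can be sketched with a toy Maildir index. The class and method names are hypothetical and the index is far simpler than Dovecot's; the point is only that `record_delivery()` costs nothing extra, while `sync()` has to list `new/` and `cur/` and diff against what it already knows, so its cost grows with mailbox size.

```python
from __future__ import annotations

import os
import tempfile


class MaildirIndex:
    """Toy index over a Maildir's new/ and cur/ (hypothetical design)."""

    def __init__(self, root: str):
        self.root = root
        self.known: set[str] = set()  # base filenames already indexed

    def record_delivery(self, filename: str) -> None:
        # Delivery-time update: the delivering agent tells the index directly,
        # so no directory scan is needed later.
        self.known.add(filename.split(":", 1)[0])

    def sync(self) -> list[str]:
        # Lazy update for mail written behind our back: list new/ and cur/
        # and diff against the index. This scan is the "speed hit".
        on_disk: set[str] = set()
        for sub in ("new", "cur"):
            for name in os.listdir(os.path.join(self.root, sub)):
                on_disk.add(name.split(":", 1)[0])  # strip ":2,<flags>" suffix
        added = sorted(on_disk - self.known)
        self.known = on_disk  # also forgets messages deleted externally
        return added


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for sub in ("new", "cur", "tmp"):
        os.mkdir(os.path.join(root, sub))
    idx = MaildirIndex(root)
    open(os.path.join(root, "new", "1.a.host"), "w").close()
    idx.record_delivery("1.a.host")                           # delivered via the index
    open(os.path.join(root, "new", "2.b.host"), "w").close()  # written externally
    print(idx.sync())
```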
I would imagine that once you add a requirement to distribute the data
and handle failover, etc., cache coherency problems come to dominate
the design, and it could be interesting to play with ideas to
solve them.
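One simple way to keep per-node caches coherent is to have the shared backend bump a per-mailbox change counter on every write (similar in spirit to the MODSEQ values of IMAP's CONDSTORE extension) and have each node revalidate against it before serving cached data. Everything below, including the class names, is a hypothetical sketch; it does not address failover or partitions at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedBackend:
    """Hypothetical distributed store with a per-mailbox change counter."""
    data: dict[str, dict[int, str]] = field(default_factory=dict)
    modseq: dict[str, int] = field(default_factory=dict)

    def write(self, mailbox: str, uid: int, value: str) -> None:
        self.data.setdefault(mailbox, {})[uid] = value
        # Every change bumps the counter so other nodes can detect staleness.
        self.modseq[mailbox] = self.modseq.get(mailbox, 0) + 1


class NodeCache:
    """Per-node cache that revalidates against the backend's counter."""

    def __init__(self, backend: SharedBackend):
        self.backend = backend
        self._cached: dict[str, tuple[int, dict[int, str]]] = {}
        self.refetches = 0

    def read(self, mailbox: str) -> dict[int, str]:
        current = self.backend.modseq.get(mailbox, 0)
        hit = self._cached.get(mailbox)
        if hit is not None and hit[0] == current:
            return hit[1]  # counter unchanged: cached copy is still coherent
        self.refetches += 1
        snapshot = dict(self.backend.data.get(mailbox, {}))
        self._cached[mailbox] = (current, snapshot)
        return snapshot
```

The trade-off is that every read still costs a (cheap) round trip to check the counter; designs that avoid even that need invalidation messages or leases, which is where it gets interesting.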
Anyway, my point is that anyone who hasn't tried it yet should
first have a look at how their favourite IMAP client implements IMAP and
watch the stream of commands it issues... It's usually quite a bit
different from what you expect, and to me it's a long way from what
might be optimal if I got to design their algorithm...
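Once you have captured the client's side of the conversation as plain text (e.g. with Dovecot's rawlog feature or a packet capture), a few lines of Python can tally which commands and FETCH items it actually asks for. This sketch assumes one command per line and does not handle IMAP literals (`{n}`) or APPEND data; the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# "<tag> [UID] FETCH <sequence-set> <items>"; group 1 captures <items>.
_FETCH = re.compile(r"^\S+\s+(?:UID\s+)?FETCH\s+\S+\s+(.+)$", re.IGNORECASE)


def _top_level_items(spec: str) -> list[str]:
    """Split a FETCH item list on spaces that are not inside () or []."""
    spec = spec.strip()
    if spec.startswith("(") and spec.endswith(")"):
        spec = spec[1:-1]
    items, depth, current = [], 0, ""
    for ch in spec:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == " " and depth == 0:
            if current:
                items.append(current)
            current = ""
        else:
            current += ch
    if current:
        items.append(current)
    return items


def summarise(client_lines: list[str]) -> tuple[Counter, Counter]:
    """Tally command names and FETCH items from client-side IMAP lines."""
    commands: Counter = Counter()
    fetch_items: Counter = Counter()
    for line in client_lines:
        words = line.split()
        if len(words) < 2:
            continue  # blank lines, bare continuation data, etc.
        name = words[1].upper()
        if name == "UID" and len(words) > 2:
            name = "UID " + words[2].upper()
        commands[name] += 1
        match = _FETCH.match(line)
        if match:
            fetch_items.update(i.upper() for i in _top_level_items(match.group(1)))
    return commands, fetch_items
```

Feeding it a session from a real client usually makes the "stream of idiocy" very visible, e.g. the same header fields fetched for the whole mailbox on every folder switch.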
The point being that you shouldn't optimise too much for what you hope
people will do; rather, look at your favourite webmail or desktop client
and optimise for whatever stream of idiocy it asks you to keep pumping
back at it...
I, for one, look forward to these changes. I desperately hope I get some
time afterwards to play with some ideas, because like you I'm itching to try
out my "next greatest idea"!!
My only request to Timo was that he consider that a lot of these
ideas from the audience will almost certainly involve splitting the
MIME message into its component parts, and that the abstracted interface
should try not to throw away any potential speed benefit this might
bring just because the interface can't express what it needs clearly enough.
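To illustrate the request, here is a hypothetical storage interface (not Dovecot's actual API) that can ask for a single MIME leaf part. A backend that only stores whole messages falls back to reading and parsing the full message, while a backend that split parts out at delivery time can answer directly. If the interface could only say "give me the whole message", the second backend's advantage would be thrown away.

```python
from __future__ import annotations

from email import policy
from email.parser import BytesParser
from typing import Protocol


class MessageStore(Protocol):
    """Hypothetical storage interface that can request one MIME leaf part."""

    def get_raw(self, uid: int) -> bytes: ...
    def get_part(self, uid: int, index: int) -> bytes: ...


class WholeMessageStore:
    """Backend that only keeps full messages, so get_part() must parse."""

    def __init__(self, messages: dict[int, bytes]):
        self.messages = messages
        self.full_reads = 0

    def get_raw(self, uid: int) -> bytes:
        self.full_reads += 1
        return self.messages[uid]

    def get_part(self, uid: int, index: int) -> bytes:
        msg = BytesParser(policy=policy.default).parsebytes(self.get_raw(uid))
        leaves = [p for p in msg.walk() if not p.is_multipart()]
        # Undo base64 / quoted-printable to return the part's real bytes.
        return leaves[index].get_payload(decode=True)


class SplitPartStore:
    """Backend that stored each leaf part separately at delivery time."""

    def __init__(self, parts: dict[tuple[int, int], bytes], raw: dict[int, bytes]):
        self.parts = parts
        self.raw = raw
        self.full_reads = 0

    def get_raw(self, uid: int) -> bytes:
        self.full_reads += 1
        return self.raw[uid]

    def get_part(self, uid: int, index: int) -> bytes:
        return self.parts[(uid, index)]  # no full read, no MIME parse


def fetch_attachment(store: MessageStore, uid: int, index: int) -> bytes:
    # Callers only see the interface; each backend answers as cheaply as it can.
    return store.get_part(uid, index)
```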
Good luck
Ed W