[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

Mon Sep 28 19:57:42 EEST 2009

paulmon wrote:
> My current thinking is having the local delivery break messages up into
> their component pieces, headers, from address, to address, spam scores, body
> etc into various key:value relationships.

Whilst this looks appealing on the surface, I think the details are going 
to need some benchmarking to see if they stack up.  Certainly I hope this 
new abstraction works out, because I suspect we will then see a bunch of 
interesting ideas get implemented, such as the one you describe!!

Just to knock your theoretical idea around a bit though.  My guess would 
be that you need to look at the access patterns for this data to make 
sure you don't over-normalise it.  E.g. if it's "normal" to simply open up 
a mailbox and then ask it for every one of the following X fields for 
each message, then over-normalising the header fields will lead to 
response time being dominated by the access time for each field 
(especially if each access incurs a disk seek, etc).
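
To put rough numbers on that worry, here is a minimal sketch.  It is 
not Dovecot code: CountingStore, FIELDS and the two layouts are all 
made up for illustration, and each get() just stands in for one backend 
access (possibly a disk seek).  It compares one key per header field 
against one key per message when a client lists a whole mailbox.

```python
# Hypothetical key:value store that counts lookups. Each get() stands in
# for one backend access (possibly a disk seek). Not a real Dovecot API.
class CountingStore:
    def __init__(self):
        self.data = {}
        self.lookups = 0

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.lookups += 1
        return self.data[key]


# Assumed set of fields a client asks for when listing a mailbox.
FIELDS = ("from", "to", "subject", "date")


def store_normalised(store, uid, headers):
    # Fully normalised layout: one key per (message, field).
    for field in FIELDS:
        store.put((uid, field), headers[field])


def store_per_message(store, uid, headers):
    # Coarser layout: one key per message holding all fields together.
    store.put(uid, dict(headers))


def list_mailbox_normalised(store, uids):
    return [{f: store.get((uid, f)) for f in FIELDS} for uid in uids]


def list_mailbox_per_message(store, uids):
    return [store.get(uid) for uid in uids]


def compare(n_messages=100):
    """Return (lookups with normalised layout, lookups with per-message layout)."""
    uids = range(1, n_messages + 1)
    headers = {f: "x" for f in FIELDS}
    normalised, per_message = CountingStore(), CountingStore()
    for uid in uids:
        store_normalised(normalised, uid, headers)
        store_per_message(per_message, uid, headers)
    list_mailbox_normalised(normalised, uids)
    list_mailbox_per_message(per_message, uids)
    return normalised.lookups, per_message.lookups


print(compare(100))  # (400, 100)
```

Whether the extra lookups actually hurt depends on the backend, which 
is why this needs benchmarking rather than guessing.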

At present I think Dovecot's architecture kind of assumes that random 
access dominates for individual email messages, and then it optimises for 
the particular case of header accesses by caching those in a local 
"database" type structure which "caches" just a certain amount of 
recently requested header fields.  The access times then seem to be 
bounded by the time to scan the inbox for new unseen messages and update 
this index with maildir (not sure what bounds mailbox scanning times in 
general use?).  I.e. it's optimising either for returning every field X 
from every message in a folder, or else for returning bits of a given 
message?

I should imagine that this architecture is near optimal for the general 
case, and the main improvement left is just in speeding up the updates 
after new emails are added/deleted... (done automatically at present if 
you use deliver, incurs a speed hit if you update yourself)
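
For anyone who wants to play with the idea, here is a toy version of 
that kind of cache.  It is only my reading of the behaviour described 
above, not how Dovecot's index files really work: the class, its 
methods and the "parses" counter are all invented for the sketch.  The 
cache remembers which header fields clients have asked for, stores only 
those, and can be filled at delivery time so later reads don't have to 
re-parse the message.

```python
from email.parser import HeaderParser


class HeaderFieldCache:
    """Toy cache of recently requested header fields (illustrative only)."""

    def __init__(self, raw_messages):
        self.raw = dict(raw_messages)  # uid -> raw message text (stand-in for maildir files)
        self.wanted = set()            # header names clients have asked for so far
        self.cache = {}                # (uid, header name) -> value
        self.parses = 0                # how often we had to open and parse a message

    def _parse(self, uid):
        self.parses += 1
        return HeaderParser().parsestr(self.raw[uid])

    def fetch(self, uid, field):
        field = field.lower()
        key = (uid, field)
        if key in self.cache:
            return self.cache[key]     # fast path: no message access at all
        # Slow path: remember the field is wanted, then parse the message once
        # and cache every wanted field for it.
        self.wanted.add(field)
        msg = self._parse(uid)
        for name in self.wanted:
            self.cache[(uid, name)] = msg.get(name)
        return self.cache[key]

    def index_new_message(self, uid, raw):
        # Delivery-time update: pay for the parse when the mail arrives,
        # so the next client request for a wanted field is a cache hit.
        self.raw[uid] = raw
        msg = self._parse(uid)
        for name in self.wanted:
            self.cache[(uid, name)] = msg.get(name)
```

Usage is just cache.fetch(uid, "Subject").  The interesting number is 
how "parses" grows when messages arrive without going through 
index_new_message().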

I should imagine that once you add a requirement to distribute the data 
and handle failover, etc, then the cache coherency problems will 
dominate the design, and it could be interesting to play with ideas to 
solve them.

Anyway, I think the point for anyone who hasn't tried it yet is to 
first have a look at how your favourite IMAP client implements IMAP and 
watch the stream of commands being issued... It's usually quite a bit 
different to what you expect, and to me it's a lot different to what 
might be optimal if I got to design their algorithm...
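
If you capture such a stream (however you like), something as simple as 
the following is enough to see which commands dominate.  It's a rough 
sketch: it assumes one client command per line in the usual 
"tag COMMAND args" form, and the sample lines are invented for 
illustration, not taken from any real client.

```python
from collections import Counter


def tally_commands(client_lines):
    """Count IMAP command verbs in lines of the form 'tag COMMAND args...'."""
    counts = Counter()
    for line in client_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a tagged command, skip it
        verb = parts[1].upper()
        # "UID FETCH", "UID SEARCH" etc. are more useful counted as a pair.
        if verb == "UID" and len(parts) > 2:
            verb = "UID " + parts[2].upper()
        counts[verb] += 1
    return counts


# Invented sample trace, for illustration only.
sample = [
    "a1 SELECT INBOX",
    "a2 UID FETCH 1:* (FLAGS)",
    "a3 UID FETCH 1:* (FLAGS)",
    "a4 NOOP",
]

for verb, n in tally_commands(sample).most_common():
    print(verb, n)
```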

The point being that you shouldn't optimise too much for what you hope 
people will do, so much as have a look at your favourite webmail client 
or desktop client and optimise for whatever stream of idiocy they 
request you to keep pumping at them...

I for one look forward to these changes - I desperately hope I get some 
time to then play with some ideas because like you I'm itching to play 
with my "next greatest idea"!!

My only request to Timo was to consider that a bunch of these 
ideas from the audience will almost certainly involve splitting up the 
MIME message into component parts, and that the abstracted interface 
should try not to throw away any potential speed benefit this might 
achieve just because the interface can't express what it needs clearly 
enough.
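
To show what I mean by "express what it needs", here is a purely 
hypothetical sketch.  None of these names come from Timo's design: 
MailStorage, read_part, read_parts and the toy DictStorage backend are 
all assumptions of mine.  The idea is only that the caller hands over 
the whole request (which messages, which parts) in one call, so a 
backend that stores parts separately gets the chance to batch it.

```python
from abc import ABC, abstractmethod


class MailStorage(ABC):
    """Hypothetical storage interface, not Dovecot's actual abstraction."""

    @abstractmethod
    def read_part(self, uid, part):
        """Return one named piece (e.g. 'headers', 'text') of one message."""

    def read_parts(self, uids, parts):
        # Default: naive loop, one read per (message, part).
        # Backends that can do better override this.
        return {uid: {p: self.read_part(uid, p) for p in parts} for uid in uids}


class DictStorage(MailStorage):
    """Toy backend keeping each message part under its own key."""

    def __init__(self, data):
        self.data = data        # (uid, part) -> bytes
        self.round_trips = 0    # stand-in for backend requests

    def read_part(self, uid, part):
        self.round_trips += 1
        return self.data[(uid, part)]

    def read_parts(self, uids, parts):
        # The whole request is known up front, so it costs one round trip.
        self.round_trips += 1
        return {uid: {p: self.data[(uid, p)] for p in parts} for uid in uids}
```

With only read_part() in the interface, the same request would cost one 
round trip per message per part, however clever the backend is.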

Good luck

Ed W