Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

28 Sep 2009

      Timo Sirainen wrote:
...
On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
...
My only request to Timo was to bear in mind that a bunch of these
ideas from the audience will almost certainly involve splitting up the
MIME message into component parts, and that the abstracted interface
should try not to throw away any potential speed benefit that this might
achieve just because the interface can't express what it needs clearly enough.
It might become too complex to initially consider how to support split
MIME messages and such. I'm not really sure if it even belongs in this
filesystem abstraction layer. I was hoping that the FS API would be
really really simple and could also be used for other things than just
email.
Well, I think if you just implement a wrapper around read(fh, start,
count) then it's going to be quite hard to implement some kind of
storage which splits out the message in some way?
I guess the API would need to line up with the IMAP commands to retrieve
MIME parts.  For the most part these are poorly supported by clients, so
I guess most mail clients will undo all this cleverness, but I would
imagine it will have a low impact on performance since it's just extra
seeks on fetching individual messages?
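
A rough sketch of the concern about a plain read(fh, start, count)
wrapper, in Python purely for illustration. Every name here
(SplitMessage, read, the part layout) is an assumption made up for the
example and not anything in Dovecot. A backend that stores the message
in pieces can still answer a flat byte-range read, but it has to map
the range back onto its pieces on every call:

```python
from bisect import bisect_right
from itertools import accumulate


class SplitMessage:
    """Hypothetical backend that stores one message as separate parts."""

    def __init__(self, parts):
        self.parts = list(parts)
        sizes = [len(p) for p in self.parts]
        # offsets[i] is where parts[i] starts in the flat message.
        self.offsets = [0] + list(accumulate(sizes))[:-1]
        self.size = sum(sizes)

    def read(self, start, count):
        """Flat read(start, count), served by stitching parts together."""
        if count <= 0 or start >= self.size:
            return b""
        end = min(start + count, self.size)
        # Find the part that contains the first requested byte.
        i = bisect_right(self.offsets, start) - 1
        out = []
        while start < end:
            part_off = start - self.offsets[i]
            chunk = self.parts[i][part_off:part_off + (end - start)]
            out.append(chunk)
            start += len(chunk)
            i += 1
        return b"".join(out)


msg = SplitMessage([b"Subject: hi\r\n\r\n", b"body text", b"ATTACHMENT"])
print(msg.read(13, 6))
```

The storage works, but the interface gives the backend no way to say
"I already have that part on its own", which is the speed benefit that
would get thrown away.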
I am starting to see newer clients finally get this right though.  I'm
using profimail on my N97 and whilst I didn't look at its IMAP stream
it *seems* to be doing everything right from the client point of view.

I even get to choose to download the whole message if its size is below Y
and to ignore attachments larger than Z, etc. (In theory Thunderbird does
this, but at least on my machine it just repeatedly downloads the same
message again and again in various ways. It grinds to a halt every time I
click on an email with a decent sized attachment, even if I have already
read it... grr.)
...
But I'm also hoping to support things like single-instance storage at
some point. I'm not really sure if that should just be written into dbox
code directly or try to abstract it out..
I agree it should at least initially go into the dbox, etc code.  I
guess if enough people do the same implementation (in all the new
backends which I'm sure will arrive within days of some API coming
out....) it could bubble up, etc?
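
For reference, single-instance storage in its simplest form could look
something like the sketch below. This is only an assumption about one
common approach (keying blobs by a content hash and reference-counting
them); the class and method names are hypothetical and nothing here
reflects how dbox does or would do it:

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical store that keeps identical blobs (e.g. attachments) once."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes
        self.refcount = {}  # digest -> number of messages pointing at it

    def put(self, data):
        """Store data (or reuse an identical copy) and return its key."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest):
        return self.blobs[digest]

    def release(self, digest):
        """Drop one reference; free the blob when nobody uses it any more."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]


store = SingleInstanceStore()
a = store.put(b"big attachment")
b = store.put(b"big attachment")  # second mailbox, same attachment
print(a == b, len(store.blobs))
```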
I would have thought that your API will prefer to request message parts
where it can (e.g. header, body, MIME part), and otherwise just issue a
read_bytes where that's what the client is asking for.  This would allow
the storage engine to optimise where it can, and sadly for the dumb
client we just stream bytes since that's all it asked for...
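
As a sketch of what "request parts where it can, bytes otherwise" might
look like: the names below (ByteStore, PartStore, fetch_part,
fetch_section) are hypothetical, and only read_bytes comes from the
text above. The caller asks for a section first, and falls back to
reading bytes and splitting the message itself when the backend has
nothing better to offer:

```python
class ByteStore:
    """Dumb backend: only knows the flat message."""

    def __init__(self, raw):
        self.raw = raw
        self.size = len(raw)

    def read_bytes(self, start, count):
        return self.raw[start:start + count]


class PartStore(ByteStore):
    """Backend that additionally keeps some sections stored separately."""

    def __init__(self, raw, parts):
        super().__init__(raw)
        self.parts = parts  # e.g. {"header": b"...", "body": b"..."}

    def fetch_part(self, section):
        # None means "I don't have this section on its own".
        return self.parts.get(section)


def fetch_section(store, section):
    """Prefer the backend's own part lookup, else parse the flat bytes."""
    fetch_part = getattr(store, "fetch_part", None)
    if fetch_part is not None:
        data = fetch_part(section)
        if data is not None:
            return data
    # Fallback: pull the whole message and split it here.
    raw = store.read_bytes(0, store.size)
    header, sep, body = raw.partition(b"\r\n\r\n")
    if section == "header":
        return header + sep
    if section == "body":
        return body
    raise KeyError(section)


raw = b"Subject: hi\r\n\r\nhello"
dumb = ByteStore(raw)
smart = PartStore(raw, {"header": b"Subject: hi\r\n\r\n", "body": b"hello"})
print(fetch_section(dumb, "body"), fetch_section(smart, "body"))
```

Both backends give the same answer; the difference is that the second
one never has to touch the full message to produce it. The fallback
here only understands "header" and "body", which is a simplification
for the example.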
Perhaps the API should also request specific headers from the storage
engine where possible and ask for all headers only where it's
necessary?  This would allow an SQL database to be heavily normalised
(I'm sure performance is iffy, but we have to presuppose that this
design is useful for other reasons).
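
To illustrate the specific-headers idea, here is a small sketch using
Python's sqlite3 module with an in-memory database. The table layout
and the function names (open_store, store_headers, fetch_headers) are
assumptions invented for the example, and the schema is only roughly
normalised (one row per header line, names stored lowercased):

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE header (msg_id INTEGER, pos INTEGER, name TEXT, value TEXT)"
    )
    return db


def store_headers(db, msg_id, headers):
    """headers is a list of (name, value) pairs in message order."""
    db.executemany(
        "INSERT INTO header VALUES (?, ?, ?, ?)",
        [(msg_id, pos, name.lower(), value)
         for pos, (name, value) in enumerate(headers)],
    )


def fetch_headers(db, msg_id, names=None):
    """Return only the named headers, or all of them when names is None."""
    if names is None:
        rows = db.execute(
            "SELECT name, value FROM header WHERE msg_id = ? ORDER BY pos",
            (msg_id,),
        )
    else:
        wanted = [n.lower() for n in names]
        marks = ",".join("?" * len(wanted))
        rows = db.execute(
            "SELECT name, value FROM header "
            f"WHERE msg_id = ? AND name IN ({marks}) ORDER BY pos",
            (msg_id, *wanted),
        )
    return rows.fetchall()


db = open_store()
store_headers(db, 1, [("From", "a@example.com"), ("Subject", "hi"),
                      ("Received", "x"), ("Received", "y")])
print(fetch_headers(db, 1, ["Subject", "From"]))
```

With an interface like this the storage engine only touches the rows it
was asked for; a "give me all headers" call is still available for the
cases that really need it.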
Does this seem feasible?
Ed W