[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

Mon Sep 28 20:35:08 EEST 2009

Timo Sirainen wrote:
> On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
>   
>> My only request to Timo was to kind of consider that a bunch of these 
>> ideas from the audience will almost certainly involve splitting up the 
>> mime message into component parts and that the abstracted interface 
>> should try not to throw away any potential speed benefit that this might 
>> achieve because the interface can't express what it needs clearly enough?
>>     
>
> It might become too complex to initially consider how to support split
> MIME messages and such. I'm not really sure if it even belongs to this
> filesystem abstraction layer. I was hoping that the FS API would be
> really really simple and could also be used for other things than just
> email.
>   

Well, if the API is just a wrapper around read(fh, start, count), I 
think it's going to be quite hard to implement any kind of storage 
which splits the message up in some way.

I guess the API would need to line up with the IMAP commands for 
retrieving MIME parts.  Most clients support these poorly, so I guess 
most mail clients will undo all this cleverness, but I would imagine 
that has a low impact on performance, since it just means extra seeks 
when fetching individual messages?

I am starting to see newer clients finally get this right though.  I'm 
using profimail on my N97 and, whilst I haven't looked at its IMAP 
stream, it *seems* to be doing everything right from the client point 
of view.  I even get to choose to download the whole message if its 
size is under Y, skip attachments larger than Z, etc.  (In theory 
Thunderbird does this too, but at least on my machine it just downloads 
the same message again and again in various ways: it grinds to a halt 
every time I click on an email with a decent-sized attachment, even if 
I have already read it... grr)

> But I'm also hoping to support things like single-instance storage at
> some point. I'm not really sure if that should just be written into dbox
> code directly or try to abstract it out..
>   

I agree it should at least initially go into the dbox, etc. code.  I 
guess if enough people write the same implementation (in all the new 
backends which I'm sure will arrive within days of some API coming 
out....) it could bubble up from there?
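
For anyone unfamiliar with the term, single-instance storage can be 
sketched roughly as below.  This is only an illustration in Python for 
brevity: the class and method names are hypothetical, nothing here is 
taken from dbox, and hashing the content is just one assumed way of 
spotting duplicates.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical content-addressed store: identical blobs are kept once."""

    def __init__(self):
        self._blobs = {}      # digest -> stored bytes
        self._refcount = {}   # digest -> number of messages pointing at it

    def put(self, data: bytes) -> str:
        """Store data (or reuse an existing copy) and return its key."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def release(self, digest: str) -> None:
        """Drop one reference; free the blob when nobody uses it any more."""
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._blobs[digest]
            del self._refcount[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())
```

Two messages carrying the same attachment would call put() twice and 
get the same key back, while stored_bytes() only counts the attachment 
once.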

I would have thought that your API would prefer to request message 
parts where it can (e.g. header, body, MIME part) and otherwise just 
issue a read_bytes when that's all the client is asking for.  This 
would allow the storage engine to optimise where it can; sadly, for the 
dumb clients we just stream bytes, since that's all they asked for...
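
A rough sketch of that shape, in Python purely for brevity.  All names 
(SplitStore, read_header, read_body, read_bytes, fetch) are hypothetical 
and not from Dovecot; the section names follow the IMAP FETCH forms 
BODY[HEADER], BODY[TEXT] and BODY[]<start.count>, and a MIME part 
request would be one more method along the same lines.

```python
class SplitStore:
    """Hypothetical backend that keeps header and body as separate pieces."""

    def __init__(self, raw: bytes):
        # Split once at the first blank line; the separator stays with
        # the header so header + body reproduces the original message.
        header, sep, body = raw.partition(b"\r\n\r\n")
        self._header = header + sep
        self._body = body

    def read_header(self) -> bytes:
        return self._header          # served without touching the body

    def read_body(self) -> bytes:
        return self._body            # served without touching the header

    def read_bytes(self, start: int, count: int) -> bytes:
        # Dumb-client path: reassemble the message and slice it.
        return (self._header + self._body)[start:start + count]


def fetch(store, section: str, partial=None) -> bytes:
    """Prefer part-level requests; fall back to a plain byte range."""
    if section == "" and partial is not None:
        return store.read_bytes(*partial)
    if section == "HEADER":
        data = store.read_header()
    elif section == "TEXT":
        data = store.read_body()
    elif section == "":
        data = store.read_header() + store.read_body()
    else:
        raise ValueError("unsupported section: %r" % section)
    if partial is not None:
        start, count = partial
        data = data[start:start + count]
    return data
```

A flat-file backend could implement the same three methods on top of a 
single read(), so the caller would not need to know which kind of 
storage it is talking to.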

Perhaps the API should also request specific headers from the storage 
engine where possible, and ask for all headers only where necessary?  
This would allow an SQL database to be heavily normalised (I'm sure 
performance is iffy, but we have to presuppose the design is useful 
for some other reason).
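
As a minimal sketch of the SQL case, assuming a made-up schema with one 
row per header (a really normalised design would probably split header 
names into their own table too).  The table, column and function names 
are hypothetical; SQLite is used only because it needs no setup.

```python
import sqlite3
from email.parser import BytesHeaderParser


def open_db():
    """Create an in-memory database with the hypothetical header table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE header (msg_id INTEGER, name TEXT, value TEXT)")
    return db


def store_headers(db, msg_id, raw_header: bytes) -> None:
    """Parse a raw header block and insert one row per header line."""
    msg = BytesHeaderParser().parsebytes(raw_header)
    db.executemany(
        "INSERT INTO header (msg_id, name, value) VALUES (?, ?, ?)",
        [(msg_id, name.lower(), value) for name, value in msg.items()],
    )


def fetch_headers(db, msg_id, names=None):
    """Return (name, value) rows: only the named headers, or all if None."""
    sql = "SELECT name, value FROM header WHERE msg_id = ?"
    params = [msg_id]
    if names is not None:
        wanted = [n.lower() for n in names]
        sql += " AND name IN (%s)" % ",".join("?" * len(wanted))
        params += wanted
    sql += " ORDER BY rowid"   # keep the original header order
    return db.execute(sql, params).fetchall()
```

A request like BODY[HEADER.FIELDS (Subject)] would then map to 
fetch_headers(db, msg_id, ["Subject"]) and only touch the matching 
rows, while a full header fetch passes names=None.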

Does this seem feasible?

Ed W