paulmon wrote:
My current thinking is having the local delivery break messages up into their component pieces (headers, from address, to address, spam scores, body, etc.) and store them as various key:value relationships.
Whilst this looks appealing on the surface, I think the details are going to need some benchmarking to see if they stack up. I certainly hope this new abstraction works out, because I suspect we will then see a bunch of interesting ideas get implemented, such as the one you describe!
Just to knock your theoretical idea around a bit, though: my guess is that you need to look at the access patterns for this data to make sure you don't over-normalise it. E.g. if it's "normal" to simply open up a mailbox and then ask for the same X fields for each message, then over-normalising the header fields will lead to response time being dominated by the access time for each field (especially if each access costs a disk seek, etc.).
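To make the over-normalisation worry concrete, here is a rough sketch that just counts backend reads for "list the mailbox" under two layouts. Everything in it is made up for illustration: CountingStore, the four field names and the 1000-message mailbox are assumptions, not anything dovecot does, and a real benchmark would have to measure actual disk or network latency rather than read counts.

```python
class CountingStore:
    """Hypothetical key:value store that counts every read it serves."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1  # stand-in for one seek / one network round trip
        return self.data[key]


# Assumed set of fields a client asks for when it opens a mailbox.
FIELDS = ["from", "to", "subject", "date"]


def load(n_messages):
    """Store the same synthetic messages in both layouts."""
    per_field = CountingStore()    # one key per (uid, field)
    per_message = CountingStore()  # one key per uid, all fields together
    for uid in range(1, n_messages + 1):
        record = {f: f"{f}-{uid}" for f in FIELDS}
        for f, v in record.items():
            per_field.put((uid, f), v)
        per_message.put(uid, record)
    return per_field, per_message


def list_mailbox_per_field(store, n_messages):
    return [{f: store.get((uid, f)) for f in FIELDS}
            for uid in range(1, n_messages + 1)]


def list_mailbox_per_message(store, n_messages):
    return [store.get(uid) for uid in range(1, n_messages + 1)]


per_field, per_message = load(1000)
listing_a = list_mailbox_per_field(per_field, 1000)
listing_b = list_mailbox_per_message(per_message, 1000)

# Same answer either way, but the fully normalised layout pays one read per field.
print(per_field.reads, per_message.reads)
```

If each read really does turn into a seek, the per-field layout costs len(FIELDS) times as many of them for the most common request, which is the thing to benchmark before committing to it.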
At present I think dovecot's architecture kind of assumes that random access dominates for individual email messages, and then optimises for the particular case of header accesses by caching those in a local "database"-type structure, which "caches" just a certain amount of recently requested header fields. With maildir, the access times then seem to be bounded by the time to scan the inbox for new unseen messages and update this index (not sure what bounds mailbox scanning times in general use?). I.e. it's either optimising for returning field X from every message in a folder, or else it is returning bits of a given message?
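As a toy model of what I mean by caching recently requested header fields (this is NOT dovecot's actual cache format or policy; the HeaderCache class and its behaviour are my own assumptions for illustration), the idea is roughly: parse the message only on a miss, and while it is parsed, also remember every field that clients have asked for before.

```python
from email import message_from_string


class HeaderCache:
    """Hypothetical cache of header fields that clients have requested."""

    def __init__(self, messages):
        self.messages = messages    # uid -> raw message text
        self.wanted_fields = set()  # fields any client has asked for so far
        self.cache = {}             # (uid, field) -> header value
        self.parses = 0             # how often we had to open a message

    def fetch(self, uid, field):
        field = field.lower()
        key = (uid, field)
        if key in self.cache:
            return self.cache[key]  # served without touching the message
        # Cache miss: open and parse the message (the expensive part).
        self.parses += 1
        msg = message_from_string(self.messages[uid])
        self.wanted_fields.add(field)
        # While the message is open, cache every field seen in past requests.
        for f in self.wanted_fields:
            self.cache[(uid, f)] = msg.get(f)
        return self.cache[key]


messages = {
    1: "From: a@example.com\nSubject: hello\n\nbody one\n",
    2: "From: b@example.com\nSubject: world\n\nbody two\n",
}
cache = HeaderCache(messages)
first = [cache.fetch(uid, "Subject") for uid in (1, 2)]   # two parses
second = [cache.fetch(uid, "Subject") for uid in (1, 2)]  # both from cache
print(first, second, cache.parses)
```

The second pass over the folder never opens a message, which is the "every field X from every message" case; anything outside the cached fields still falls back to reading the message itself.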
I should imagine that this architecture is near optimal for the general case, and that the main improvement to be had is just in speeding up the index updates after emails are added/deleted... (done automatically at present if you use deliver; incurs a speed hit if you update the mailbox yourself)
I should imagine that once you add a requirement to distribute the data and handle failover, etc., the problems of cache coherency will dominate the design, and it could be interesting to play with ideas to solve this.
Anyway, I think the point is that anyone who hasn't tried it yet should first have a look at how their favourite IMAP client implements IMAP and watch the stream of commands being issued... It's usually quite a bit different to what you expect, and to me it's a lot different to what might be optimal if I got to design their algorithm...
The point being that you shouldn't optimise too much for what you hope people will do; rather, have a look at your favourite webmail client or desktop client and optimise for whatever stream of idiocy they ask you to keep pumping at them...
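If you do capture such a command stream, even a crude tally tells you a lot about what to optimise for. A minimal sketch, assuming you already have the client-side lines of a session as text; the SAMPLE_TRACE below is invented purely to exercise the function and is not from any real client, and tally_commands is a hypothetical helper, not part of any tool.

```python
from collections import Counter


def tally_commands(client_lines):
    """Count IMAP commands in lines of the form '<tag> <command> [args...]'."""
    counts = Counter()
    for line in client_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a tagged command line
        command = parts[1].upper()
        # 'UID FETCH', 'UID SEARCH', etc. are more useful counted as a pair.
        if command == "UID" and len(parts) >= 3:
            command = "UID " + parts[2].upper()
        counts[command] += 1
    return counts


# Invented sample, for illustration only.
SAMPLE_TRACE = [
    "a1 SELECT INBOX",
    "a2 UID FETCH 1:* (FLAGS)",
    "a3 UID FETCH 1:* (BODY.PEEK[HEADER.FIELDS (FROM TO SUBJECT DATE)])",
    "a4 NOOP",
    "a5 UID FETCH 42 (BODY.PEEK[])",
    "a6 NOOP",
]

counts = tally_commands(SAMPLE_TRACE)
for command, n in counts.most_common():
    print(command, n)
```

Breaking the FETCH lines down further by which fields they ask for would be the obvious next step, since that is what decides how the headers ought to be stored.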
I for one look forward to these changes. I desperately hope I then get some time to play with some ideas, because like you I'm itching to play with my "next greatest idea"!!
My only request to Timo was to consider that a bunch of these ideas from the audience will almost certainly involve splitting up the MIME message into component parts, and that the abstracted interface should try not to throw away any potential speed benefit this might achieve just because the interface can't express what it needs clearly enough.
Good luck
Ed W