Hi
- Mail data on the other hand is just written once and usually read maybe once or a couple of times. Caching mail data in memory probably doesn't help all that much. Latency isn't such a horrible issue as long as multiple mails can be fetched at once / in parallel, so there's only a single latency wait.
This logically seems correct. A couple of questions then:
Since latency requirements are low, why did performance drop so much when you previously implemented a very simple MySQL storage backend? I glanced at the code a few weeks ago, and whilst it's surprisingly complicated right now to implement a backend, I was also surprised that a database storage engine "sucked", as I think you phrased it? Possibly that code also placed the indexes in the DB, which could very well kill performance? (Note I'm not arguing SQL storage is a good thing, I just want to understand the latency requirements on the backend.)
I would think that with some care even very high latency storage would be workable, e.g. S3/Gluster/MogileFS? I would love to see a backend using S3 - if nothing else I think it would quickly highlight all the bottlenecks in any design...
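Just to make that concrete, here's a toy sketch (Python, boto3) of what a write-once message store on S3 might look like - the bucket name and key layout are invented, and of course the real backend API is C, so treat it purely as an illustration of where the per-request latency cost sits:

import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "example-mail-store"  # hypothetical bucket name

def save_message(user, mailbox, raw_message):
    """Write each message exactly once under an immutable key."""
    key = "%s/%s/%s" % (user, mailbox, uuid.uuid4().hex)
    s3.put_object(Bucket=BUCKET, Key=key, Body=raw_message)
    return key

def fetch_message(key):
    """One GET per message - this per-request latency is exactly the
    cost such a backend would make visible."""
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()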
- Implement a multi-master filesystem backend for index files. The idea would be that all servers accessing the same mailbox must be talking to each other via the network, and every time something is changed, push the change to the other servers. This is actually very similar to my previous multi-master plan. One of the servers accessing the mailbox would still act as a master and handle conflict resolution and writing indexes to disk more or less often.
Take a look at MogileFS for some ideas here. I doubt it's a great fit, but they certainly need to solve a lot of the same problems - there's a toy sketch of the change-push idea below.
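For what it's worth, the "push every change to the other servers" part might look something like this - the peer list, transport and conflict rule are all made up here, it's only meant to show the shape of the protocol:

import json
import socket

PEERS = [("mail2.example.com", 9000), ("mail3.example.com", 9000)]  # hypothetical

def push_index_change(mailbox, change):
    """Broadcast one index change record to every peer touching the mailbox."""
    record = json.dumps({"mailbox": mailbox, "change": change}).encode()
    for host, port in PEERS:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(record + b"\n")

def resolve_conflict(local, remote):
    """The elected master applies some deterministic rule - here,
    last-writer-wins on a per-change timestamp."""
    return local if local["ts"] >= remote["ts"] else remote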
- Implement filesystem backend for dbox and permanent index storage using some scalable distributed database, such as maybe Cassandra.
CouchDB? It is just the Lotus Notes database after all, and personally I have built some *amazing* applications using that as the backend. (I just love the concept of Notes - the GUI is another matter...)
Note that CouchDB is interesting in that it is multi-master with "eventual" synchronisation. This potentially has some interesting issues/benefits for offline use - see the sketch below.
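Kicking off that replication between two servers holding the same index database is about this simple - the host and database names are made up, but /_replicate is CouchDB's actual HTTP API:

import requests

def replicate(source, target):
    """Ask the local CouchDB to continuously replicate source into target."""
    resp = requests.post(
        "http://localhost:5984/_replicate",
        json={"source": source, "target": target, "continuous": True},
    )
    resp.raise_for_status()
    return resp.json()

# Run this on each server pointing at the other and you have multi-master;
# CouchDB keeps conflicting revisions around rather than losing them, which
# is what makes the offline case interesting.
replicate("mail_indexes", "http://mail2.example.com:5984/mail_indexes")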
For the filesystem backend, have you looked at the various log-structured filesystems now appearing? Whenever I watch the maildir vs mbox debate I always think that a hybrid is the best solution, because you are optimising for a write-once, read-many situation where you have an increased probability of good cache locality on any given read.
To me this ends up looking like log-structured storage... (which feels like a hybrid of maildir and mbox - rough sketch below)
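Something like this toy layout is what I have in mind - one append-only segment for mbox-like write locality, plus a small fixed-width offset index for maildir-like random reads (the file format is invented):

import struct

SEGMENT = "mailbox.log"
INDEX = "mailbox.idx"  # fixed-width records: (offset, length)

def append_message(raw_message):
    """Write once, sequentially - the operation we optimise for."""
    with open(SEGMENT, "ab") as seg, open(INDEX, "ab") as idx:
        offset = seg.tell()
        seg.write(raw_message)
        idx.write(struct.pack("<QQ", offset, len(raw_message)))
    return offset

def read_message(seq):
    """Random read via the index: one seek into the segment, and nearby
    messages are likely to be in the page cache already."""
    with open(INDEX, "rb") as idx:
        idx.seek(seq * 16)
        offset, length = struct.unpack("<QQ", idx.read(16))
    with open(SEGMENT, "rb") as seg:
        seg.seek(offset)
        return seg.read(length)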
- Scalability, of course. It'll be as scalable as the distributed database being used to store mails.
I would be very interested to see a kind of "where the time goes" benchmark of Dovecot. Have you measured and found that latency in this part accounts for x% of the response time, and that being CPU-bound over here accounts for another y%, etc? E.g. if you deliberately introduce X ms of latency in the index lookups, what does that do to the response time of the system once the cache warms up? What if the latency to the storage backend changes? I would have thought this would help you determine how to scale this thing? (A toy version of the experiment is sketched below.)
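The experiment itself is cheap to mock up - wrap one layer at a time in an artificial delay and see what it does to the total (the wrapped lookup below is a stand-in, not the real index code):

import functools
import time

def with_latency(delay_ms):
    """Inject a fixed artificial delay in front of a storage/index call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(delay_ms / 1000.0)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@with_latency(delay_ms=20)  # pretend the index lives 20 ms away
def index_lookup(uid):      # hypothetical stand-in
    return {"uid": uid, "offset": 0}

start = time.perf_counter()
for uid in range(100):      # warm-cache loop: every hit still pays the delay
    index_lookup(uid)
print("mean response: %.1f ms" % ((time.perf_counter() - start) / 100 * 1000))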
All in all it sounds very interesting. However, a couple of thoughts:
What is the goal?
If the goal is performance, by scaling out across a quantity of servers, then I guess you need to measure it carefully to make sure it actually works? I haven't had the fortune to develop something that big, but the general advice is that scaling out is hard to get right, so assume you made a mistake in your design somewhere... Measure, measure.
If the goal is reliability, then I guess it's prudent to assume that somehow all servers will get out of sync (eventually). It's definitely nice if they are self-repairing as a design goal - cf. the difference between a full sync and shipping logs. I ship logs to run a master-master MySQL pair, but after a crash I use a sync tool (Maatkit) to check that the two servers really are in sync, rather than recreating one of them from fresh. (Sketch of that "check, don't rebuild" idea below.)
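Applied to mailboxes, the same idea might be: compare a cheap per-mailbox digest on both servers and only resync the ones that differ. The paths and digest recipe here are invented purely for illustration:

import hashlib
import os

def mailbox_digest(path):
    """Digest over message filenames and sizes - cheap to compute, and a
    mismatch pinpoints exactly which mailbox diverged after a crash."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(path)):
        size = os.path.getsize(os.path.join(path, name))
        h.update(("%s:%d\n" % (name, size)).encode())
    return h.hexdigest()

def diverged(local_path, remote_digest):
    return mailbox_digest(local_path) != remote_digest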
If the goal is increased storage capacity on commodity hardware, then it needs a useful bunch of tools to manage the replication, make sure there is redundancy, and make it easy to find the required storage. I guess look at MogileFS - if you think you can do better, then at least remember it was quite hard work for them to get to that stage, so doing it again is likely to be non-trivial?
If the goal is making it simpler to build a backend storage engine, then this would be excellent - I find myself wanting to benchmark ideas like S3 or sticking things in a database, but I looked at the API recently and it's going to require a bit of investment to get started - certainly more than a couple of evenings poking around... Hopefully others would write interesting backends; regardless of whether it's sensible to use them in high performance setups, some folks simply want/need to do unusual things...
Finally, I am a bit sad that offline distributed multi-master isn't in the roadmap anymore... :-( My situation is that we have a lot of boats sailing around with intermittent, expensive satellite connections, and the users are fluid and need to get access to their data from land and from different vessels. Currently we build software in-house to make this possible, but it would be fantastic to see more features enabling this on the server side (CouchDB / Lotus Notes is cool...).
Good luck - sounds fun implementing all this anyway!
Ed W