On Thu, 2009-02-12 at 11:29 +0100, Mikkel wrote:
Hi Timo
I have a few comments. Please just disregard them if I have misunderstood your design.
Regarding your storage plan I find it very important that users can be stored in different locations because:
This you misunderstood. The mails of a single user are stored in one dbox directory, not all users.
Regarding 7. I am very much for all the self-healing you describe. There is nothing worse than huge complex systems that fail just because of some minor error that could easily be fixed without manual intervention. But I'm also a little worried in this regard.
Maildir is so robust that nothing can really go wrong.
Yes. If you don't care that much about performance Maildir is going to be more reliable, especially when recovering from filesystem corruption.
It should be very resilient to temporarily losing access to all files in this operation (could happen very often on NFS mounts).
I/O errors and such are treated differently than corrupted/missing files. So as long as reading gives an error it doesn't try to repair anything.
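To make that distinction concrete, here is a rough sketch in Python. It is not Dovecot's actual code; `Corrupted`, `read_checked`, `load_with_repair` and the `validate`/`repair` callbacks are hypothetical names used only for illustration. A missing or invalid file triggers one repair attempt, while any other read error is passed up untouched.

```python
class Corrupted(Exception):
    """Hypothetical: the file was read fine, but its contents are invalid."""


def read_checked(path, validate):
    # Any OSError raised here (EIO, ESTALE, ...) propagates to the caller.
    with open(path, "rb") as f:
        data = f.read()
    if not validate(data):
        raise Corrupted(path)
    return data


def load_with_repair(path, validate, repair):
    try:
        return read_checked(path, validate)
    except (FileNotFoundError, Corrupted):
        # Missing or corrupted file: repair once, then read again.
        # If the second read still fails, the error propagates (no looping).
        repair(path)
        return read_checked(path, validate)
    # Other OSErrors are deliberately not caught: while reading gives an
    # error, nothing is repaired.
```

Since the repair is attempted only once per call, a fix that doesn't take effect shows up as an error rather than as a retry loop.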
Also I imagine the self-healing going into loops if it doesn't understand what's going on. If the data changes due to manual intervention, or part of the file system can't be accessed, you could imagine the self-healing process trying again and again to fix something that isn't its job to fix. In that case it would be better if it just skipped the apparent failures.
I'm not really sure what you're thinking about here. Assuming there aren't bugs in the fixup code, it should be able to fix things. If someone manually goes and breaks things again, then sure it fixes them again later, but there's really no automatic looping. Also Dovecot already does index file fixing if it notices corruption, so this won't be all that much different.
If there is serious data corruption and you have only one file then all operations are paused while the self healing is trying to figure out what went wrong
There will be multiple files even per user, but yes, if corruption is noticed then the user is blocked until the corruption is fixed.
(and what happens if different servers decide to do self-healing on this one file at the same time?).
The same as if two processes in one server decide to self-heal: Locking prevents it from happening.
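As a minimal sketch of that idea (again not Dovecot's actual implementation): whoever wants to repair first takes an exclusive lock, then re-checks whether the corruption is still there before fixing anything. The `.lock` suffix and the `is_corrupted`/`repair` callbacks are assumptions for illustration, and `flock()` is only one possible mechanism. Its behavior over NFS depends on the client and server, so a real deployment might need `fcntl` locks or dotlocks instead.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def repair_lock(path):
    # Hypothetical lock file next to the mailbox file.
    fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def heal_if_needed(path, is_corrupted, repair):
    """Return True if this caller did the repair, False if none was needed."""
    with repair_lock(path):
        # Re-check under the lock: someone else may already have fixed it
        # while we were waiting.
        if is_corrupted(path):
            repair(path)
            return True
        return False
```

The second process (or server) to get the lock sees an already repaired file and does nothing.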