On Sat, 2007-12-08 at 10:56 +0000, Ed W wrote:
dbox plans, could be implemented for v1.2:
Interesting to see this as a production-ready product. Could be very interesting if it adds performance and new features.
It should be production ready in v1.1 already. Or at least I don't see any bugs in my stress tests. And it is somewhat faster than maildir :) A simple stress test shows:
./imaptest secs=10 seed=0
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                          30%   5%
maildir:
3506 1814 1770 3506 3501 5034 1316 2830 3501 3506 7012
dbox:
4027 2032 2022 4026 4023 5819 1541 3197 4023 4025 8056
Other tests will probably show an even larger difference.
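For reference, the relative difference between the two runs can be worked out per command. A small Python sketch, using only the counts quoted in this message (the names COMMANDS, MAILDIR, DBOX and relative_gain are just for illustration):

```python
# Command counts from the 10-second imaptest runs quoted above.
COMMANDS = "Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo".split()
MAILDIR = [3506, 1814, 1770, 3506, 3501, 5034, 1316, 2830, 3501, 3506, 7012]
DBOX = [4027, 2032, 2022, 4026, 4023, 5819, 1541, 3197, 4023, 4025, 8056]


def relative_gain(old, new):
    """Return how much larger `new` is than `old`, as a fraction of `old`."""
    return (new - old) / old


# Per-command gain of dbox over maildir in this single run.
gains = {
    name: relative_gain(m, d)
    for name, m, d in zip(COMMANDS, MAILDIR, DBOX)
}

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:+.1%}")
```

In this one run dbox completes roughly 12% to 17% more commands than maildir in the same 10 seconds, depending on the command.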
v2.0 master/config rewrite
It's mostly working already. One of the larger problems left is how to handle plugin settings. Currently Dovecot just passes everything in plugin {} section to mail processes without doing any checks.
Isn't one of the simplest things still to build a "hash" or "dictionary" (pick your term) of all the things in the config file and then let the relevant chunk of code figure out what to do with it?
The current design is that there's a config process that is responsible for reading and parsing the configuration into simple key=value form. Other processes (such as imap) will request the configuration from the config process over a UNIX socket, and get the simple key=value pairs.
Those key=value pairs are how Dovecot v1.0 currently handles the configuration. It just gets them from the environment. To make it even easier I wanted to deserialize them directly into different settings structures. This works by everyone calling some settings_register() type of function before the settings are actually read.
So the valid settings are verified by the config process. It needs to know about all the valid settings, otherwise errors won't be reported early enough. The key=value pairs can contain unknown keys without anything complaining about it, so for example if the imap process gets settings for a plugin it hasn't loaded, those settings just get ignored.
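To make the register-then-deserialize idea a bit more concrete, here is a rough Python sketch. It is not Dovecot code: the "prefix/name" key layout, the ImapSettings fields and the settings_deserialize() helper are all assumptions made up for the example. Only the general idea (register structures first, fill them from key=value pairs, silently skip unknown keys) comes from the description above.

```python
from dataclasses import dataclass, fields

# Hypothetical registry: key prefix -> settings structure class.
_REGISTRY = {}


def settings_register(prefix, cls):
    """Register a settings structure before any settings are read."""
    _REGISTRY[prefix] = cls


@dataclass
class ImapSettings:
    # Hypothetical setting names and defaults, chosen only for the example.
    max_line_length: int = 65536
    logout_format: str = "bytes=%i/%o"


def settings_deserialize(pairs):
    """Fill every registered structure from "prefix/name=value" lines.

    Keys with an unregistered prefix or an unknown name are skipped
    silently, mirroring how a process ignores settings for plugins it
    has not loaded.
    """
    result = {prefix: cls() for prefix, cls in _REGISTRY.items()}
    for line in pairs:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        prefix, _, name = key.partition("/")
        target = result.get(prefix)
        if target is None:
            continue  # nobody registered this prefix
        for field in fields(target):
            if field.name == name:
                # Convert the string using the field's declared type.
                setattr(target, name, field.type(value))
    return result


# Registration happens before the settings are read.
settings_register("imap", ImapSettings)

if __name__ == "__main__":
    received = ["imap/max_line_length=1024", "quota/rule=*:storage=1G"]
    print(settings_deserialize(received))
```

Validation is deliberately left out of the sketch, since in the design described above it is the config process that reports invalid settings early.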
If config is shared then it might be nice to break out the config parser into a separate lib so that it's easily shared around - in particular default configs
The key=value deserializer is in lib-settings in the sources. The config process will probably support reading settings from different kinds of sources. By default it will read a v1.0-like dovecot.conf, but it could just as well read the configuration from SQL (either built-in or maybe as a plugin, I'm not sure yet).
What about if the default configs (and hence config structure) are provided with each plugin in a simple text file? Main app has a way to scoop all these together if someone needs a base template config file?
I thought about this too, but it makes it more difficult to install new plugins and it's easier to make mistakes. The fewer things there are to remember the better. :)
Replication
I'm really up for this. However, your plans all seem to involve online sync - I need a very low bandwidth dialup sync...
The plans are mainly about online synchronization, but I think a lot of the same infrastructure can be used for low bandwidth sync as well.
Much more interested in features like
- only syncing certain mailboxes, or prioritising the order at least
This shouldn't be too difficult. And I think some of this could be done automatically too. Like prioritize syncing those mailboxes first that the user is currently accessing.
- Perhaps only syncing the main MIME structure first and delaying the attachment sync (really nice if could be done cleanly!!)
This would be a bit tricky, although it's somewhat related to dbox's single instance attachment storage. Once that's working it shouldn't be too difficult to delay replicating attachments.
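Going back to the mailbox ordering idea, the automatic prioritisation could look roughly like this in Python. This is only a sketch of one possible policy: the sync_order() function, the numeric priorities and the default value of 100 are invented for the example and are not part of any existing plan.

```python
DEFAULT_PRIORITY = 100  # assumed default for mailboxes without an explicit priority


def sync_order(mailboxes, active, priorities=None):
    """Return mailbox names in the order they should be synced.

    Mailboxes the user is currently accessing come first, then lower
    explicit priority numbers, then the original order as a tie-breaker.
    """
    priorities = priorities or {}

    def key(item):
        index, name = item
        # False sorts before True, so active mailboxes are synced first.
        return (name not in active, priorities.get(name, DEFAULT_PRIORITY), index)

    return [name for _, name in sorted(enumerate(mailboxes), key=key)]


if __name__ == "__main__":
    order = sync_order(
        ["Archive", "INBOX", "Trash", "Work"],
        active={"Work"},
        priorities={"INBOX": 0, "Trash": 200},
    )
    print(order)
```

Syncing only certain mailboxes would then just be a filter applied to the list before ordering it.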
You mention some problems like unique IDs. This is actually pretty simple - just use some GUID type process. All you need is something guaranteed unique on each server; combinations of time, something unique per server and some randomness get you there. Discount anything which is some kind of counter because it will break eventually.
It will have global UIDs which it uses internally. I haven't thought much yet about how those would be generated, but they're not the problem. The problem is the IMAP protocol, where it's not possible to use them. IMAP requires using an increasing 32-bit integer as the UID. There's no way around that, so IMAP UID conflicts can happen and they have to be handled somehow.
The IMAP UID conflicts are detected by a message having the same IMAP UID but different global UIDs. So the replication then just needs to make a new global UID -> IMAP UID mapping for both of the messages to fix the problem.
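The detection and remapping described above can be sketched in a few lines of Python. Everything here is illustrative: the Message structure, the function names and the assumption that both servers already agree on the next free IMAP UID (uid_next) are mine, not a description of actual replication code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    global_uid: str  # replication-wide identifier (an opaque string here)
    imap_uid: int    # per-mailbox IMAP UID


def find_conflicts(local, remote):
    """Return (local, remote) pairs with the same IMAP UID but different global UIDs."""
    remote_by_uid = {msg.imap_uid: msg for msg in remote}
    conflicts = []
    for msg in local:
        other = remote_by_uid.get(msg.imap_uid)
        if other is not None and other.global_uid != msg.global_uid:
            conflicts.append((msg, other))
    return conflicts


def resolve_conflicts(conflicts, uid_next):
    """Give both messages of every conflicting pair a fresh IMAP UID.

    Returns the new global UID -> IMAP UID mapping and the updated next UID.
    Assumes uid_next is larger than any IMAP UID used on either server.
    """
    mapping = {}
    for pair in conflicts:
        # Sort by global UID so both servers assign the new UIDs in the same order.
        for msg in sorted(pair, key=lambda m: m.global_uid):
            mapping[msg.global_uid] = uid_next
            uid_next += 1
    return mapping, uid_next


if __name__ == "__main__":
    local = [Message("a", 1), Message("b", 2)]
    remote = [Message("a", 1), Message("c", 2)]
    print(resolve_conflicts(find_conflicts(local, remote), uid_next=3))
```

Both messages get new IMAP UIDs rather than just one of them, so a client that cached either message under the old UID never sees that UID pointing at different content.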
- Conflict resolution.
It's better that you get a duplicated message than something is lost.
Yes, that's the plan. Although the duplication still should happen only rarely. :)
- Synchronous expunges also? Maybe optional. The main reason for this is that it wouldn't be nice to break IMAP state by having expunged messages come back.
Expunges need to be async and tracked. Very likely we will see people delete messages from two servers (because they moved server and didn't see the delete sync across yet).
The synchronous operation mode would be optional. You probably wouldn't want to enable it for low bandwidth links anyway. Even for multi-master replication it's not a requirement to have the servers connected to each other. It just prevents the possibility of IMAP UID conflicts.
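For the asynchronous case, one common way to keep expunged messages from coming back is to track expunges as tombstones and merge them along with the live messages. The Python sketch below shows that technique only as an illustration; it is an assumption on my part, not something already decided for the replication design, and merge_replicas() is a made-up name.

```python
def merge_replicas(live_a, expunged_a, live_b, expunged_b):
    """Merge two replicas' views of one mailbox.

    Each argument is a set of global UIDs. A message expunged on either
    side stays expunged, even if the other side still lists it as live.
    """
    expunged = expunged_a | expunged_b
    live = (live_a | live_b) - expunged
    return live, expunged


if __name__ == "__main__":
    # m2 was expunged on server A, m3 on server B, before they synced.
    live, expunged = merge_replicas(
        live_a={"m1", "m3"}, expunged_a={"m2"},
        live_b={"m1", "m2"}, expunged_b={"m3"},
    )
    print(sorted(live), sorted(expunged))
```

Because the merge is a set union, deleting the same message on both servers is harmless, and the result is the same whichever server initiates the sync.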
Also need to watch for clients copying up deleted messages...?
What do you mean by this?
- Automation:
- Automatically move/replicate users between servers when needed. This will probably be implemented separately long after 4.
Don't see why this is a lower priority? Can I suggest that you *start* with this and go back to do the normal replication afterwards? .. Actually this same code seems more likely to develop into a more general daemon:
- Adjust filters per user
- Adjust permissions on shared folders
- Other admin operations per mailbox (forced purge, cleanup Trash, etc)
The main reason why I don't want to do anything about this yet is that it would be difficult to make it work with different kinds of configurations. I don't yet want to create some "Dovecot admin" tool that requires you to have set up your filesystem in a specific way or use a specific database for storing user configuration.
Actually I think if such a tool is created it should be a completely separate package from Dovecot. It most likely will expand into a generic mail server administrator tool, similar to PostfixAdmin etc. And since developing that tool doesn't require knowing any Dovecot internals, it could just as well be developed by pretty much anyone. :)
The replication processes would be ugly to implement for v1.x master, so this pretty much depends on v2.0 master rewrite.
OK, but rewrites are bad, incremental changes are always preferred...
I know. It's not that big of a rewrite though. And I am thinking about doing this incrementally. First move over the settings deserializer and after that's known to work replace the master/log/config backend (which can't really be done incrementally in any easy way). The rewrite will then be only for about 5% of Dovecot's code :)
Can I also add to your TODO list:
- Lemonade Profile!
Yes, and other extensions.. I haven't thought about these much yet, but I think most of them won't be too difficult to implement.
- Enhancements to the Expire plugin making it easier to enforce a quota, eg trimming Trash initially, then Sent Items, then old stuff in other folders, etc, etc
Sounds exactly like what the trash plugin does: http://wiki.dovecot.org/Plugins/Trash