v1.1.0 is finally getting closer, so it's time to start planning what happens after it. Below is a list of some of the larger features I'm planning on implementing. I'm not yet sure in which order, so I think it's time to ask again what features companies would be willing to pay for?
Sponsoring Dovecot development gets you:
- listed in the Credits on www.dovecot.org
- listed in AUTHORS file
- you can tell me how you want to use the feature and I'll make sure that it's possible (within reasonable limits)
- allows you to change the order in which features are implemented (more or less)
As you can see below, some features are easier to implement when other features are implemented first. Several of them depend on the v2.0 master rewrite, so I'm thinking that maybe it's time to finally finish that first.
I know several companies have requested replication support, but since it kind of depends on several other features, it might be better to leave it for v2.1. But of course, sponsoring development of the features that replication depends on gets you replication support faster. :)
I'm still trying to go to school (and this year I started taking CS classes as well as biotech classes..), so it might take a while for some features to get implemented. In any case, I'm hoping for a v1.2 or v2.0 release before next summer.
So, the list..
Implemented, but slightly buggy so not included in v1.1 (yet):
- Mailbox list indexes
- THREAD REFERENCES indexes
- THREAD X-REFERENCES2 algorithm where threads are sorted by their latest message instead of the thread root message
Could be implemented for v1.2:
- Shared mailboxes and ACLs http://www.dovecot.org/list/dovecot/2007-April/021624.html
- Virtual mailboxes:
http://www.dovecot.org/list/dovecot/2007-May/022828.html
- Depends on mailbox list indexes
dbox plans, could be implemented for v1.2:
- Support for single instance attachment storage
- Global dbox storage, so copying messages from one mailbox to another involves only small metadata updates
- Support some kind of hashed directories for storing message files, instead of storing everything in a single directory.
- Finish support for storing multiple messages in a file
v2.0 master/config rewrite
It's mostly working already. One of the larger problems left is how to handle plugin settings. Currently Dovecot just passes everything in the plugin {} section to mail processes without doing any checks. I think plugin settings should be validated just like the main settings. Also, plugins should be able to create new sections instead of having to use plain key=value settings for everything.
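For illustration, a sectioned plugin configuration could look something like the sketch below. Note that the "plugin quota {}" and "rule" syntax here is purely hypothetical, something the rewrite could allow; it is not an actual Dovecot config format:

```
# today: everything inside plugin {} is an unchecked key=value
plugin {
  quota = maildir:User quota
}

# hypothetical sectioned style, with settings the plugin declares
# and the config process can validate:
plugin quota {
  backend = maildir
  rule inbox {
    limit = 100M
  }
}
```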
v2.0 currently runs a Perl script on Dovecot's source tree to find settings in different places and then creates a single all-settings.c for the config process. This obviously doesn't work for out-of-tree plugins, so there needs to be some other way for them to declare their allowed settings. There are two possibilities:
a) Read them from separate files. This is a bit annoying, because it's no longer possible to distribute simple plugin.c files and have them compiled into plugin.so files.
b) Read them from the plugin itself. This would work by loading the plugin and then reading the variable containing the configuration. I'm a bit worried about the security implications though, because a plugin could execute code while it's being loaded. But I guess a new unprivileged process could be created for doing this. Of course admins normally won't run any user-requested plugins, so this might not be a real issue. :)
Index file optimizations
Since these are not backwards compatible changes, the major version number should be increased whenever these get implemented. So I think these should be combined with v2.0 master/config rewrite. I'd rather not release v3.0 soon after v2.0 just for these..
- dovecot.index.log: Make it take less space! Get rid of the current "extension introduction" records, which are large and written over and over again. Compress UID ranges using methods similar to what Squat uses now. Try all kinds of other methods to reduce space usage.
Write transaction boundaries, so the reader can process entire transactions at a time. This makes some things easier, such as avoiding NFS data cache flushes and replication support, and also allows lockless writes using O_APPEND.
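As a rough illustration of the kind of compression that helps here, sorted UID ranges can be stored as deltas encoded with base-128 varints, so most numbers take a single byte. This is just a sketch of the technique; the function names are mine, not Dovecot's:

```c
#include <stdint.h>
#include <stddef.h>

/* Encode value as a base-128 varint (7 data bits per byte,
   high bit set on all but the last byte). Returns bytes written. */
static size_t varint_encode(uint32_t value, uint8_t *buf)
{
	size_t len = 0;

	while (value >= 0x80) {
		buf[len++] = (value & 0x7f) | 0x80;
		value >>= 7;
	}
	buf[len++] = value;
	return len;
}

/* Decode one varint from buf into *value_r. Returns bytes consumed. */
static size_t varint_decode(const uint8_t *buf, uint32_t *value_r)
{
	uint32_t value = 0;
	size_t len = 0;
	unsigned int shift = 0;

	do {
		value |= (uint32_t)(buf[len] & 0x7f) << shift;
		shift += 7;
	} while ((buf[len++] & 0x80) != 0);
	*value_r = value;
	return len;
}

/* Encode sorted (start, end) UID range pairs as deltas: the gap from
   the previous range's end, then the range's length. Small gaps and
   short ranges stay one byte each. Returns total bytes written. */
static size_t uid_ranges_encode(const uint32_t *ranges, unsigned int count,
				uint8_t *buf)
{
	uint32_t prev_end = 0;
	size_t pos = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		uint32_t start = ranges[i*2], end = ranges[i*2 + 1];

		pos += varint_encode(start - prev_end, buf + pos);
		pos += varint_encode(end - start, buf + pos);
		prev_end = end;
	}
	return pos;
}
```

For example, the ranges 1:5 and 100:1000 encode into just 5 bytes, versus 16 bytes for four raw uint32 values.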
- dovecot.index: Instead of keeping a single message record array, split it into two: a compressed list of message UIDs and an array containing the rest of the data. This makes the file smaller, and keeping UIDs separate allows better CPU cache (and maybe disk I/O) utilization for UID -> sequence lookups. At least that's the theory; I should benchmark this before committing to the design. :)
There could also be a separate expunged UIDs list. UIDs existing in there would be removed from the existing UIDs list, but they wouldn't affect calculating offsets to record data in the file. So while currently updating the dovecot.index file after expunges requires recreating the whole file, with this method it would be possible to just write 4 bytes to the expunged UIDs list. The space reserved for the expunged UIDs list would be pretty small though, so when it gets full the file would be recreated.
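With the UIDs in their own packed array, a UID -> sequence lookup becomes a binary search that touches only that array, which is what makes the CPU cache argument plausible. A minimal sketch (illustrative only, not the actual Dovecot index code):

```c
#include <stdint.h>

/* Look up the 1-based message sequence for the given UID in a sorted,
   packed UID array. Returns -1 if the UID doesn't exist (e.g. it was
   expunged). Only the small UID array is read, so it stays cache-hot. */
static int32_t uid_to_seq(const uint32_t *uids, uint32_t count, uint32_t uid)
{
	uint32_t lo = 0, hi = count;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (uids[mid] < uid)
			lo = mid + 1;
		else if (uids[mid] > uid)
			hi = mid;
		else
			return (int32_t)mid + 1; /* sequences are 1-based */
	}
	return -1;
}
```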
- dovecot.index.cache: Better disk I/O and CPU cache utilization by keeping data close together in the file when it's commonly accessed together. For example, message sizes and dates for all messages could be close to each other to optimize SORTing by message size/date. If clients use conflicting access patterns, the data could either be duplicated to optimize both patterns, or just left slower for the other pattern.
Deliver / LMTP server
Currently deliver parses dovecot.conf using its own parser. This has caused all kinds of problems. v2.0's master rewrite helps with this, because deliver can then just ask for the configuration from the config process the same way other processes do.
Another problem is that people who use multiple user IDs have had to make deliver setuid-root. I really hate this, but currently there's no better way. A better fix would be to use an LMTP server instead. There could be either a separate LMTP client, or deliver could support the LMTP protocol as well. I'm not sure which one is better.
How would LMTP server then get around the multiple user IDs issue? There are two possibilities:
a) Run as root and temporarily set the effective UID to the user whose mails we are currently delivering. This has the problem that if users have direct access to their mailboxes and index files, the mailbox handling code has to be bulletproof to avoid regular users gaining root privileges by making special modifications to index files -- possibly at the same time as the mail is being delivered, to exploit some race condition with mmapped index files..
b) Create a child process for each different UID and drop its privileges completely. If all users have different UIDs, this usually means forking a new process for each mail delivery, making the performance much worse than with method a). It would still be possible to delay destroying the child processes in case a new message comes in for the same UID. This would probably help mostly when UIDs are shared by multiple users, such as each domain having its own UID.
I think I'll implement both and let admin choose which method to use.
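The reuse idea in method b) boils down to a small cache of delivery children keyed by UID, so consecutive deliveries to the same UID skip the fork. A sketch of just that caching logic (the struct and function names are hypothetical, and the actual fork + setuid + privilege-drop step is only noted in comments):

```c
#include <sys/types.h>
#include <stddef.h>

/* Hypothetical cache of per-UID delivery child processes. A real
   implementation would fork a child, call setgid()/setuid() to drop
   privileges to the target UID, and reap idle children after a timeout. */

#define DELIVERY_CHILD_CACHE_SIZE 8

struct delivery_child {
	uid_t uid;
	pid_t pid;	/* 0 = unused slot */
};

static struct delivery_child child_cache[DELIVERY_CHILD_CACHE_SIZE];

/* Return the cached child pid for uid, or 0 if a new child
   must be forked for this delivery. */
static pid_t delivery_child_lookup(uid_t uid)
{
	unsigned int i;

	for (i = 0; i < DELIVERY_CHILD_CACHE_SIZE; i++) {
		if (child_cache[i].pid != 0 && child_cache[i].uid == uid)
			return child_cache[i].pid;
	}
	return 0;
}

/* Remember a newly forked child so later deliveries to the
   same UID can reuse it. */
static void delivery_child_add(uid_t uid, pid_t pid)
{
	unsigned int i;

	for (i = 0; i < DELIVERY_CHILD_CACHE_SIZE; i++) {
		if (child_cache[i].pid == 0) {
			child_cache[i].uid = uid;
			child_cache[i].pid = pid;
			return;
		}
	}
	/* cache full: real code would reap the oldest idle child here */
}
```

This helps most exactly in the shared-UID case described above: with one UID per domain, a burst of deliveries to one domain pays for a single fork.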
This same problem exists for several other things besides the deliver/LMTP server. For example, method a) is already implemented for the expire plugin's expire-tool in v1.1. So this code needs to be somewhat generic, so it won't have to be duplicated in multiple places.
All of this pretty much requires the new v2.0 master rewrite. Adding these to v1.x would be ugly.
Proxying
These could be implemented for v1.2.
Log in normally (no proxying) if destination IP is the server itself.
Support for per-namespace proxying:
namespace public {
  prefix = Public/
  location = proxy:public.example.org
}
There are two choices for this: dumb IMAP proxying, or handling it with a mail storage backend. Dumb IMAP proxying would just let the remote server whose mailbox is currently selected handle most of the commands and their replies. I'm just afraid that this would become problematic in the future.
For example if the proxying server wants to send the client an event, such as "new message was added to this other mailbox", it would have to be sure that the remote server isn't currently in the middle of sending something. And the only way to be sure would be to parse its input, which in turn would make the proxying more complex. Then there would be problems if the servers support different IMAP extensions. And maybe some extensions will require knowing the state of multiple mailboxes, which will make handling them even more complex if the mailboxes are on different servers.
Implementing this as a mail storage backend would mean that the proxying server parses all the IMAP commands internally and uses Dovecot's mail storage API to process them. The backend would then request data from the remote server when needed and cache it internally to avoid asking for the same data over and over again. Besides caching, this would mean that there are no problems with extensions, since Dovecot handles them exactly the same way as for other backends, requiring no additional code.
To get better performance, the storage backend would have to be smarter than most other Dovecot backends. For example, while the current backends handle searches by reading through all the messages, the proxy backend should just send the search request to the remote server. The same goes for SORT. And for THREAD, which the current storage API doesn't handle..
And how would the proxying server and the remote server communicate? Again, two possibilities: either the standard IMAP protocol, possibly with some optional Dovecot extensions to improve performance, or a bandwidth-efficient protocol. The main advantage of the IMAP protocol is that it could also be used for proxying to other servers. A bandwidth-efficient protocol could be based on the dovecot.index.log format, making it faster/easier to parse and generate.
Once we're that far with proxying, we're close to having..:
Replication
The plans haven't changed much from: http://www.dovecot.org/list/dovecot/2007-May/022792.html
The roadmap to complete replication support would be basically:
Implement proxying with mail storage API.
Implement asynchronous replication:
- A replication master process collects changes from other Dovecot processes via UNIX sockets and forwards them to slaves. Note that this requires that only Dovecot modifies mailboxes: mails must be delivered with Dovecot's deliver!
- If some slaves are down, save the changes to files in case they come back up. If they don't, and we reach some specific maximum log file size, also keep track of which mailboxes have changed, and once the slaves do come back, make sure those mailboxes are synced.
- If the replication master is down and there have been changes to mailboxes.. Well, I haven't really thought about this. Maybe refuse to make any changes to mailboxes? Maybe have mail processes write to some "mailboxes changed" log file directly? Probably not that big of a deal, since the replication master just shouldn't be down. :)
- Support for manual resync of all mailboxes, in case something was done outside Dovecot.
- Replication slave processes would read the incoming changes. I think this process should just make sure that the changes are read as fast as possible, so during sudden bursts of activity the slave itself, rather than the master, would save the changes to a log.
- Replication mailbox writer processes would receive changes from the main slave process via UNIX sockets. They'd be responsible for actually saving the data to mailboxes. If multiple user IDs are used, this has the same problems as LMTP server.
- Conflict resolution. If a mailbox writer process notices that the UID already exists in the mailbox, both the old and the new message must be given new UIDs. This involves telling the replication master about the problem, so both servers can figure out the new UIDs together. I haven't thought this through yet.
- Conflict resolution for message flags/keywords. The protocol would normally send flag changes incrementally, such as "add flag1, remove flag2". Besides that, there could be an 8-bit checksum of all the resulting flags/keywords. If the checksum doesn't match, request all the current flags. It would of course be possible to always send the current flags instead of incremental updates, but that would waste space when there are lots of keywords, and multi-master replication works better when the updates are incremental.
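The checksum idea above could look roughly like this: each side computes an 8-bit summary of the flags and keywords it ended up with after applying the incremental update, and a mismatch triggers a full flag resync. The mixing function here is an illustrative XOR/rotate, not any defined Dovecot wire format:

```c
#include <stdint.h>

/* Compute an 8-bit checksum over a 32-bit flags bitmask and a list of
   keyword indexes. Both servers run this after applying an incremental
   flag update; if the checksums disagree, the full flag state is
   requested instead of trusting the incremental stream. */
static uint8_t flags_checksum(uint32_t flags, const uint8_t *keywords,
			      unsigned int keyword_count)
{
	uint8_t sum = 0;
	unsigned int i;

	/* fold the four flag bytes together */
	for (i = 0; i < 4; i++)
		sum ^= (flags >> (i * 8)) & 0xff;
	/* rotate-and-xor each keyword index so order changes the sum */
	for (i = 0; i < keyword_count; i++)
		sum = (uint8_t)((sum << 1) | (sum >> 7)) ^ keywords[i];
	return sum;
}
```

One byte per update is cheap compared with always shipping the full keyword list, which is the space argument made above.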
- Synchronous replication:
- When saving messages, don't report success until at least one replication slave has acknowledged that it has also saved the message. This requires adding bidirectional communication with the replication master process. The message could be sent to slaves in smaller parts, so if the message is large, the wait at the end would basically consist of sending "finished saving UID 1234, please ack".
- Synchronous expunges also? Maybe optional. The main reason for this is that it wouldn't be nice to break IMAP state by having expunged messages come back.
- Multi-master replication:
Almost everything is already in place. The main problem here is IMAP UID allocation. UIDs must be allocated incrementally, so each mailbox must have a globally increasing UID counter. Allocation goes like this:
1. Send the entire message body, using a globally unique ID, to all servers. No need to wait for their acknowledgement.
2. Increase the global IMAP UID counter for this mailbox.
3. Send the global UID => IMAP UID mapping to all servers.
4. Wait for one of them to reply with "OK".
Because messages sent by different servers can become visible at different times, each server must wait until it has seen all the previous UIDs before notifying clients of a new UID. If the sending server crashes between steps 2 and 3, this never happens. So the replication processes must be able to handle that, and also handle requesting missing UIDs in case they had lost connections to other servers.
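The "wait until all previous UIDs are seen" rule can be modeled as tracking the lowest UID not yet received: a newly arrived UID is only exposed to clients once every lower UID has also arrived. A minimal single-mailbox sketch with invented names (a real implementation would also need the timeout/resync path for permanently missing UIDs):

```c
#include <stdint.h>
#include <stdbool.h>

#define UID_WINDOW 64

struct uid_tracker {
	uint32_t next_uid;	/* lowest UID not yet seen */
	bool seen[UID_WINDOW];	/* arrival flags for UIDs >= next_uid */
};

/* Record that uid has arrived from replication. Returns the highest
   UID that may now be shown to IMAP clients, i.e. the largest UID up
   to which there are no gaps. */
static uint32_t uid_tracker_add(struct uid_tracker *t, uint32_t uid)
{
	if (uid >= t->next_uid && uid < t->next_uid + UID_WINDOW)
		t->seen[uid % UID_WINDOW] = true;

	/* advance past every contiguously seen UID */
	while (t->seen[t->next_uid % UID_WINDOW]) {
		t->seen[t->next_uid % UID_WINDOW] = false;
		t->next_uid++;
	}
	return t->next_uid - 1;
}
```

So if UID 2 arrives before UID 1, nothing is announced; once UID 1 shows up, both become visible at the same time, preserving the same order on every server.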
I only just had the idea of global UID counters instead of global locks, so I haven't yet thought about how the counters would best be implemented. They should be easier to implement than global locks though.
Non-incremental changes are somewhat problematic and could actually require global locks. The base IMAP protocol supports replacing flags, so if there is no locking and conflicting flag replacement commands are sent to two servers, which one of those commands should win? Timestamps would probably be enough for flag changes, but they may not be enough for some IMAP extensions (CONDSTORE?).
If global locks are required, they would work by passing the lock from one server to another when requested. So as long as only a single server is modifying the mailbox, it would hold the global lock and there would be no need to keep requesting it for each operation.
- Automation:
- Automatically move/replicate users between servers when needed. This will probably be implemented separately long after 4.
The replication processes would be ugly to implement for v1.x master, so this pretty much depends on v2.0 master rewrite.