[Dovecot] Dovecot v2.2 plans
Here's a list of things I've been thinking about implementing for Dovecot v2.2. Probably not all of them will make it, but I'm at least interested in working on these if I have time.
Previously I've mostly been working on things that different companies were paying me to work on. This is the first time I have my own company, but the prioritization still works pretty much the same way:
- 1. priority: If your company is highly interested in getting something implemented, we can do it as a project via my company. This guarantees that you'll get the feature implemented in a way that integrates well into your system.
- 2. priority: Companies who have bought a Dovecot support contract can let me know what they're interested in getting implemented. It's not a guarantee that it gets implemented, but it does affect my priorities. :)
- 3. priority: Things other people want to get implemented.
There are also a lot of other things I have to spend my time on, which are before the 2. priority above. I guess we'll see how things work out.
dsync-based replication
I'll write a separate post about this later. Besides, it's coming for Dovecot v2.1 so it's a bit off topic, but I thought I'd mention it anyway.
Shared mailbox improvements
Support for private flags for all mailbox formats:
namespace {
  type = public
  prefix = Public/
  mail_location = mdbox:/var/vmail/public:PVTINDEX=~/mdbox/indexes-public
}
- dsync needs to be able to replicate the private flags as well as shared flags.
- might as well add a common way for all mailbox formats to specify which flags are shared and which aren't. $controldir/dovecot-flags would say which is the default (private or shared) and what flags/keywords are the opposite.
- easy way to configure shared mailboxes to be accessed via imapc backend, which would allow easy shared mailbox accesses across servers or simply between two system users in same server. (this may be tricky to dsync.)
- global ACLs read from a single file supporting wildcards, instead of multiple different files
- default ACLs for each namespace/storage root (maybe implemented using the above..)
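To make the dovecot-flags idea from the list above a bit more concrete, here is a minimal Python sketch. It assumes the file has already been parsed into a default ("shared" or "private") plus a set of flags/keywords that are the opposite. The class and field names are hypothetical, the file syntax is not defined anywhere yet, and the case-insensitive comparison is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FlagPolicy:
    """Hypothetical parsed form of $controldir/dovecot-flags."""
    default: str = "shared"                       # "shared" or "private"
    exceptions: set = field(default_factory=set)  # flags/keywords that are the opposite

    def is_private(self, flag: str) -> bool:
        # Assumption: flag names are compared case-insensitively.
        is_exception = flag.lower() in {f.lower() for f in self.exceptions}
        default_private = self.default == "private"
        # Being listed as an exception flips the default.
        return default_private != is_exception


# Everything is shared except \Seen and one keyword.
policy = FlagPolicy(default="shared", exceptions={"\\Seen", "$Label1"})
```

With this policy, \Seen would be stored per user while \Flagged stays shared between everyone who can access the mailbox.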
Metadata / annotations
Add support for server, mailbox and mail annotations. These need to be dsyncable, so their changes need to be stored in various .log files:
Per-server metadata. This is similar to subscriptions: Add changes to dovecot.mailbox.log file, with each entry name a hash of the metadata key that was changed.
Per-mailbox metadata. Changes to this belong inside mailbox_transaction_context, which writes the changes to the mailbox's dovecot.index.log files. Each log record contains a list of changed annotation keys. This gives each change a modseq, and also makes it easy to find out what changes other clients have done, so if a client has done ENABLE METADATA, Dovecot can push metadata changes to the client by reading only the dovecot.index.log file.
Per-mail metadata. This is pretty much equivalent to per-mailbox metadata, except changes are associated to specific message UIDs.
The permanent storage is in dict. The dict keys have components:
- priv/ vs. shared/ for specifying private vs. shared metadata
- server/ vs mailbox/<mailbox guid>/ vs. mail/<mailbox guid>/<uid>
- the metadata key name
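The three key components above could be combined roughly like this. This is only a sketch: the function name and the exact separator layout are assumptions, not a decided format.

```python
from typing import Optional


def metadata_dict_key(name: str, private: bool,
                      mailbox_guid: Optional[str] = None,
                      uid: Optional[int] = None) -> str:
    """Build a dict key: <priv|shared>/<scope>/<metadata key name>."""
    if uid is not None and mailbox_guid is None:
        raise ValueError("per-mail metadata needs a mailbox GUID")
    visibility = "priv" if private else "shared"
    if mailbox_guid is None:
        scope = "server"                       # per-server metadata
    elif uid is None:
        scope = f"mailbox/{mailbox_guid}"      # per-mailbox metadata
    else:
        scope = f"mail/{mailbox_guid}/{uid}"   # per-mail metadata
    return f"{visibility}/{scope}/{name.lstrip('/')}"
```

For example, a private per-mail annotation "comment" on UID 42 in mailbox GUID abc123 would end up under priv/mail/abc123/42/comment.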
This would be a good time to improve the dict configuration to allow things like:
- mixed backends for different hierarchies (e.g. priv/mailbox/* goes to a file, while the rest goes to sql)
- allow sql dict to be used in more relational way, so that mail annotations could be stored with tables: mailbox (id, guid) and mail_annotation (mailbox_id, key, value), i.e. avoid duplicating the guid everywhere.
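A rough sketch of the relational layout mentioned above, using Python's sqlite3 module only so that the example is self-contained (it says nothing about how sql dict would be configured). The uid column and the helper functions are assumptions added for illustration. The tables and the other columns are the ones listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mailbox (
    id   INTEGER PRIMARY KEY,
    guid TEXT NOT NULL UNIQUE
);
CREATE TABLE mail_annotation (
    mailbox_id INTEGER NOT NULL REFERENCES mailbox(id),
    uid        INTEGER NOT NULL,   -- assumption: not in the column list above
    "key"      TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (mailbox_id, uid, "key")
);
"""


def set_annotation(db, guid, uid, key, value):
    # The GUID is stored once in mailbox and referenced by id afterwards.
    db.execute("INSERT OR IGNORE INTO mailbox (guid) VALUES (?)", (guid,))
    (mailbox_id,) = db.execute(
        "SELECT id FROM mailbox WHERE guid = ?", (guid,)).fetchone()
    db.execute(
        'INSERT OR REPLACE INTO mail_annotation (mailbox_id, uid, "key", value) '
        "VALUES (?, ?, ?, ?)", (mailbox_id, uid, key, value))


def get_annotation(db, guid, uid, key):
    row = db.execute(
        "SELECT a.value FROM mail_annotation a "
        "JOIN mailbox m ON m.id = a.mailbox_id "
        'WHERE m.guid = ? AND a.uid = ? AND a."key" = ?',
        (guid, uid, key)).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
set_annotation(db, "guid-1", 7, "comment", "first")
set_annotation(db, "guid-1", 7, "comment", "second")  # replaces the first value
```

The point is only that the GUID appears in one row of mailbox, no matter how many annotations the mailbox's mails have.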
Things to think through:
- How to handle quota? Probably needs to be different from regular mail quota. Probably some per-user "metadata quota bytes" counter/limit.
- Dict lookups should be done asynchronously and prefetched as much as possible. For per-mail annotation lookups mail_alloc() needs to include a list of annotations that are wanted.
Configuration
Copy all mail settings to namespaces, so it'll be possible to use per-namespace mailbox settings. Especially important for imapc_* settings, but can be useful for others as well. Those settings that aren't explicitly defined in the namespace will use the global defaults. (Should doveconf -a show all of these values, or simply the explicitly set values?)
Get rid of *.conf.ext files. Make everything part of dovecot.conf, so doveconf -n outputs ALL of the configuration. There are mainly 3 config files I'm thinking about: dict-sql, passdb/userdb sql, passdb/userdb ldap. The dict-sql is something I think needs a bigger redesign (mentioned above in "Metadata" section), but the sql/ldap auth configs could be merged. One way could be:
sql_db sqlmails {
  # most settings from dovecot-sql.conf.ext, except for queries
  driver = mysql
  connect = ...
}
ldap_db ldapmails {
  # most settings from dovecot-ldap.conf.ext, except attributes/filters
}
passdb {
  driver = sql
  db = sqlmails
  sql_query = select password from users where username = '%u'
}
passdb {
  driver = ldap
  db = ldapmails
  ldap_attributes {
    password = %{ldap:userPassword}
  }
  ldap_filter = ...
}
The sql_db {} and ldap_db {} would be generic enough to be used everywhere (e.g. dict-sql), not just for passdb/userdb.
Some problems:
- Similar to the per-namespace mail settings, doveconf -a would output all sql_query, ldap_attributes, ldap_filter, etc. settings for all passdbs/userdbs. Perhaps a similar solution?
- The database configs contain passwords, so they should be readable only by root. This makes running dovecot-lda and maybe doveadm difficult, since they fail at "permission denied" when trying to open the config. There are probably only two solutions: a) The db configs need to be !include_try'd or b) the configs can be world-readable, but only passwords are placed to only-root-readable files by using "password =
IMAP state saving/restoring
IMAP connections are often long running. Problems with this:
- Currently each connection requires a separate process (at least to work reliably), which means each connection also uses quite a lot of memory even when they aren't doing anything for a long time.
- Some clients don't handle lost connections very nicely. So Dovecot can't be upgraded without causing some user annoyance. Also in a cluster if you want to bring down one server, the connections have to be disconnected before they can be moved to another server.
If IMAP session state could be reliably saved and later restored to another process, both of the above problems could be avoided entirely. Typically when a connection is IDLEing there are really just 4 things that need to be remembered: username, selected mailbox name, its UIDVALIDITY and HIGHESTMODSEQ. With this information the IMAP session can be fully restored in another process without losing any state. So, what we could do is:
- When an IMAP connection has been IDLEing for a while (configurable initial time, could be dynamically adjusted):
- move the IMAP state and the connection fd to imap-idle process
- the old imap process is destroyed
- imap-idle process can handle lots of IMAP connections
- imap-idle process also uses inotify/etc. to watch for changes in the specified mailbox
- if any mailbox changes happen or IMAP client sends a command, start up a new imap process, restore the state and continue from where we left off
- This could save quite a lot of memory at the expense of some CPU usage
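The state that would have to be handed to the imap-idle process and back is small. Here is a minimal Python sketch of the four fields mentioned above, with a made-up JSON serialization and a hypothetical helper that decides what a newly started imap process should do. The action names and the behavior on a UIDVALIDITY mismatch are assumptions of this sketch. Passing the connection fd itself is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class IdleState:
    username: str
    mailbox: str
    uidvalidity: int
    highestmodseq: int

    def dump(self) -> bytes:
        # Serialized form passed along with the connection fd.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def load(cls, data: bytes) -> "IdleState":
        return cls(**json.loads(data.decode("utf-8")))


def restore_action(state: IdleState, cur_uidvalidity: int,
                   cur_highestmodseq: int) -> str:
    """Decide how a freshly started imap process continues (hypothetical)."""
    if cur_uidvalidity != state.uidvalidity:
        return "disconnect"   # mailbox was recreated, the saved state is useless
    if cur_highestmodseq > state.highestmodseq:
        return "sync"         # send the changes made since the state was saved
    return "resume"           # nothing changed, just continue the IDLE
```

Anything beyond these four fields (for example a client in the middle of a command) is what makes the non-simple cases harder.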
Dovecot proxy <-> backend protocol could be improved to support moving connection to another backend. Possibly using a separate control connection to avoid making the proxying less efficient in normal operation.
When restarting Dovecot, move all the connections to a process that keeps the connections open for a while. When Dovecot starts up again, create new imap processes for the connections. This allows changing configuration for existing client connections (which sometimes may be bad! need to add checks against client-visible config conflicts), upgrading Dovecot, etc. without being visible to clients. The only problem is SSL connections: OpenSSL doesn't provide a way to save/restore state, so either you need to set shutdown_clients=no (and possibly keep some imap-login processes doing SSL proxying for a long time), or SSL connections need to be killed. Of course the SSL handling could be outsourced to some other software/hardware outside Dovecot.
The IMAP state saving isn't always easy. Initially it could be implemented only for the simple cases (which are a majority) and later extended to cover more.
IMAP extensions
- CATENATE is already implemented by Stephan
- URLAUTH is also planned to be implemented, somewhat differently than in Apple's patch. The idea is to create a separate imap-urlauth service that provides extra security.
- NOTIFY extension could be implemented efficiently using mailbox list indexes, which already exist in v2.1.
- FILTERS extension can be easily implemented once METADATA is implemented
- There are also other missing extensions, but they're probably less important: BINARY & URLAUTH=BINARY, CONVERT, CONTEXT=SORT, CREATE-SPECIAL-USE, MULTISEARCH, UTF8=* and some i18n stuff.
Backups
Filesystem based backups have worked well enough with Dovecot in the past. But with new features like single instance storage it's becoming more difficult. There's no 100% consistent way to even get filesystem level backups with SIS enabled, because deleting both the message file and its attachment files can't be done atomically (although usually this isn't a real problem). Restoring SIS mails is more difficult though: first you need to restore the dbox mail files, then you need to figure out what attachment files from SIS need to be restored, and finally you'll need to do doveadm import to put them into their final destination.
I don't have much experience with backup software, but other people in my company do. The initial idea is to implement a Dovecot backup agent to one (commercial) backup software, which allows doing online backups and restoring mails one user/mailbox/mail at a time. I don't know the details yet how exactly this is going to be implemented, but the basic plan is probably to implement a "backup" mail storage backend, which is a PostgreSQL pg_dump-like flat file containing mails from all mailboxes. doveadm backup/import can then export/import this format via stdout/stdin. Incremental backups could possibly be done by giving a timestamp of previous backup run (I'm not sure about this yet).
Once I've managed to implement the first fully functional backup agent, it should become clearer how to implement it to other backup solutions.
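To illustrate the flat file idea, here is a small Python sketch of a stream that can be exported and imported again, with an optional timestamp for incremental runs. The record layout (a JSON header line followed by the raw mail body) and all field names are made up for this example. Nothing about the real format has been decided.

```python
import io
import json


def export_mails(mails, out, since=0):
    """Write mails saved after `since` (Unix time) as header line + raw body."""
    for m in mails:
        if m["saved"] <= since:
            continue  # already covered by the previous backup run
        header = {k: m[k] for k in ("user", "mailbox", "uid", "saved")}
        header["size"] = len(m["body"])
        out.write(json.dumps(header).encode("utf-8") + b"\n")
        out.write(m["body"])


def import_mails(inp):
    """Yield mails back from a stream written by export_mails()."""
    while True:
        line = inp.readline()
        if not line:
            break  # end of stream
        header = json.loads(line)
        header["body"] = inp.read(header.pop("size"))
        yield header


mails = [
    {"user": "alice", "mailbox": "INBOX", "uid": 1, "saved": 100,
     "body": b"Subject: a\r\n\r\nhi\n"},
    {"user": "alice", "mailbox": "Sent", "uid": 2, "saved": 200,
     "body": b"Subject: b\r\n\r\nbye\n"},
]
full = io.BytesIO()
export_mails(mails, full)
```

In a real tool the stream would be stdout/stdin instead of a BytesIO. Note that a plain timestamp filter like this doesn't notice deleted mails, which is one reason the incremental part is still uncertain.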
Random things
- dovecot.index.cache file writing is too complex, should be simplified
- Enable auth_debug[_passwords]=yes on-the-fly for some specific users/IPs via doveadm
- Optimize virtual mailboxes using mailbox list indexes. It would no longer need to keep all the backend mailboxes' index files open.
- Would be nice to go forward with supporting key-value databases as mail storage backends.
On 13.2.2012, at 17.35, forumer@smartmobili.com wrote:
Is there any plan to port dovecot to windows ?
It probably works via Cygwin (although I think Cygwin has to do some kind of POSIX filesystem emulation, which slows things down and might cause some trouble if server crashes).
Actually we have discussed this a little within my company.. I'm personally not interested in spending much time on it, but that's why we're hiring more coders so I won't have to do everything. :) If there is enough commercial interest, we might build something better than using Cygwin. Anyway, one thing is certain: No Dovecot for Windows questions in this mailing list. Perhaps I'll create another dovecot-windows@ mailing list. Perhaps I'll even give it a different name (dovedows? wincot? glassdove?)
Am 13.02.2012 12:47, schrieb Timo Sirainen:
Get rid of *.conf.ext files. Make everything part of dovecot.conf, so doveconf -n outputs ALL of the configuration
might be a question of taste, but I never liked the split-up config style, so I like this idea
-- Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
On 13.2.2012, at 17.51, Robert Schetterer wrote:
might be a question of taste, but I never liked the split-up config style, so I like this idea
Note that I said *.conf.ext, not *.conf..
Timo,
I know you mentioned you would cover this in a coming post, but we were curious what the new dsync replication will be capable of. Would it monitor changes to mailboxes and push automatic replication to the remote mail store, and if this is the case could it be an N-way replication setup in which any host in a cluster can participate in the replication? Do you consider this to be a high availability solution?
Thanks,
Michael
On 15.2.2012, at 5.08, list@airstreamcomm.net wrote:
I know you mentioned you would cover this in a coming post, but we were curious what the new dsync replication will be capable of. Would it monitor changes to mailboxes and push automatic replication to the remote mail store,
Yes.
and if this is the case could it be an N-way replication setup in which any host in a cluster can participate in the replication?
Initially 2-way, but I don't think anything prevents it being N-way.
Do you consider this to be a high availability solution?
The initial version is really about doing all of this with NFS. In an NFS setup, if two replicated storages are both mounted and the primary storage dies, Dovecot will start using the replica. So that's HA.
The other possibility is to run Dovecot in two completely separate data centers and replicate through ssh. There are more possibilities for how to do HA here, but some of them also have downsides. dovecot.fi mails are actually done this way, and can be accessed from either server at any time. I've been thinking about soon making half of my clients use one server and half the other one to see if I can find any dsync bugs (I always have 3-4 IMAP clients connected).
On Wed, 15 Feb 2012 20:51:59 +0200, Timo Sirainen tss@iki.fi wrote:
The other possibility is to run Dovecot in two completely separate data centers and replicate through ssh. There are more possibilities for how to do HA here, but some of them also have downsides. dovecot.fi mails are actually done this way, and can be accessed from either server at any time. I've been thinking about soon making half of my clients use one server and half the other one to see if I can find any dsync bugs (I always have 3-4 IMAP clients connected).
Just to throw our thoughts into the mix, finding an open source multi-site active/active mail solution that does not require building super expensive multi-site storage systems would be a really refreshing way to pursue this level of availability. Maybe the only way to accurately get this level of availability is to cluster the storage between sites?
On Mon, Feb 13, 2012 at 3:47 AM, Timo Sirainen tss@iki.fi wrote:
Here's a list of things I've been thinking about implementing for Dovecot v2.2. Probably not all of them will make it, but I'm at least interested in working on these if I have time.
Not to beat a dead horse, but the ability to use remote directors might be interesting. It'd make moving into a director setup probably a bit more easy. Then any server could proxy to the backend servers, but without losing the advantage of director-based locality. If a box sees one of its own IPs in the director_servers list, then it knows it's part of the ring. If it doesn't, then it could contact a randomly selected director IP.
On 15.2.2012, at 21.02, Mark Moseley wrote:
Not to beat a dead horse, but the ability to use remote directors might be interesting. It'd make moving into a director setup probably a bit more easy. Then any server could proxy to the backend servers, but without losing the advantage of director-based locality. If a box sees one of its own IPs in the director_servers list, then it knows it's part of the ring. If it doesn't, then it could contact a randomly selected director IP.
It should already be possible to do that, although not automatically based on looking at your own IP.. Anyway, non-director servers could simply have the passdb return proxy=y host=director-servers, where director-servers expands to a round-robin list of director IPs (Dovecot uses the first one).
I guess it would be possible to do this automatically if passdb lookup returns proxy=y but no host (means director isn't enabled), but if director_servers is non-empty one of the IPs would be randomly chosen. A little kludgy though..
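The automatic behavior described above could look roughly like this. It is a sketch only: the function and its arguments are hypothetical and not Dovecot code. Returning None means "don't pick a remote director here", i.e. either no proxying was requested or the local director is left to handle it.

```python
import random


def pick_proxy_host(passdb_fields, director_servers, local_ips, rng=random):
    """Return the host to proxy to, or None to handle the login locally."""
    if passdb_fields.get("proxy") != "y":
        return None
    if "host" in passdb_fields:
        return passdb_fields["host"]   # explicit host from passdb always wins
    if not director_servers:
        return None                    # no directors configured at all
    if set(director_servers) & set(local_ips):
        return None                    # this box is part of the ring itself
    # Not a director: contact a randomly selected director IP.
    return rng.choice(list(director_servers))
```

A box whose own IP isn't in director_servers would then forward the login to one of the directors, which still keeps the director-based locality.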
participants (5)
- forumer@smartmobili.com
- list@airstreamcomm.net
- Mark Moseley
- Robert Schetterer
- Timo Sirainen