[Dovecot] dovecot 2.1.5 performance
Hello,
I'm migrating from 1.1.16 running on 4 Debian lenny servers (virtualized with XenServer, 1 core and 5GB of RAM each) to 2.1.5 running on 4 Ubuntu 12.04 servers (6 CPU cores and 16GB of RAM each, virtualized with VMware), but I'm having lots of performance problems. I don't think the virtualization platform is the problem, because the new servers have the same problems running on XenServer as on VMware.
I have about 70000 user accounts, most of them without real activity (they are students who don't read their email or have their accounts redirected to another provider). I have about 700-1000 concurrent IMAP connections.
I have storage on NFS (NFSv3; the NFS server is a Celerra), but indexes are on local filesystems (each server has its own index filesystem). Mailboxes are in Maildir format.
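A layout like this is normally expressed through mail_location. The following is only a sketch: the paths are guesses based on the log lines quoted later in this thread (a Maildir under /home/otros/... and indexes under /var/indexes/vlo), and %u assumes the index directory is named after the login name. The real doveconf may differ.

```
# Hypothetical sketch; paths inferred from log lines in this thread,
# not taken from the poster's actual doveconf.
# Mail stays in Maildir on the NFS-mounted home directory;
# index files go to a filesystem local to each backend.
mail_location = maildir:~/Maildir:INDEX=/var/indexes/%u
```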
The old servers and the current director servers are load balanced with a Radware AppDirector load balancer (the new backend servers don't need to be balanced because I'm using a director farm).
On the old platform I use scenario number 2 described at http://wiki2.dovecot.org/NFS, but on the new one I have a director proxy directing all connections from each user to the same server (I don't assign a server to each user; the director selects it with its hash algorithm).
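For readers unfamiliar with the director, the user-to-backend mapping described here is driven by a few settings on the director hosts. This is a rough sketch only: every address is a placeholder, and the service and listener blocks a working director also needs are left out.

```
# Hypothetical sketch; all addresses below are placeholders.
# Every director server in the ring:
director_servers = 192.0.2.1 192.0.2.2
# Backend servers that users are hashed onto:
director_mail_servers = 192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14
# How long a user stays mapped to a backend after the last connection closes:
director_user_expire = 15 min
```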
I have some doubts about the recommendations at that URL:
- mmap_disable: both the single-server and multi-server configurations have mmap_disable=yes, but the index file section says that you need it if your index files are stored on NFS. Mine are stored locally. Do I need mmap_disable=yes? Which is best?
- dotlock_use_excl: it is set to no in both configurations, but the comment says that this is needed only with NFSv2. Since I have NFSv3, I have set it to yes.
- mail_nfs_storage: in the single-server configuration it is set to no, but in the multi-server one it is set to yes. Since I have a director in front of my backend servers, which is recommended?
With this configuration, when I have few connections (about 300-400 IMAP connections) everything works fine, but when I disconnect the old servers and direct all my users' connections to the new servers I get a lot of errors. The server load rises to over 300, with very high I/O wait. With atop I could see that, of my 6 cores, one is almost 100% waiting for I/O and the others are almost 100% idle, but the server load is very, very high.
With the old servers I have performance problems: access to mail is slow, but it works. With the new ones it doesn't work at all.
Any idea?
--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868887590
Fax: 868888337
I forgot to attach my doveconf.
On 20/06/12 11:46, Angel L. Mateo wrote:
I have some doubts about the recommendations at that URL:
- mmap_disable: both the single-server and multi-server configurations have mmap_disable=yes, but the index file section says that you need it if your index files are stored on NFS. Mine are stored locally. Do I need mmap_disable=yes? Which is best?
- dotlock_use_excl: it is set to no in both configurations, but the comment says that this is needed only with NFSv2. Since I have NFSv3, I have set it to yes.
- mail_nfs_storage: in the single-server configuration it is set to no, but in the multi-server one it is set to yes. Since I have a director in front of my backend servers, which is recommended?
As I see it, the director ensures that only one server is accessing any given file, so you don't need any special configuration (so mmap_disable=no and mail_nfs_storage=no).
On Wed, 2012-06-20 at 11:40 +0200, Angel L. Mateo wrote:
- mmap_disable: both the single-server and multi-server configurations have mmap_disable=yes, but the index file section says that you need it if your index files are stored on NFS. Mine are stored locally. Do I need mmap_disable=yes? Which is best?
mmap_disable is used only for index files, so with local indexes use "no". (If indexes were on NFS, "no" would probably still work but I'm not sure if the performance would be better or worse. Errors would also trigger SIGBUS crashes.)
- dotlock_use_excl: it is set to no in both configurations, but the comment says that this is needed only with NFSv2. Since I have NFSv3, I have set it to yes.
"yes" is ok.
- mail_nfs_storage: in the single-server configuration it is set to no, but in the multi-server one it is set to yes. Since I have a director in front of my backend servers, which is recommended?
With director you can set this to "no".
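Taken together, the three answers above would look roughly like this in the backend configuration. This is a sketch that assumes the setup described in the thread (director in front, indexes on local disk, mail on NFSv3); it is not a tested configuration.

```
# Sketch only: backends behind a director, local indexes, Maildir on NFSv3.

# Index files are on a local filesystem, so mmap() can be used.
mmap_disable = no

# O_EXCL is usable for dotlocks on NFSv3.
dotlock_use_excl = yes

# The director keeps each user on a single backend, so the extra
# NFS cache flushing for mail files is not needed.
mail_nfs_storage = no
```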
With this configuration, when I have few connections (about 300-400 IMAP connections) everything works fine, but when I disconnect the old servers and direct all my users' connections to the new servers I get a lot of errors.
Real errors that show up in Dovecot logs? What kind of errors?
The server load rises to over 300, with very high I/O wait. With atop I could see that, of my 6 cores, one is almost 100% waiting for I/O and the others are almost 100% idle, but the server load is very, very high.
Does the server's disk IO usage actually go a lot higher, or is it simply waiting without doing much of anything? I wonder if this is related to the inotify problems: http://dovecot.org/list/dovecot/2012-June/066474.html
Another thought: Since indexes are stored locally, is it possible that the extra load comes simply from building the indexes on the new servers, while they already exist on the old ones?
mail_fsync = always
v1.1 did the equivalent of mail_fsync=optimized. You could see if that makes a difference.
maildir_stat_dirs = yes
Do you actually need this? It causes unnecessary disk IO and is probably not needed in your case.
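The two suggestions above amount to the following changes. This is a sketch of what to try rather than a verified fix; whether mail_fsync=optimized is safe enough depends on the storage setup.

```
# Sketch of the two changes suggested above.

# Match what v1.1 effectively did, instead of fsyncing on every write.
mail_fsync = optimized

# Skip the extra stat() on each Maildir entry when listing mailboxes.
maildir_stat_dirs = no
```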
default_process_limit = 1000
Since you haven't enabled high-performance mode for imap-login processes and haven't otherwise changed the service imap-login settings, this means that you can have max. 1000 simultaneous IMAP SSL/TLS connections.
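For reference, the "high-performance mode" mentioned above is configured on the login service. This is a minimal sketch based on the LoginProcess wiki page; the value 6 is an assumption chosen to match the 6 cores mentioned in the thread, not a recommendation from the thread itself.

```
service imap-login {
  # service_count = 0 lets each login process serve many connections
  # instead of exiting after one ("high-performance mode").
  service_count = 0

  # Assumed value: roughly one login process per CPU core.
  process_min_avail = 6
}
```

The alternative, if the default one-connection-per-process mode is kept, is to raise process_limit for this service.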
On 20/06/12 12:05, Timo Sirainen wrote:
- mail_nfs_storage: in the single-server configuration it is set to no, but in the multi-server one it is set to yes. Since I have a director in front of my backend servers, which is recommended?
With director you can set this to "no".
Ok, I'm going to change it.
With this configuration, when I have few connections (about 300-400 IMAP connections) everything works fine, but when I disconnect the old servers and direct all my users' connections to the new servers I get a lot of errors.
Real errors that show up in Dovecot logs? What kind of errors?
Lots of errors like:
Jun 20 12:42:37 myotis31 dovecot: imap(vlo): Warning: Maildir /home/otros/44/016744/Maildir/.INBOX.PRUEBAS: Synchronization took 278 seconds (0 new msgs, 0 flag change attempts, 0 expunge attempts)
Jun 20 12:42:38 myotis31 dovecot: imap(vlo): Warning: Transaction log file /var/indexes/vlo/.INBOX.PRUEBAS/dovecot.index.log was locked for 279 seconds
and on the relay server, lots of timeout errors delivering via LMTP:
Jun 20 12:38:29 xenon14 postfix/lmtp[12004]: D48D55D4F7: to=<dmv@um.es>, relay=pop.um.es[155.54.212.106]:24, delay=150, delays=0.09/0/0/150, dsn=4.4.0, status=deferred (host pop.um.es[155.54.212.106] said: 451 4.4.0 Remote server not answering (timeout while waiting for reply to DATA reply) (in reply to end of DATA command))
The server load rises to over 300, with very high I/O wait. With atop I could see that, of my 6 cores, one is almost 100% waiting for I/O and the others are almost 100% idle, but the server load is very, very high.
Does the server's disk IO usage actually go a lot higher, or is it simply waiting without doing much of anything? I wonder if this is related to the inotify problems: http://dovecot.org/list/dovecot/2012-June/066474.html
We have now rolled back to the old servers, so I don't know. Next time we try, I'll check this.
Another thought: Since indexes are stored locally, is it possible that the extra load comes simply from building the indexes on the new servers, while they already exist on the old ones?
I don't think so, because:
- On the old servers we have no director-like mechanism. A client IP is always directed to the same server for the duration of a session timeout (today it could be one server and tomorrow a different one), but mail is delivered randomly through any one of the servers.
- Since last week (when we started the migration), all mail has been delivered to the mailboxes by the new servers, passing through the director. So the new servers' indexes should be up to date.
mail_fsync = always
v1.1 did the equivalent of mail_fsync=optimized. You could see if that makes a difference.
I'll try this.
maildir_stat_dirs = yes
Do you actually need this? It causes unnecessary disk IO and is probably not needed in your case.
My fault. I understood the explanation completely wrong. I thought that "yes" did what "no" actually does. I have fixed it.
default_process_limit = 1000
Since you haven't enabled high-performance mode for imap-login processes and haven't otherwise changed the service imap-login settings, this means that you can have max. 1000 simultaneous IMAP SSL/TLS connections.
I know it. I have to tune it.
I know it. I have to tune it.
He did not only change Dovecot but also the OS. I would bet it is an OS problem, as he stated that 100% of a single core is used while 6 are available. That is definitely not Dovecot dependent.
I would recommend installing exactly the same old version of Dovecot on the new OS and testing it.
On 20/06/12 12:05, Timo Sirainen wrote:
default_process_limit = 1000
Since you haven't enabled high-performance mode for imap-login processes and haven't otherwise changed the service imap-login settings, this means that you can have max. 1000 simultaneous IMAP SSL/TLS connections.
According to http://wiki2.dovecot.org/LoginProcess
Since one login process can handle only one connection, the service's process_limit setting limits the number of users that can be logging in at the same time (defaults to default_process_limit=100).
I understood this to mean that at most 100 (or 1000 in my case) users can be trying to log in concurrently, but once a user has logged in, the imap-login process ends (starting the corresponding imap process) and another user can log in. So there could be more than 100 users connected, but at most 100 trying to connect. Am I wrong?
If I am wrong, why are there no imap-login processes (or just a few) on my system, but a lot of imap processes?
Look at the next sentence also: SSL/TLS proxying processes are also counted here, so if you're using SSL/TLS you'll need to make sure this count is higher than the maximum number of users that can be logged in simultaneously.
I guess you don't have many SSL/TLS connections.
On 21/06/12 11:53, Timo Sirainen wrote:
Look at the next sentence also: SSL/TLS proxying processes are also counted here, so if you're using SSL/TLS you'll need to make sure this count is higher than the maximum number of users that can be logged in simultaneously.
I guess you don't have many SSL/TLS connections.
I'm not using SSL/TLS (it is handled by an SSL accelerator, so connections to the backends are plain).
participants (4)
- Angel L. Mateo
- Joseba Torre
- Timo Sirainen
- Wojciech Puchar