[Dovecot] imaptest10 and stalled messages
Greetings -
I've now got as far as playing with the imaptest10 test utility to
see if I can stress-test our development server. imaptest10 is built
against dovecot-1.0rc28.
It may just be that I'm excessively heavy handed, but when I let
imaptest10 rip with the command...
./imaptest10 user=test%03d host=testserver.imap.york.ac.uk clients=50
mbox=./dovecot.mbox msgs=1000 secs=30 logout=10
...after 15 seconds or so I started getting output along the lines of:
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100%  10%
                         30%        5%
 2    0    1    1    1    0    0    0    0    0    0   50/ 50
 2    0    0    1    1    0    0    0    0    0    0   50/ 50
 0    0    2    2    0    1    1    1    1    1    0   50/ 50
 2    0    2    2    0    1    1    1    1    1    2   50/ 50
 0    0    1    1    0    0    0    0    0    0    0   50/ 50
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
 2    1    1    2    1    0    0    0    0    0    0   50/ 50
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
 1    0    1    1    0    0    0    0    0    0    0   50/ 50
- stalled for 18 secs in LOGIN
- stalled for 18 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
The "stalled" log lines continue until the test completes. I'm
assuming it's just that I'm pushing our hardware beyond its
reasonable limits: it is only an oldish development box...
Sun v120 with 1 650MHz SPARC processor and 1Gbyte RAM
Dovecot is using shadow password file authentication. The mail
folders are in Maildirs, NFS-mounted from a NetApp filer, and each
account's INBOX has around 1000 messages in it. Control and Index
files are on local disk.
By the end of the 30 second test run load average has climbed to
around 11
I don't know how "active" the imaptest10 utility is when it's
running, but do you think I should be concerned by the "stalled"
messages? That is, do you think I have something configured amiss?
Or is it likely just to be I'm expecting too much of our hardware?
(The real system will have beefier Suns with Opteron-based CPUs, more
memory, etc)
With thanks, Mike B-)
PS. I forgot to say: despite all this activity there's not a single
error logged by Dovecot in the (admittedly short at this stage) test
runs.
-- The Computing Service, University of York, Heslington, York Yo10 5DD, UK Tel:+44-1904-433811 FAX:+44-1904-433740
- Unsolicited commercial e-mail is NOT welcome at this e-mail address. *
On Wed, 2007-03-28 at 18:07 +0100, Mike Brudenell wrote:
...after 15 seconds or so I started getting output along the lines of: ..
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100%  10%
                         30%        5%
 2    0    1    1    1    0    0    0    0    0    0   50/ 50
 2    0    0    1    1    0    0    0    0    0    0   50/ 50
It's logging in pretty slowly..
- stalled for 16 secs in LOGIN
Try changing http://wiki.dovecot.org/LoginProcess settings, especially login_process_per_connection=no and see how much it helps.
Also you could use logout=0 parameter to imaptest to avoid the login/logout overhead.
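For reference, the login-process settings that wiki page discusses take this form in dovecot.conf (the values here are only illustrative, not recommendations):

```
# One login process handles many connections ("high-performance" mode)
login_process_per_connection = no
# How many imap-login processes to keep running
login_processes_count = 10
```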
On 28 Mar 2007, at 18:15, Timo Sirainen wrote:
- stalled for 16 secs in LOGIN
Try changing http://wiki.dovecot.org/LoginProcess settings, especially login_process_per_connection=no and see how much it helps.
Sorry I forgot to give my current login and auth settings (which, I
have to admit, I thought I'd tuned to be reasonable/generous for the
eventual environment)...
login_process_size = 64
login_process_per_connection = no
login_processes_count = 10
#login_max_processes_count = 128
#login_max_connections = 256

#auth_process_size = 256
auth_cache_size = 1024
auth_cache_ttl = 600
#auth_worker_max_count = 30
auth default {
  passdb shadow {
    # [blocking=yes] - See userdb passwd for explanation
    #args =
  }
  userdb passwd {
    # [blocking=yes] - By default the lookups are done in the main dovecot-auth
    # process. This setting causes the lookups to be done in auth worker
    # processes. Useful with remote NSS lookups that may block.
    # NOTE: Be sure to use this setting with nss_ldap or users might get
    # logged in as each others!
    #args =
  }
}
Also you could use logout=0 parameter to imaptest to avoid the login/logout overhead.
Re-running the test with logout=0 didn't help :-( ...
% ./imaptest10 user=test%03d host=testserver.imap.york.ac.uk
clients=50 mbox=./dovecot.mbox msgs=1000 secs=30 logout=0
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe
100% 50% 50% 100% 100% 100% 50% 100% 100% 100%
30% 5%
5 1 1 5 1 0 0 0 0 0 25/ 25 [50%]
3 2 0 2 0 0 0 0 0 0 34/ 34 [68%]
3 1 2 3 0 0 0 0 0 0 43/ 43 [86%]
1 1 1 2 0 0 0 0 0 0 46/ 46 [92%]
2 1 1 3 0 1 1 1 1 1 50/ 50
1 0 0 0 0 0 0 0 0 0 50/ 50
2 2 1 3 0 0 0 0 0 0 50/ 50
2 1 1 1 1 0 0 0 0 0 50/ 50
0 1 0 1 0 0 0 0 0 0 50/ 50
1 1 1 1 0 1 0 1 1 1 50/ 50
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe
100% 50% 50% 100% 100% 100% 50% 100% 100% 100%
30% 5%
2 0 1 3 1 0 0 0 0 0 50/ 50
1 0 0 1 0 0 0 0 0 0 50/ 50
1 1 1 1 0 0 0 0 0 0 50/ 50
1 0 0 0 0 0 0 0 0 0 50/ 50
1 1 2 2 0 1 0 1 1 1 50/ 50
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
 1    0    1    2    0    0    0    0    0   0   50/ 50
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN
- stalled for 16 secs in LOGIN ...
On 28.3.2007, at 20.26, Mike Brudenell wrote:
On 28 Mar 2007, at 18:15, Timo Sirainen wrote:
- stalled for 16 secs in LOGIN
Try changing http://wiki.dovecot.org/LoginProcess settings, especially login_process_per_connection=no and see how much it helps.
Sorry I forgot to give my current login and auth settings (which, I have to admit, I thought I'd tuned to be reasonable/generous for the eventual environment)...
login_process_size = 64
login_process_per_connection = no
login_processes_count = 10
OK, so it's not the login process then that's the problem.
Have you tried using passwd-file instead of passwd as passdb and userdb?
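A passwd-file setup for the test accounts might look like this in Dovecot 1.0 (the path /etc/dovecot/testusers and the sample entry are assumptions for illustration, not from this thread):

```
# Goes inside the auth default { ... } block.
# The file holds one user per line, e.g.:
#   test001:{PLAIN}password:1000:1000::/home/test001
passdb passwd-file {
  args = /etc/dovecot/testusers
}
userdb passwd-file {
  args = /etc/dovecot/testusers
}
```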
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN ..
When this happens, can you manually log in at the same time?
What about running: imaptest - select=0
ie. doing nothing else than login/logout. Does it still hang?
Greetings -
On 28 Mar 2007, at 20:45, Timo Sirainen wrote:
- stalled for 17 secs in LOGIN
- stalled for 17 secs in LOGIN ..
When this happens, can you manually log in at the same time?
Yes, but SLOWLY. It's making me think we're pushing the machine and/or its 100Mbit network card too far. See below for the latest...
What about running: imaptest - select=0
ie. doing nothing else than login/logout. Does it still hang?
Nope: that works fine (as does imaptest -)
Latest...
I've switched to using a beefier box now: one more like the servers
we would be buying. This is a Sun system running Solaris 10 with 4 *
2.2GHz Opteron processors, 4Gb RAM and a Gigabit Ethernet card.
Starting off with empty Maildirs (NFS-mounted), and non-existent
index and control files (on local disk), I re-ran my test:
./imaptest10 user=mtest%03d clients=50 host=mail-gw5.york.ac.uk
mbox=../imaptest/dovecot.mbox msgs=1000 secs=300
At first this looked great, logging all 50 processes in within the
first second (assuming I'm right in thinking the N/N at the end of
each line of imaptest output is something to do with this count?)
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100% 50% 50% 100% 100% 100% 50% 100% 100% 100% 100%
30% 5%
54 7 3 16 0 0 0 0 0 4 8 50/ 50
23 21 22 41 4 4 0 0 2 23 46 50/ 50
35 17 16 35 22 23 0 0 12 35 70 50/ 50
40 17 20 37 26 31 7 1 24 40 80 50/ 50
28 17 15 35 28 40 2 0 25 28 56 50/ 50
27 10 8 24 21 24 3 0 22 27 54 50/ 50
43 17 21 42 34 44 8 0 32 43 86 50/ 50
51 23 22 46 37 62 8 1 41 51 102 50/ 50
39 24 25 44 39 53 4 1 35 39 78 50/ 50
38 19 20 34 38 58 10 2 39 38 76 50/ 50
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100% 50% 50% 100% 100% 100% 50% 100% 100% 100% 100%
30% 5%
48 27 26 46 42 58 11 0 44 48 96 50/ 50
52 23 25 48 44 63 14 2 52 52 104 50/ 50
43 24 26 43 47 65 9 1 42 44 88 50/ 50
39 16 23 39 42 48 9 2 37 38 76 50/ 50
51 24 24 52 49 71 24 2 51 51 102 50/ 50
Error: BAD reply 'Error in IMAP command APPEND: Invalid arguments.'
for command: APPEND "INBOX" () {134933+}
Error: APPEND replied BAD: Error in IMAP command APPEND: Invalid
arguments.
44 24 21 42 46 64 13 4 47 43 86 50/ 50
53 23 20 48 48 65 21 5 51 54 106 50/ 50
40 22 26 46 42 58 18 2 41 39 80 50/ 50
43 19 20 39 41 45 19 4 41 43 86 50/ 50
35 22 17 37 37 58 22 6 34 35 70 50/ 50
[?? I'm assuming the BAD reply was some General Nastiness in the mbox
I'm feeding into imaptest10? It's the dovecot.mbox file from
dovecot.org that was suggested in a message to the list a while
back. ??]
But then the ODDNESS starts. I'm still a little hazy how to
interpret the output of imaptest, but every now and then one or two
processes stall for several seconds. When this happens activity
seems to grow quieter in the imaptest output: number of operations
per second decreases and the N/N count drops. Eventually it clears
somehow and things spring back into life...
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
44 24 22 47 45 61 19  5 45 44  88  50/ 50
45 19 22 39 43 63 17  2 45 45  90  50/ 50
42 20 20 44 44 65 22  2 45 42  84  50/ 50
- stalled for 16 secs in SELECT
 0  4  4  9  9 21  9  2 21 26  52  24/ 24
51 16 13 38 36 54  8  1 27 25  50  50/ 50
- stalled for 16 secs in SELECT
- stalled for 16 secs in SELECT
 0 14  9 19 21 31  9  6 30 32  64  18/ 18
 0  0  0  0  0  0  0  0  0  0   0  18/ 18
43  6  7 14 13 18  6  0 11 11  22  50/ 50
31 29 28 47 45 48 14  3 34 31  62  50/ 50
43 16 23 43 43 61 22  5 44 43  86  50/ 50
- stalled for 16 secs in SELECT
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
 0  4  6 11 14 29 11  3 26 30  60  20/ 20
- stalled for 17 secs in SELECT
 0  0  2  2  2  5  1  0  2  2   4  18/ 18
- stalled for 18 secs in SELECT
 0  2  1  3  3  3  1  0  3  3   6  15/ 15
- stalled for 19 secs in SELECT
 0  1  1  3  3  6  1  0  3  3   6  12/ 12
- stalled for 20 secs in SELECT
 0  0  1  1  1  5  0  0  1  1   2  11/ 11
- stalled for 21 secs in SELECT
 0  0  1  1  1  5  0  0  1  1   2  10/ 10
- stalled for 22 secs in SELECT
 0  0  1  1  1  1  0  0  1  1   2   9/  9
- stalled for 23 secs in SELECT
 0  0  0  1  1  1  0  0  0  0   0   9/  9
- stalled for 24 secs in SELECT
 0  0  0  0  0  0  0  0  1  1   2   8/  8
- stalled for 25 secs in SELECT
 0  0  0  1  1  1  1  0  1  1   2   7/  7
- stalled for 26 secs in SELECT
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
 0  0  0  0  0  0  0  0  0  0   0   7/  7
- stalled for 27 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   7/  7
- stalled for 28 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   7/  7
- stalled for 29 secs in SELECT
 0  2  1  4  4  5  0  1  4  4   8   3/  3
- stalled for 30 secs in SELECT
 0  1  2  2  2  3  0  0  1  1   2   2/  2
- stalled for 31 secs in SELECT
 0  0  0  0  0  0  1  1  1  1   2   1/  1
- stalled for 32 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 33 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 34 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 35 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 36 secs in SELECT
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 37 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 38 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 39 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 40 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 41 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 42 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 43 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 44 secs in SELECT
 0  0  0  0  0  0  0  0  0  0   0   1/  1
- stalled for 45 secs in SELECT
67 15 14 28 24 31  6  0 20 18  36  50/ 50
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
32 28 24 55 54 62 15  3 35 32  64  50/ 50
53 25 21 43 45 63 27  6 51 53 106  50/ 50
40 19 22 43 41 57 20  3 42 41  80  50/ 50
50 22 27 53 53 67 20  4 51 49 100  50/ 50
49 15 18 44 40 60 21  6 45 49  98  50/ 50
44 24 19 44 45 61 22  4 43 44  88  50/ 50
42 24 20 42 47 64 32  7 46 42  84  50/ 50
55 24 21 45 45 71 26  4 51 55 110  50/ 50
33 22 22 42 44 52 19  2 35 33  66  50/ 50
43 18 26 40 38 63 21  7 45 44  86  50/ 50
I'm hoping it's just a fault in imaptest10 rather than something nasty in Dovecot itself. Any thoughts? (Dovecot doesn't log any errors during any of this activity.)
The mailstore is mounted over NFS from a NetApp filer with these
options:
remote/read/write/nosetuid/nodevices/bg/intr/proto=tcp/noac/actimeo=0/xattr/dev=4702692
The filestore containing the index and control files is local UFS-mounted disk with these options:
read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=154001e
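Worth noting: noac/actimeo=0 disables NFS attribute caching entirely, which is the usual advice for Maildir shared over NFS but forces a round trip to the filer for every stat(). A roughly equivalent Solaris mount command would be something like (server name and paths are placeholders, not from this setup):

```
mount -F nfs -o rw,bg,intr,proto=tcp,noac filer:/vol/mail /mailstore
```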
dovecot -n output is:
# /usr/local/dovecot-1.0.rc28/etc/dovecot.conf
log_path: /logfiles/mail/live/dovecot
info_log_path: /logfiles/mail/live/dovecot-info
ssl_listen: *:993
ssl_cert_file: /usr/local/dovecot/ssl/certs/imapservice-bundle.crt
ssl_key_file: /usr/local/dovecot/ssl/private/imapservice-private.key
disable_plaintext_auth: no
login_dir: /var/run/dovecot/login
login_executable: /usr/local/dovecot-1.0.rc28/libexec/dovecot/imap-login
login_process_per_connection: no
login_greeting_capability: yes
login_process_size: 64
login_processes_count: 10
max_mail_processes: 10000
mail_location: maildir:/mailstore/%1u/%u/Maildir:INDEX=/var/dovecot/index/%1u/%u:CONTROL=/var/dovecot/control/%1u/%u
mmap_disable: yes
maildir_copy_with_hardlinks: yes
mail_plugins: quota imap_quota
mail_log_prefix: %Us(%u)[%p]:
imap_client_workarounds: delay-newmail outlook-idle
namespace:
type: private
separator: /
inbox: yes
auth default:
mechanisms: plain login
cache_size: 1024
cache_ttl: 600
username_chars: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890
username_format: %Lu
passdb:
driver: shadow
userdb:
driver: passwd
socket:
type: listen
client:
master:
path: /var/run/dovecot/auth-master
mode: 384
plugin:
quota: fs
Any thoughts?
Cheers, Mike B-)
Hi Timo et al,
On 29 Mar 2007, at 17:52, Mike Brudenell wrote:
But then the ODDNESS starts. I'm still a little hazy how to
interpret the output of imaptest, but every now and then one or two
processes stall for several seconds. When this happens activity
seems to grow quieter in the imaptest output: number of operations
per second decreases and the N/N count drops. Eventually it clears
somehow and things spring back into life...
Further to my message to the list yesterday I'm still baffled and
concerned as to why imaptest10 shows stalls in SELECT occasionally
and, when it does so, it looks like all other clients are blocked or
something.
When the Maildir mailstore is mounted over NFS from our NetApp filer
to the Solaris 10 box Dovecot and imaptest10 are running on the
problem shows.
Switch to using local disk for the mailstore and run imaptest10 with
the same number of clients and there are no stalls. But increase the
number of simulated clients (from 50 to 100) and they come back, but
not too badly at that setting.
So it looks like something to do with when the system gets really
loaded...
I think the things I'd like to know are:
- Are other people on the List running Dovecot with Maildir mailstore NFS-mounted from NetApp filers and having it work OK? (If you are using Solaris 10, what mount options are you using?)
- How much real-life user load does running imaptest10 with 50 simulated clients equate to? I assume each simulated user is hammering away at its IMAP connection, so should equate to several (how many?) normal users in real-life operation?
- I'm concerned by the N/M number at the end of the imaptest10 output lines plummeting whenever one process goes into this stalled state: it almost suggests as if the only thing the other processes can do is logout? Are other sessions really being blocked, or is it just imaptest10 behaving like this?
- As far as I can tell I *think* it's only imaptest10 getting blocked: when it is happening for an extended period I can quickly manually Telnet in to port 143, login as one of the test usernames and SELECT INBOX just fine. So it's probably NOT all of the Dovecot processes getting blocked, but imaptest10 that drives them. Does that sound plausible?
Help! (Concerned, and hoping we're not going down the wrong road...
can anyone reassure me about the Solaris 10/NFS/NetApp filer setup?)
With thanks, Mike B-)
EXAMPLE OUTPUT FROM IMAPTEST10
24 13 20 28 25 36 10 14 23 24  48  50/ 50
32 15 15 28 28 46 18 16 35 32  64  50/ 50
21 13 14 27 30 32  9  8 21 22  42  50/ 50
- stalled for 16 secs in SELECT
 0  5  2  6  7 27 10 18 26 28  58  21/ 21   <===
46 22 25 40 38 38  8 12 21 17  34  50/ 50
28 11 13 24 22 32  9  6 24 28  56  50/ 50
20 10 15 24 27 38 13 10 24 20  40  50/ 50
29  9 11 25 21 33 13 14 28 29  58  50/ 50
28 17 12 32 37 43 17 15 27 28  56  50/ 50
and
34 15 13 28 27 47 16 20 34 34  68  50/ 50
18 13 16 27 25 23  9 13 17 18  36  50/ 50
21  9  8 22 25 36 17 13 23 22  42  50/ 50
- stalled for 16 secs in SELECT
- stalled for 16 secs in SELECT
Logi List Stat Sele Fetc Fet2 Stor Dele Expu Appe Logo
100%  50%  50% 100% 100% 100%  50% 100% 100% 100% 100%
                         30%        5%
 0  5  2  7  7 28 11 12 28 29  60  20/ 20   <===
41 16 18 31 26 24  6  4 12 11  22  50/ 50
29  9 15 28 29 42 10 13 27 29  58  50/ 50
On 30.3.2007, at 21.24, Mike Brudenell wrote:
- How much real-life user load does running imaptest10 with 50
simulated clients equate to? I assume each simulated user is hammering
away at its IMAP connection, so should equate to several (how many?)
normal users in real-life operation?
I don't know, but most IMAP connections are just idling, so it could
be anything from 10-100x.
- I'm concerned by the N/M number at the end of the imaptest10
output lines plummeting whenever one process goes into this stalled
state: it almost suggests as if the only thing the other processes can
do is logout? Are other sessions really being blocked, or is it just imaptest10 behaving like this?
It's just imaptest. If it detects one connection stalling, it stops
making new connections to the server until the stall goes away.
That's however a good way to detect what it is that's stalling. Just
wait for a while until there are only stalled connections, and you
can truss the imap connections to see what they're doing. Are they
waiting for some locks?
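On Solaris that check might look something like the following (the process name pattern and the traced syscalls are assumptions; adjust to taste):

```
# List candidate imap processes, then attach truss to one that looks stuck.
pgrep -l imap
# Trace file and locking syscalls on the stuck process; an fcntl() call
# that never returns would suggest it's waiting on a lock.
truss -t open,read,write,fcntl -p <pid>
```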
On 29.3.2007, at 19.52, Mike Brudenell wrote:
Error: BAD reply 'Error in IMAP command APPEND: Invalid arguments.'
for command: APPEND "INBOX" () {134933+}
Error: APPEND replied BAD: Error in IMAP command APPEND: Invalid arguments.
[?? I'm assuming the BAD reply was some General Nastiness in the mbox I'm feeding into imaptest10? It's the dovecot.mbox file from dovecot.org that was suggested in a message to the list a while back. ??]
Can you reproduce this? It could be that imaptest sends something
wrong, but I did notice and fix several bugs in APPEND command
because of imaptest. Although they were all related to error
conditions, such as out of quota handling with MULTIAPPEND.