[Dovecot] Random timeouts on mailboxes

Bastien Semene bsemene at cyanide-studio.com
Mon Mar 9 12:37:00 EET 2009


Hello,

My Dovecot server is currently having random issues when users try to 
access their mailboxes.
When the issue occurs, the client times out. It is random in that it 
happens more or less often and does not depend on the user or on the way 
they check their mailbox.
The duration of the problem is random too: sometimes it lasts less than 
a minute, sometimes several hours.
When a user can't connect to their mailbox, they can generally still 
receive their emails on their BlackBerry (this is just to illustrate 
that it doesn't seem to be linked to an account).
When this issue occurs, they can't retrieve the headers from any folder 
(INBOX, custom, sent, ...) and can't read mails whose headers are 
already known to the mail client.
During a timeout, users can still send emails (same jail, though with an 
error when the client tries to write a copy to the sent folder) and use 
the calendar server, for example (on the same machine).
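
One way to pin down exactly when the timeouts start and stop, 
independently of any user's mail client, would be a small probe that 
logs in at regular intervals and records how long each step takes. This 
is only a minimal sketch, assuming Python 3.9 or later (for the timeout 
argument of imaplib.IMAP4), plain IMAP on port 143, and a dedicated test 
account; the script name, interval, and timeout values are illustrative 
and not taken from this setup:

```python
import getpass
import imaplib
import sys
import time
from datetime import datetime

TIMEOUT = 30    # assumed: seconds before a step counts as a timeout
INTERVAL = 60   # assumed: seconds between two probes


def run_steps(steps):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns a list of (name, seconds, outcome) tuples.
    """
    results = []
    for name, func in steps:
        start = time.monotonic()
        try:
            func()
            outcome = "ok"
        except (OSError, imaplib.IMAP4.error) as exc:
            # socket timeouts are OSError subclasses
            outcome = f"FAILED {type(exc).__name__}: {exc}"
        results.append((name, time.monotonic() - start, outcome))
        if outcome != "ok":
            break
    return results


def probe(host, user, password, timeout=TIMEOUT):
    """Connect, log in, and open INBOX read-only, timing each step."""
    state = {}

    def connect():
        state["conn"] = imaplib.IMAP4(host, timeout=timeout)

    def login():
        state["conn"].login(user, password)

    def select():
        typ, _ = state["conn"].select("INBOX", readonly=True)
        if typ != "OK":
            raise imaplib.IMAP4.error(f"SELECT returned {typ}")

    try:
        return run_steps(
            [("connect", connect), ("login", login), ("select", select)]
        )
    finally:
        conn = state.get("conn")
        if conn is not None:
            try:
                conn.logout()
            except (OSError, imaplib.IMAP4.error):
                pass  # the connection may already be dead


def format_line(results, now=None):
    """Render one probe as a single timestamped log line."""
    if now is None:
        now = datetime.now()
    parts = [f"{name}={secs:.2f}s {outcome}" for name, secs, outcome in results]
    return f"{now.isoformat(timespec='seconds')} " + "; ".join(parts)


# Hypothetical usage: python imap_probe.py HOST USER
if __name__ == "__main__" and len(sys.argv) == 3:
    secret = getpass.getpass("IMAP password: ")
    while True:
        print(format_line(probe(sys.argv[1], sys.argv[2], secret)), flush=True)
        time.sleep(INTERVAL)
```

Left running from another machine, the timestamped lines would show 
which step stalls (connect, login, or opening INBOX) and for how long, 
and could be lined up against vmstat output or Dovecot's own logs.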

Dovecot is running on 32-bit FreeBSD 7.0, with 4 GB RAM, an Intel Xeon 
quad-core @ 1.86 GHz, and 3 x 500 GB SATA-2 disks in RAID-5.
The box hosts jails, including the mail jails (imap + smtp, clamav + 
spamassassin). The mail jails are new (since August 2008) but have 
worked great since the beginning of this year.
The server currently hosts 122 accounts.

My first thought was an I/O issue: maybe the disks are too busy, or 
paging results in a timeout. I checked with the vmstat and top commands 
but nothing shows up: there is always some free memory (between 90 and 
300 MB) and very little paging, generally around 1 MB. 
The faults are under a hundred, and the few times they exceed a hundred 
(generally staying under 200), the next snapshot is back under 100. I 
set the display to refresh every 5 seconds.
I shut down all jails not directly related to the mail service, but the 
problem still occurred.
I also moved clamav and spamassassin away from imap and smtp, to a 
different box.
After that I went through the Dovecot config to lighten it, and (only) 
disabled fsync.
I upgraded the RAM; the added memory is being used, but nothing changed.
Staffing changes constantly here; there were more of us (around 10-15%) 
before the problem started than now, so I don't think it is linked to 
the number of users.
I tried recreating some of the accounts that had the issue, but the 
problem appeared again.
I upgraded Dovecot on Friday from 1.1.7 to 1.1.11 (the original 
installation predates 1.1.7; I upgraded some time ago).

I used a command script to log Thunderbird's IMAP activity; everything 
looks fine, but there are no timestamps in the logs. So for now I'm only 
sure that this is not a problem of erroneous packets/data being sent.
I looked at the TB doc ( 
http://www.mozilla.org/projects/nspr/reference/html/prlog.html#25306 ) 
but there is no directive to add a timestamp to each log line.
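
A possible workaround for the missing timestamps would be to follow the 
log file while Thunderbird writes it and prefix each new line with the 
time at which it appeared. This is a rough sketch in Python, not a 
Thunderbird feature; the script name and the log path given on the 
command line are hypothetical:

```python
import os
import sys
import time
from datetime import datetime


def stamp(line, now=None):
    """Return the log line prefixed with an ISO-8601 timestamp."""
    if now is None:
        now = datetime.now()
    return f"{now.isoformat(timespec='milliseconds')} " + line.rstrip("\r\n")


def follow(path, poll=0.2):
    """Yield complete lines appended to path, similar to `tail -f`."""
    with open(path, "r", errors="replace") as fh:
        fh.seek(0, os.SEEK_END)  # skip what is already in the file
        pending = ""
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll)  # nothing new yet
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit finished lines
                yield pending
                pending = ""


# Hypothetical usage: python stamp_log.py imap.log > imap-stamped.log
if __name__ == "__main__" and len(sys.argv) == 2:
    for entry in follow(sys.argv[1]):
        print(stamp(entry), flush=True)
```

The timestamps record when a line was read, not when Thunderbird 
produced it, so they are only as precise as the polling interval plus 
whatever buffering Thunderbird applies before writing. That should still 
be enough to see a gap of many seconds during a timeout.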

The network is fine. I tried different configurations to make sure no 
device is doing something weird.

At this point I don't know what else to check to identify the issue.
If anyone has an idea I didn't mention here, or if I have misinterpreted 
the data, I'll be glad to know.

Regards,

-- 
Bastien Semene
Network & System Administrator


