[Dovecot] quick question

David Halik dhalik at jla.rutgers.edu
Fri Jan 22 18:24:52 EET 2010


Timo (and anyone else who feels like chiming in),

I was just wondering if you'd be able to tell me whether the amount of 
corruption I see on a daily basis is what you'd consider "average" for 
our current setup and traffic. We migrated from Courier two months ago, 
and now that the latest patches have put an end to the core dumps, I'd 
like to know what to expect as the operational norm. 
Prior to this we had never used Dovecot, so I have nothing to go on.

Our physical setup is 10 CentOS 5.4 x86_64 IMAP/POP servers, all with 
the same NFS backend where the index, control, and Maildir files for 
the users reside. Accessing this are direct connections from clients, 
plus multiple SquirrelMail webservers and Pine users, all at the same 
time, with connections load-balanced by a layer-4 switch.

Each server has an average of about 400 connections, for a total of 
around 4000 concurrent connections during a normal business day. This 
is out of a possible user population of about 15,000.

All our dovecot servers syslog to one machine, and on average I see 
about 50-75 instances of file corruption per day. I'm not counting each 
line, since some instances of corruption generate a log message for each 
uid that's wrong. This is just me counting "user A was corrupted once at 
10:00, user B was corrupted at 10:25" for example.
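
As a rough sketch of that kind of counting (not the exact procedure used 
here), the Python below collapses consecutive "Corrupted ..." lines for 
the same user into one incident. The syslog line layout in LINE_RE, the 
60-second grouping window, and the year argument are all assumptions 
made for illustration; adjust them to match the real log format.

```python
import re
from datetime import datetime, timedelta

# Assumed (hypothetical) syslog layout:
#   "Jan 22 10:00:01 host dovecot: IMAP(user): Corrupted ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ dovecot: "
    r"\w+\((?P<user>[^)]+)\): Corrupted "
)

def count_incidents(lines, year=2010, window=timedelta(seconds=60)):
    """Count corruption incidents, not individual log lines.

    Lines for the same user that arrive within `window` of that user's
    previous corruption line are treated as part of the same incident.
    """
    last_seen = {}   # user -> timestamp of the latest corruption line
    incidents = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a corruption message in the assumed layout
        # Classic syslog timestamps carry no year, so one is supplied.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        user = m["user"]
        prev = last_seen.get(user)
        if prev is None or ts - prev > window:
            incidents += 1
        last_seen[user] = ts
    return incidents
```

Feeding it the combined syslog file line by line, for example 
count_incidents(open(path)) with path pointing at that file, gives a 
per-incident total instead of a per-line total.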

Examples of the corruption are as follows:

###########
Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 
28: Invalid transaction log size (32692 vs 32800): 
...../dovecot/.INBOX/dovecot.index.log (sync_offset=32692)

Corrupted index cache file ...../dovecot/.Sent 
Messages/dovecot.index.cache: Corrupted physical size for uid=624: 0 != 
53490263

Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 
66: Unexpected garbage at EOF (sync_offset=21608)

Corrupted transaction log file 
...../dovecot/.Trash.RFA/dovecot.index.log seq 2: indexid changed 
1264098644 -> 1264098664 (sync_offset=0)

Corrupted index cache file ...../dovecot/.INBOX/dovecot.index.cache: 
invalid record size

Corrupted index cache file ...../dovecot/.INBOX/dovecot.index.cache: 
field index too large (33 >= 19)

Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 
40: record size too small (type=0x0, offset=5788, size=0) (sync_offset=5812)
##########

These are most of the unique messages I could find, although the 
majority are the same as the first two I posted. So my question is: is 
this normal for a setup such as ours? I've been arguing with my boss 
over this since the switch. My opinion is that with a setup such as ours, 
where a user can be logged in using Thunderbird, SquirrelMail, and their 
BlackBerry all at the same time, there will always be the occasional 
index/log corruption.

Unfortunately, he is of the opinion that there should rarely be any, and 
that there is a flaw in how Dovecot is designed to work with multiple 
services on an NFS backend.

What has been your experience so far?

Thanks,
-Dave

-- 
================================
David Halik
System Administrator
OIT-CSS Rutgers University
dhalik at jla.rutgers.edu
================================


