[Dovecot] some errors on 1.1.3
Hi,
we recently upgraded dovecot from 1.0.5 to 1.1.3+latest patches. After the upgrade I saw some errors in the logs and I'm not sure whether they're a cause for concern. Our setup is 2 servers running openSUSE 11.0 and a NetApp Filer as storage backend. Following is the config:
protocols: imap imaps pop3 pop3s login_dir: /var/run/dovecot/login login_executable(default): /usr/lib64/dovecot/imap-login login_executable(imap): /usr/lib64/dovecot/imap-login login_executable(pop3): /usr/lib64/dovecot/pop3-login login_max_processes_count: 512 max_mail_processes: 1024 verbose_proctitle: yes first_valid_uid: 100 mail_location: maildir:~/.Maildir:CONTROL=/var/mailctl/%n:INDEX=/var/mailctl/%n mail_nfs_storage: yes mail_nfs_index: yes mail_executable(default): /usr/lib64/dovecot/imap mail_executable(imap): /usr/lib64/dovecot/imap mail_executable(pop3): /usr/lib64/dovecot/pop3 mail_process_size: 512 mail_plugins(default): quota imap_quota fts fts_squat imap_alert mail_plugins(imap): quota imap_quota fts fts_squat imap_alert mail_plugins(pop3): quota mail_plugin_dir(default): /usr/lib64/dovecot/modules/imap mail_plugin_dir(imap): /usr/lib64/dovecot/modules/imap mail_plugin_dir(pop3): /usr/lib64/dovecot/modules/pop3 auth default: mechanisms: plain login gssapi master_user_separator: * passdb: driver: passwd-file args: /etc/dovecot/passwd.masterusers master: yes passdb: driver: pam args: dovecot userdb: driver: ldap args: /etc/dovecot/dovecot-ldap.conf socket: type: listen master: path: /var/run/dovecot/auth-master mode: 384 plugin: quota: fs fts: squat fts_squat: partial=4 full=4
The errors we're seeing are:
Oct 1 07:39:27 host1 dovecot: IMAP(user1): Invalid transaction log size (83856 vs 83160): /var/mailctl/corey/.INBOX/dovecot.index.log
Oct 1 09:14:55 host1 dovecot: IMAP(user2): Broken file /var/mailctl/user2/.INBOX/dovecot-uidlist line 15299: UIDs not ordered (206464 > 206464)
Oct 1 09:16:43 host1 dovecot: IMAP(user3): Broken file /var/mailctl/user3/.Trash/dovecot-uidlist line 22: UID larger than next_uid (153866 >= 153866)
Oct 1 06:03:19 host1 deliver(user2): Corrupted transaction log file /var/mailctl/user2/.INBOX/dovecot.index.log: record size too small (type=0x0, offset=71940, size=0)
Oct 1 06:03:19 host1 deliver(user2): Transaction log file /var/mailctl/user2/.INBOX/dovecot.index.log: marked corrupted
Oct 1 06:59:45 host1 deliver(user1): Corrupted transaction log file /var/mailctl/user1/.INBOX/dovecot.index.log: Unexpected garbage at EOF
Oct 1 07:29:05 host1 dovecot: IMAP(user4): nfs_flush_fcntl: fcntl(/var/mailctl/user4/.INBOX/dovecot.index.cache, F_RDLCK) failed: Interrupted system call
It only seems to affect a handful of users and they didn't complain so I assume they didn't see any negative effect on the client side. But it would be nice if somebody could explain shortly whether these errors are serious and we should be concerned or if they are harmless.
Karsten.
It's just a jump to the left And then a step to the right. Put your hands on your hips And pull your knees in tight. It's the pelvic thrust That really gets you insa-a-a-a-ane
LET'S DO THE TIME WARP AGAIN!
-- Rocky Horror Picture Show
On Wed, 2008-10-01 at 21:15 -0400, Karsten Künne wrote:
we recently upgraded dovecot from 1.0.5 to 1.1.3+latest patches. After the upgrade I saw some errors in the logs and I'm not sure whether they're a cause for concern. Our setup is 2 servers running openSUSE 11.0 and a NetApp Filer as storage backend. Following is the config:
So it's possible that two users are accessing the same mailbox from two different servers at the same time?
It was working the same way with v1.0 too? Without errors? That sounds strange, since that kind of a setup should have been much more broken with v1.0.
mail_location: maildir:~/.Maildir:CONTROL=/var/mailctl/%n:INDEX=/var/mailctl/%n
I guess /var/mailctl is also on netapp?
mail_nfs_storage: yes mail_nfs_index: yes
These should have prevented these errors, but I guess they don't fully work everywhere then..
Oct 1 07:39:27 host1 dovecot: IMAP(user1): Invalid transaction log size (83856 vs 83160): /var/mailctl/corey/.INBOX/dovecot.index.log ..
None of these should ever happen.
Oct 1 09:14:55 host1 dovecot: IMAP(user2): Broken file /var/mailctl/user2/.INBOX/dovecot-uidlist line 15299: UIDs not ordered (206464 > 206464)
Oct 1 09:16:43 host1 dovecot: IMAP(user3): Broken file /var/mailctl/user3/.Trash/dovecot-uidlist line 22: UID larger than next_uid (153866 >= 153866)
These especially could cause visible problems to clients.
Oct 1 07:29:05 host1 dovecot: IMAP(user4): nfs_flush_fcntl: fcntl(/var/mailctl/user4/.INBOX/dovecot.index.cache, F_RDLCK) failed: Interrupted system call
Do you get a lot of these? This means it was trying to flush NFS read cache by read-locking the file, but it failed after waiting for 60 seconds. All of these locks should be really short, so if it's timing out things aren't working very well.
On Sunday 05 October 2008 07:16:23 Timo Sirainen imposed structure on a stream of electrons, yielding:
On Wed, 2008-10-01 at 21:15 -0400, Karsten Künne wrote:
we recently upgraded dovecot from 1.0.5 to 1.1.3+latest patches. After the upgrade I saw some errors in the logs and I'm not sure whether they're a cause for concern. Our setup is 2 servers running openSUSE 11.0 and a NetApp Filer as storage backend. Following is the config:
So it's possible that two users are accessing the same mailbox from two different servers at the same time?
It's not impossible but it should be very rare. Users connect to a generic server name which resolves to two different IP numbers via round-robin DNS. It's not perfect but it has worked well for us in the past.
It was working the same way with v1.0 too? Without errors? That sounds strange, since that kind of a setup should have been much more broken with v1.0.
Well, I didn't really look for errors with 1.0 if no user complained so I can't say for sure that there were no errors. At least, no user complained :-) , and that's the most important. I haven't gotten any real complaints with 1.1 either but I'm looking a bit more now because of the new version.
mail_location: maildir:~/.Maildir:CONTROL=/var/mailctl/%n:INDEX=/var/mailctl/%n
I guess /var/mailctl is also on netapp?
Yes, but it has no quotas set.
mail_nfs_storage: yes mail_nfs_index: yes
These should have prevented these errors, but I guess they don't fully work everywhere then..
Oct 1 07:39:27 host1 dovecot: IMAP(user1): Invalid transaction log size (83856 vs 83160): /var/mailctl/corey/.INBOX/dovecot.index.log
..
None of these should ever happen.
After further examination the sequence of errors seems to be like this:
First:
Oct 5 23:10:14 host1 deliver(usera): Corrupted transaction log file /var/mailctl/usera/.INBOX/dovecot.index.log: Unexpected garbage at EOF
Second:
Oct 5 23:10:14 host1 deliver(usera): Fixed index file /var/mailctl/usera/.INBOX/dovecot.index: log_file_seq 222 -> 223
Third:
Oct 5 23:28:07 host1 deliver(usera): Transaction log file /var/mailctl/usera/.INBOX/dovecot.index.log: marked corrupted
Fourth:
Oct 5 23:28:07 host1 dovecot: IMAP(usera): Invalid transaction log size (608 vs 240): /var/mailctl/usera/.INBOX/dovecot.index.log Oct 5 23:28:07 host1 dovecot: IMAP(usera): Invalid transaction log size (608 vs 240): /var/mailctl/usera/.INBOX/dovecot.index.log ....
and so on.
I observed this pattern at least 3 times. It IS possible that the index files were already corrupted by 1.0.5. Hard to tell now.
Oct 1 09:14:55 host1 dovecot: IMAP(user2): Broken file /var/mailctl/user2/.INBOX/dovecot-uidlist line 15299: UIDs not ordered (206464 > 206464)
Oct 1 09:16:43 host1 dovecot: IMAP(user3): Broken file /var/mailctl/user3/.Trash/dovecot-uidlist line 22: UID larger than next_uid (153866 >= 153866)
These especially could cause visible problems to clients.
That's not good news. It probably doesn't matter for the Trash folder but the INBOX is definitely more critical. So far, I haven't heard any complaints.
After we upgraded our second pair of mailservers I believe I found an explanation for at least some of these errors. We did all of the upgrades by upgrading just one of the servers and then the second one the following day. So, for one day we had a mixture of 1.1.3 and 1.0.5 running, accessing the same mailstore and the same control and index files. I guess that wasn't such a great setup and could have lead to some corruption.
Oct 1 07:29:05 host1 dovecot: IMAP(user4): nfs_flush_fcntl: fcntl(/var/mailctl/user4/.INBOX/dovecot.index.cache, F_RDLCK) failed: Interrupted system call
Do you get a lot of these? This means it was trying to flush NFS read cache by read-locking the file, but it failed after waiting for 60 seconds. All of these locks should be really short, so if it's timing out things aren't working very well.
Fortunately, these errors disappeared in the last 3 days. I'll keep an eye out for them and I'll also check the NetApp logs.
Karsten.
Paul's Law: In America, it's not how much an item costs, it's how much you save.
participants (2)
-
Karsten Künne
-
Timo Sirainen