[Dovecot] corrupted transaction log error resulting in multiple deliveries
I'm finally picking my dovecot project back up now, and after upgrading rc19 to rc27, I sent a test message to an existing account which resulted in log entries like these:
Mar 16 08:45:37 node7 deliver(user@example.com): Corrupted transaction log file /var/indexes/example/com/u/us/user/.INBOX/dovecot.index.log: Append with UID 3, but next_uid = 15368 Mar 16 08:45:37 node7 deliver(user@example.com): msgid=20070316124031.8F089D39A5@node7.neonova.net: save failed to INBOX Mar 16 08:45:37 node7 postfix/pipe[29267]: 8F089D39A5: to=user@example.com, relay=dovecot, delay=306, delays=306/0.01/0/0.02, dsn=4.3.0, status=deferred (temporary failure)
So, as you can see deliver has the problem and defers delivery. Is dovecot supposed to be able to recover from a corrupted log file on its own? It repeated this until I manually wiped out the index directory (didn't try just removing the log files first, sorry). The kicker is that deliver actually did save the message to the inbox every time, so multiple copies were delivered (every time I flushed the queue until it was actually fixed, a copy was delivered). I have my mail directories on an NFS mount, and indexes on localdisk, FYI.
On Fri, 2007-03-16 at 09:01 -0400, Justin McAleer wrote:
Mar 16 08:45:37 node7 deliver(user@example.com): Corrupted transaction log file /var/indexes/example/com/u/us/user/.INBOX/dovecot.index.log: Append with UID 3, but next_uid = 15368 .. So, as you can see deliver has the problem and defers delivery. Is dovecot supposed to be able to recover from a corrupted log file on its own? It repeated this until I manually wiped out the index directory
Yes, it's supposed to recover from that. I couldn't reproduce this. It always fixed the problem with the next try, or 3rd try at latest. If this happens again, could you send me the dovecot.index and dovecot.index.log files?
Timo Sirainen wrote:
Yes, it's supposed to recover from that. I couldn't reproduce this. It always fixed the problem with the next try, or 3rd try at latest. If this happens again, could you send me the dovecot.index and dovecot.index.log files?
While attempting to recreate this I managed to crash imap. Here, I had Thunderbird sitting with the inbox open, I renamed the indexes directory and moved an older copy of the dir into place, and sent a test message. Ironically the message was delivered, but imap went boom:
Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: file_seq=1, min_file_offset (63380) > max_file_offset (24) Mar 16 12:51:15 node7 dovecot: IMAP(justin): fscking index file /var/indexes//j/ju/justin/.INBOX/dovecot.index Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: invalid indexid (1174063875 != 1174051611) Mar 16 12:51:15 node7 dovecot: IMAP(justin): fscking index file /var/indexes//j/ju/justin/.INBOX/dovecot.index Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: invalid new transaction log sequence (2 >= 2) Mar 16 12:51:15 node7 dovecot: IMAP(justin): fscking index file /var/indexes//j/ju/justin/.INBOX/dovecot.index Mar 16 12:51:15 node7 dovecot: IMAP(justin): Unexpected transaction log desync with index /var/indexes//j/ju/justin/.INBOX/dovecot.index Mar 16 12:51:15 node7 dovecot: IMAP(justin): Disconnected: Mailbox is in inconsistent state, please relogin. Mar 16 12:51:15 node7 dovecot: IMAP(justin): file client.c: line 401 (_client_input): assertion failed: (!client->handling_input) Mar 16 12:51:15 node7 dovecot: IMAP(justin): Raw backtrace: imap [0x468dbc] -> imap [0x4686ac] -> imap [0x416971] -> imap [0x41363c] -> imap(cmd_idle+0x108) [0x413858] -> imap [0x416765] -> imap [0x41670a] -> imap(_client_input+0x91) [0x4168c1] -> imap(io_loop_handler_run+0x104) [0x46e874] -> imap(io_loop_run+0x1a) [0x46dc3a] -> imap(main+0x406) [0x41dfb6] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x31c8a1c3fb] -> imap [0x411b3a] Mar 16 12:51:15 node7 dovecot: child 31850 (imap) killed with signal 6
However, I think a more simple way to recreate the problem is to simply delete an existing maildir (with existing indexes) and send a message to it. This shows the behavior every time on my system. I'll attach a tar of all files in the index dir to this email though.
Justin McAleer wrote:
... Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: file_seq=1, min_file_offset (63380) > max_file_offset (24) Sorry to self-reply, but I just noticed that the domain part of the path is being left out of those log entries. The path should be /var/indexes/domain/... it's just // in those log entries though. Don't know if that's a logging error or something more significant.
On Fri, 2007-03-16 at 13:14 -0400, Justin McAleer wrote:
Justin McAleer wrote:
... Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: file_seq=1, min_file_offset (63380) > max_file_offset (24) Sorry to self-reply, but I just noticed that the domain part of the path is being left out of those log entries. The path should be /var/indexes/domain/... it's just // in those log entries though. Don't know if that's a logging error or something more significant.
It's not a logging error. If there are two justins in different domains they're now using the same index files and that'll cause problems.
Timo Sirainen wrote:
On Fri, 2007-03-16 at 13:14 -0400, Justin McAleer wrote:
Justin McAleer wrote:
... Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: file_seq=1, min_file_offset (63380) > max_file_offset (24)
Sorry to self-reply, but I just noticed that the domain part of the path is being left out of those log entries. The path should be /var/indexes/domain/... it's just // in those log entries though. Don't know if that's a logging error or something more significant.
It's not a logging error. If there are two justins in different domains they're now using the same index files and that'll cause problems.
Ahh, I found my problem... I'm not returning the full email address as 'user' in the userdb query the imap daemon is using. I fixed it in the query for deliver, but apparently did not in the other.
Timo Sirainen wrote:
On Fri, 2007-03-16 at 13:14 -0400, Justin McAleer wrote:
Justin McAleer wrote:
... Mar 16 12:51:15 node7 dovecot: IMAP(justin): Corrupted transaction log file /var/indexes//j/ju/justin/.INBOX/dovecot.index.log: file_seq=1, min_file_offset (63380) > max_file_offset (24)
Sorry to self-reply, but I just noticed that the domain part of the path is being left out of those log entries. The path should be /var/indexes/domain/... it's just // in those log entries though. Don't know if that's a logging error or something more significant.
It's not a logging error. If there are two justins in different domains they're now using the same index files and that'll cause problems.
Timo, were you able to recreate the corruption/temporary failure scenario?
On Fri, 2007-03-16 at 13:08 -0400, Justin McAleer wrote:
Mar 16 12:51:15 node7 dovecot: IMAP(justin): file client.c: line 401 (_client_input): assertion failed: (!client->handling_input)
OK, I finally found what causes this. Fixed: http://dovecot.org/list/dovecot-cvs/2007-March/008248.html
However, I think a more simple way to recreate the problem is to simply delete an existing maildir (with existing indexes) and send a message to it. This shows the behavior every time on my system. I'll attach a tar of all files in the index dir to this email though.
Nope, can't reproduce any problems.. Can you try the latest snapshot? Maybe these have been fixed already. http://dovecot.org/nightly/dovecot-latest.tar.gz
Timo Sirainen wrote:
However, I think a more simple way to recreate the problem is to simply delete an existing maildir (with existing indexes) and send a message to it. This shows the behavior every time on my system. I'll attach a tar of all files in the index dir to this email though.
Nope, can't reproduce any problems.. Can you try the latest snapshot? Maybe these have been fixed already. http://dovecot.org/nightly/dovecot-latest.tar.gz
No dice, problem still happens. I'll provide some extra info at the bottom of this, everything I can think of that may be of any interest. I can get you access to this machine if it comes down to that. Just to restate, to test the nightly build I followed this procedure:
Stopped existing dovecot process. Built dovecot with this: ./configure --sysconfdir=/etc/dovecot --with-mysql --with-ioloop=epoll make install start dovecot rm -rf /var/indexes/* rm -rf /var/mailstore/.../Maildir (for the test account) send test message rm -rf /var/mailstore/.../Maildir send test message (corruption error, temporary failure to postfix, but message delivered)
Result is the same every time I repeat that process. So, I already gave my configure options (tried without specifying an ioloop, resulting in poll being used instead, with no effect). Here is the summary after configure runs:
Install prefix ...................... : /usr/local File offsets ........................ : 64bit I/O loop method ..................... : epoll File change notification method ..... : dnotify Building with SSL support ........... : yes (OpenSSL) Building with IPv6 support .......... : yes Building with pop3 server ........... : yes Building with mail delivery agent .. : yes Building with GSSAPI support ........ : no Building with user database modules . : static prefetch passwd passwd-file checkpassword sql (modules) Building with password lookup modules : passwd passwd-file shadow checkpassword sql (modules) Building with SQL drivers ............: mysql
The machine is an AMD 64 running CentOS 4.4.
And finally, dovecot -n:
# /etc/dovecot/dovecot.conf base_dir: /var/run/dovecot/ protocols: imap pop3 ssl_disable: yes disable_plaintext_auth: no login_dir: /var/run/dovecot/login login_executable(default): /usr/local/libexec/dovecot/imap-login login_executable(imap): /usr/local/libexec/dovecot/imap-login login_executable(pop3): /usr/local/libexec/dovecot/pop3-login login_user: dovecot-login login_log_format_elements: user=<%u> domain=<%d> method=%m rip=%r lip=%l %c login_process_size: 64 login_processes_count: 25 mail_location: maildir:%h/Maildir:INDEX=/var/indexes/%d/%1n/%2n/%n mail_cache_min_mail_count: 100 maildir_copy_with_hardlinks: yes maildir_copy_preserve_filename: yes mail_executable(default): /usr/local/libexec/dovecot/imap mail_executable(imap): /usr/local/libexec/dovecot/imap mail_executable(pop3): /usr/local/libexec/dovecot/pop3 mail_plugins(default): quota imap_quota mail_plugins(imap): quota imap_quota mail_plugins(pop3): quota mail_plugin_dir(default): /usr/local/lib/dovecot/imap mail_plugin_dir(imap): /usr/local/lib/dovecot/imap mail_plugin_dir(pop3): /usr/local/lib/dovecot/pop3 imap_client_workarounds(default): outlook-idle delay-newmail imap_client_workarounds(imap): outlook-idle delay-newmail imap_client_workarounds(pop3): outlook-idle pop3_uidl_format(default): pop3_uidl_format(imap): pop3_uidl_format(pop3): %08Xu%08Xv pop3_client_workarounds(default): pop3_client_workarounds(imap): pop3_client_workarounds(pop3): outlook-no-nuls auth default: mechanisms: plain digest-md5 cram-md5 apop cache_size: 16384 cache_ttl: 300 user: dovecot-auth username_translation: %@ passdb: driver: sql args: /etc/dovecot/email-login.conf passdb: driver: sql args: /etc/dovecot/userid-login.conf passdb: driver: sql args: /etc/dovecot/alias-login.conf passdb: driver: sql args: /etc/dovecot/domalias-login.conf userdb: driver: prefetch userdb: driver: sql args: /etc/dovecot/email-sql.conf socket: type: listen client: master: path: /var/run/dovecot/auth-master mode: 432 user: root group: mail
On Mon, 2007-03-19 at 10:36 -0400, Justin McAleer wrote:
rm -rf /var/indexes/* rm -rf /var/mailstore/.../Maildir (for the test account) send test message rm -rf /var/mailstore/.../Maildir send test message (corruption error, temporary failure to postfix, but message delivered)
This is what happens with me:
deliver(tss): Mar 21 21:03:59 Error: Corrupted transaction log file /tmp/index/.INBOX/dovecot.index.log: Append with UID 1, but next_uid = 2
The next delivery says:
deliver(tss): Mar 21 21:04:05 Error: Transaction log file /tmp/index/.INBOX/dovecot.index.log: marked corrupted deliver(tss): Mar 21 21:04:05 Error: Transaction log file /tmp/index/.INBOX/dovecot.index.log.2: marked corrupted
But doesn't give temporary failure. And the 3rd and so on work without any errors at all.
Or is this what you meant? I thought it never stopped giving errors with you.
Anyway, this should fix it completely: http://dovecot.org/list/dovecot-cvs/2007-March/008261.html
Timo Sirainen wrote:
On Mon, 2007-03-19 at 10:36 -0400, Justin McAleer wrote:
rm -rf /var/indexes/* rm -rf /var/mailstore/.../Maildir (for the test account) send test message rm -rf /var/mailstore/.../Maildir send test message (corruption error, temporary failure to postfix, but message delivered)
This is what happens with me:
deliver(tss): Mar 21 21:03:59 Error: Corrupted transaction log file /tmp/index/.INBOX/dovecot.index.log: Append with UID 1, but next_uid = 2
The next delivery says:
deliver(tss): Mar 21 21:04:05 Error: Transaction log file /tmp/index/.INBOX/dovecot.index.log: marked corrupted deliver(tss): Mar 21 21:04:05 Error: Transaction log file /tmp/index/.INBOX/dovecot.index.log.2: marked corrupted
But doesn't give temporary failure. And the 3rd and so on work without any errors at all.
Or is this what you meant? I thought it never stopped giving errors with you.
Anyway, this should fix it completely: http://dovecot.org/list/dovecot-cvs/2007-March/008261.html
I just pulled the latest nightly, and now it doesn't even give the transaction log error. So, looks like this did the trick.
Just to try to clarify what I was seeing though, I wasn't sending multiple test messages (after failure began). It gave the transaction log error, delivered the message, and postfix deferred delivery due to the tempfail. So, the same test message was getting delivered to the maildir every time postfix attempted redelivery. I guess I never went more than 3 delivery attempts, but all three attempts, at the same message delivery, resulted in dovecot errors, delivery to inbox, and tempfail returned to postfix.
But like I said, not a peep out of dovecot with the latest nightly.
participants (2)
-
Justin McAleer
-
Timo Sirainen