My mail server consumed all of its configured file table slots and
started returning ENFILE ("Too many open files in system") for many
operations. This wreaked havoc with dovecot-1.1.13. Here are some
areas where dovecot should detect and more gracefully handle ENFILE
error returns:
Deliver should report an error more helpful than "Unknown internal
error" when fdatasync_path fails: deliver(user): Apr 10 14:53:20 Error: fdatasync_path(/Volumes/MailData/ user/new) failed: Too many open files in system deliver(user): Apr 10 14:53:20 Info: msgid=<...>: save failed to
INBOX: BUG: Unknown internal errorIndex files should not be fscked when they could not be opened due
to ENFILE: Error: IMAP(user): open() failed with index file /Volumes/MailData/ user/.Deleted Messages/dovecot.index: Too many open files in system Warning: IMAP(user): fscking index file /Volumes/MailData/ user/.Deleted Messages/dovecot.indexDovecot should not crash when mail transaction logs become
corrupt, or appear to have become corrupt due to ENFILE. Here is a
sequence of error messages from the log but I can't tell whether the
IMAP ones up through the panic in the middle are all from the same
process, or from different processes. deliver(user): Apr 10 14:55:32 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:55:54 Error: open() failed with index file / Volumes/MailData/user/dovecot.index: Too many open files in system deliver(user): Apr 10 14:55:54 Error: open() failed with file /Volumes/ MailData/user/dovecot.index.log.2: Too many open files in system deliver(user): Apr 10 14:55:57 Error: open() failed with index cache
file /Volumes/MailData/user/dovecot.index.cache: Too many open files
in system dovecot: Apr 10 14:56:08 Warning: IMAP(user): fscking index file / Volumes/MailData/user/dovecot.index dovecot: Apr 10 14:56:16 Error: IMAP(user): Fixed index file /Volumes/ MailData/user/dovecot.index: log_file_seq 15 -> 2 deliver(user): Apr 10 14:56:19 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:56:20 Error: Corrupted transaction log file / Volumes/MailData/user/dovecot.index.log seq 2: record size too small
(type=0x0, offset=23088, size=0) (sync_offset=36848) deliver(user): Apr 10 14:56:20 Warning: fscking index file (in-memory
index) deliver(user): Apr 10 14:56:21 Info: msgid=<...>: save failed to
INBOX: Internal error occurred. Refer to server log for more
information. [2009-04-10 14:56:20] dovecot: Apr 10 14:56:40 Panic: IMAP(user): file mail-transaction-log- view.c: line 103 (mail_transaction_log_view_set): assertion failed:
(min_file_seq <= max_file_seq) dovecot: Apr 10 14:56:40 Error: IMAP(user): Raw backtrace: [see below
for better backtrace] deliver(user): Apr 10 14:56:45 Error: Transaction log file /Volumes/ MailData/user/dovecot.index.log: marked corrupted deliver(user): Apr 10 14:57:04 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:57:23 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:57:49 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:58:21 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:58:41 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:59:09 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:59:28 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 14:59:43 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 15:00:57 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 15:03:01 Info: msgid=<...>: saved mail to INBOX deliver(user): Apr 10 15:03:27 Info: msgid=<...>: saved mail to INBOX dovecot: Apr 10 15:21:41 Error: IMAP(user): Corrupted transaction log
file /Volumes/MailData/user/dovecot.index.log seq 2: Invalid
transaction log size (36848 vs 23088): /Volumes/MailData/user/ dovecot.index.log (sync_offset=36848) dovecot: Apr 10 15:21:42 Info: imap-login: Login: user=<user>,
method=CRAM-MD5, rip=..., lip=..., TLS
The backtrace for the assertion failure was:
0 libSystem.B.dylib 0x00007fff8007c102 __kill + 10
1 libSystem.B.dylib 0x00007fff800f7ffc abort + 83
2 imap 0x000000010006a22e
default_fatal_finish + 70
3 imap 0x000000010006a27c
i_syslog_fatal_handler + 0
4 imap 0x00000001000699db i_info + 0
5 imap 0x0000000100058542
mail_transaction_log_view_set + 104
6 imap 0x0000000100053b4c
view_sync_set_log_view_range + 125
7 imap 0x0000000100053fc5
mail_index_view_sync_begin + 378
8 imap 0x000000010003c26e
index_mailbox_sync_init + 129
9 imap 0x0000000100018457
maildir_storage_sync_init + 229
10 imap 0x000000010000c6bb imap_sync_init +
132
11 imap 0x000000010000c90d cmd_sync_delayed
- 477
12 imap 0x000000010000716a
client_handle_input + 409 13 imap 0x0000000100007244 client_input + 148 14 imap 0x0000000100071228
io_loop_handler_run + 272 15 imap 0x00000001000705ed io_loop_run + 55 16 imap 0x000000010000e219 main + 1635 17 imap 0x00000001000016c0 start + 52
I have since configured more open files, and enabled core dumps and
added the pid to the mail_log_prefix for better tracking, but this is
all I have to work with for now.
Too bad the values of min_file_seq and max_file_seq are not logged in
the panic.
On Mon, 2009-04-13 at 14:29 -0500, Mike Abbott wrote:
My mail server consumed all of its configured file table slots and
started returning ENFILE ("Too many open files in system") for many
operations. This wreaked havoc with dovecot-1.1.13. Here are some
areas where dovecot should detect and more gracefully handle ENFILE
error returns:
I also just tried making nfs_safe_open() fail randomly and reproducing these.
- Deliver should report an error more helpful than "Unknown internal
error" when fdatasync_path fails: deliver(user): Apr 10 14:53:20 Error: fdatasync_path(/Volumes/MailData/ user/new) failed: Too many open files in system deliver(user): Apr 10 14:53:20 Info: msgid=<...>: save failed to
INBOX: BUG: Unknown internal error
http://hg.dovecot.org/dovecot-1.1/rev/576e77047959
- Index files should not be fscked when they could not be opened due
to ENFILE: ..- Dovecot should not crash when mail transaction logs become
corrupt, or appear to have become corrupt due to ENFILE.
I made nfs_safe_open() return failures randomly and found all kinds of bugs with it. I committed several fixes to hg, but I still get some weird errors once in a while, I'll try to debug more later..
participants (2)
-
Mike Abbott
-
Timo Sirainen