bug report: "Corrupted transaction log file" PLUS request for improvement of the documentation about mmap_disable
Hello Dovecot developers, hello list,
for more than two years we have seen occasional log messages like this:
Error: Mailbox INBOX: Corrupted transaction log file /IMAP/mail/mailboxes/USER/mdbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log seq 45532: Unexpected garbage at EOF (sync_offset=4856)
In most cases I had to repair the affected mailbox manually after such an event. Luckily, I could avoid any data corruption by using the healthy mailbox from the replication partner.
At first those errors occurred rather seldom (about once a month), but they became more frequent over time. At the end of last year, they occurred about once a week for 800 users with up to 200 folders. The reporting process was different each time: the indexer, imapd, lmtpd, replicator or doveadm (on the command line). It was obviously some sort of race condition between the different processes accessing the index. I did not find any hints on how to reproduce it.
At the beginning of this year I started to debug the issue by adding debug log output to our production system, mostly in lib-index. I was hoping to find the cause and maybe even provide a patch to fix it. But about 5 weeks ago I discovered that the error can be avoided simply by setting mmap_disable=yes. Since this change, the error has not occurred again, even with specific tests that were previously likely to trigger the bug. That is where I stopped debugging.
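For reference, the change amounts to a single line in the Dovecot configuration. This is a minimal sketch; the file name conf.d/10-mail.conf is an assumption based on the stock configuration layout, and the setting may live in a different file on other installations:

```
# conf.d/10-mail.conf (file location is an assumption, adjust to your layout)
# Do not use mmap() for index files; use normal read()/write() access instead.
mmap_disable = yes
```

After reloading Dovecot, the output of "doveconf -n" should list mmap_disable = yes among the non-default settings.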
We are running Dovecot in master-master mode using the replication plugin behind a Dovecot director setup. The storage on the backends is provided by local disks on Linux with the newest ZFS filesystem. We use mdbox storage and FTS. While testing I also tried ext4 instead of ZFS and different FTS backends, without success.
I can provide more information for the developers if you are interested!
About mmap_disable: The documentation only mentions that you should set this to "yes" for SHARED filesystems (I don't think local ZFS or ext4 qualify as such). https://doc.dovecot.org/2.3/settings/core/#core_setting-mmap_disable
Another page (https://doc.dovecot.org/2.3/admin_manual/mailbox_formats/#memory-mapping) mentions that "If mmap() is supported by your filesystem, it’s still not certain that it gives better performance. Try benchmarking to make sure."
I also found an old mail from Timo (https://dovecot.org/list/dovecot/2011-December/079975.html) which lists 3 cases where mmap_disable=yes is required or at least suggested. Only in one case did he write "With local filesystems mmap_disable=no _should_ be faster."
I did not do extensive benchmarks but I did not see any performance issues since we disabled mmap on our IMAP backends.
So I wonder whether it would be better to switch the default for mmap_disable to YES:
- There are configurations where the current default might cause data corruption!
- There are configurations where disabling mmap is suggested.
- Even on local storage, the performance benefits are uncertain.
- According to my observations, enabling mmap can cause data corruption even on local storage.
Changing the default to mmap_disable=yes would protect users from accidental data corruption. Users who aim for maximum performance can still enable mmap and test whether it improves performance. IMHO the documentation should mention that mmap CAN cause data corruption even on local disks (until the above-mentioned bug is found and fixed).
Best regards,
Patrick Cernko <pcernko@mpi-klsb.mpg.de>
+49 681 9325 5815
Joint Scientific IT and Technical Service
Max-Planck-Institute für Informatik & Softwaresysteme