On Mon, Oct 15, 2018 at 08:50:21AM +0000, Raymond Sellars wrote:
Hi
I'm looking for some insight into mdbox index file management and recovery from corruption.
I have a two-node cluster on NFS with a director proxy in front for user stickiness. One node (a nominated master) replicates bidirectionally to a third node at a DR site.
We periodically get index file corruption that results in rebuilds. The user experience is poor, however: messages read or deleted months or years ago all reappear as unread.
We've seen corruption caused by NTP time-sync problems on the NFS hosts and by the proxy not being sticky, but also by the DR node being offline for a while and then triggering corruption in production when it comes back online.
Example error message (one of several):
Error: Corrupted dbox file /mailshare/.. (removed) ../home/mail/storage/m.4 (around offset=993548): EOF reading msg header (got 0/30 bytes)
I've read all the documentation I can find, including https://wiki2.dovecot.org/MailboxFormat/dbox, and understand that "you must not lose the dbox index files, they can't be regenerated without data loss."
Questions:
#1 Are there any additional tips for avoiding mdbox index corruption with dsync? Or should I revert to the maildir format? I like the performance premise of mdbox, but these index corruptions are a reliability issue.
#2 I'm guessing read status is one of the metadata items lost, but it seems it can't be recovered from the dovecot.index.backup files either. Is there any technique to preserve it, since it's key to the user experience?
#3 If the index files and transaction logs are so critical, is there some kind of checkpoint backup I can take? Is there a native Dovecot feature, or do I need to script something?
#4 I've noticed that rebuilding the index does not work if the dovecot.index.log file is lost (I deleted it as a hard test). The dovecot.index.cache file can be deleted, but once the log file is gone, messages are not recovered from the storage directory, either automatically or by any manual method I can find.
I haven't seen any dovecot.index.log corruption, but that file seems very high risk. Is the index rebuilt only from the log file, or from a combination of the log file and the storage directory?
Is there perhaps an option to use just the transaction log and not the index? That doesn't sound wise for performance, though.
#5 In addition to messages reverting to unread, we also notice that messages moved to the Trash reappear. Is that expected behavior?
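To make #3 concrete, here is a minimal sketch of a scripted checkpoint in Python. The file-name prefixes (dovecot.index*, dovecot.map.index*), the directory layout and the helper name snapshot_indexes are assumptions for illustration only, not a Dovecot feature. Files are copied one at a time while Dovecot may still be writing, so the snapshot is not atomic and may not match the m.* storage files.

```python
import shutil
import time
from pathlib import Path

# Assumed file-name prefixes: per-mailbox dovecot.index, .log, .cache,
# .backup files, plus the mdbox map index (dovecot.map.index, .log).
INDEX_PREFIXES = ("dovecot.index", "dovecot.map.index")


def snapshot_indexes(mail_root: Path, backup_root: Path) -> Path:
    """Hypothetical helper: copy every index/log/cache file under mail_root
    into a timestamped directory under backup_root, keeping relative paths.

    backup_root must live outside mail_root, otherwise earlier snapshots
    would be picked up by the recursive scan. The copy is NOT atomic.
    """
    dest = backup_root / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    # Materialize the file list first so the scan is not affected by copying.
    candidates = [p for p in mail_root.rglob("*") if p.is_file()]
    for src in candidates:
        # str.startswith accepts a tuple of prefixes.
        if src.name.startswith(INDEX_PREFIXES):
            target = dest / src.relative_to(mail_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves mtimes
    return dest
```

A copy like this is only useful together with storage files from the same point in time, since the dbox documentation quoted above warns that the indexes can't be regenerated without data loss.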
Thanks,
Raymond
What version of Dovecot and what OS are you running? Is the NFS server Linux, BSD, NetApp, etc.?