replication fails and corrupts index with zlib enabled
Hi, I have two Debian Jessie servers running Dovecot 2.2.13 with TCP replication. They have worked fine for years, but one of them is now running low on disk space, so I wanted to try enabling zlib.
I crafted a script following the description given in https://wiki.dovecot.org/Plugins/Zlib and xz'ed some inboxes on the stand-by server, the one with low disk space. So every email in those inboxes is xz'ed but the file name hasn't changed and contains the original size.
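For readers following along, the compress-in-place step described above could look roughly like the Python sketch below. This is not the attached script: the function name `compress_in_place` is hypothetical, it uses gzip rather than xz, and it leaves out the Maildir locking that the wiki procedure calls for, so it only illustrates that the file content is swapped while the file name (and the S= value in it) and the mtime stay the same.

```python
import gzip
import os
import tempfile


def compress_in_place(path: str) -> None:
    """Gzip one Maildir message file, keeping its name, mode and mtime.

    Hypothetical helper: no Maildir locking is done here, so it is only
    safe on a mailbox that nothing else is touching.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes: already compressed, skip
        return
    st = os.stat(path)
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(gzip.compress(raw))
        os.chmod(tmp, st.st_mode & 0o7777)
        # Give the compressed copy the original timestamps.
        os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
        # Swap the content; the file name, including S=, is untouched.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

File ownership is not preserved by this sketch; a real script run as root would also need to restore the owner and group.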
This server is on stand-by so most of the email is replicated unidirectionally to it. But administrative emails like cronjobs and monitoring are delivered locally, so it replicates those to the hot server.
The issue appears when the stand-by server receives such an email and tries to replicate it to the other server.
I'm attaching the full snippet of the log from the hot server, because it throws a longer backtrace. The short version is like this:
dovecot[8438]: imap(redacted@address.org): Error: Cached message size larger than expected (478 > 289)
dovecot[8438]: imap(redacted@address.org): Error: Maildir filename has wrong S value, renamed the file from /srv/email/address.org/redacted/Maildir/cur/1533393328.M502775P20341.standby_server,S=478:2, to /srv/email/address.org/redacted/Maildir/cur/1533393328.M502775P20341.standby_server,S=289:2,
dovecot[8438]: imap(redacted@address.org): Error: Corrupted index cache file /srv/email/address.org/redacted/Maildir/dovecot.index.cache: Broken physical size for mail UID 45123
After this there's an error and the replication fails. The file is there, it's gzipped and can be zcat'ed but it appears as a blank email on clients.
I've recovered a backup but the issue persists. I also changed from xz to gz because the Debian package docs only mention gz and bzip2, but the issue is the same.
From what I understand and tested, the stand-by server is receiving the email and compressing it but maintaining the original size on the file name. So that's ok, but when the hot server receives the copy, it believes the size is wrong and changes it to the compressed size. Then for some reason the index gets corrupted.
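To make that reading of the sizes concrete, here is a small Python sketch that compares the S= value in a Maildir file name with the size on disk and the size after decompression. The function name `check_s_value` is hypothetical and only gzip is handled. If the interpretation above is right, for the message in the log it would be expected to report 478 for S= and for the decompressed size, and 289 for the size on disk; that is an assumption drawn from the log, not something verified here.

```python
import gzip
import os
import re

# Matches the ",S=<size>" field in a Maildir file name.
S_FIELD = re.compile(r",S=(\d+)")


def check_s_value(path: str):
    """Return (S value from the file name, bytes on disk, bytes after gunzip).

    Hypothetical helper: files that are not gzip-compressed are measured
    as they are, so the last two numbers are then equal.
    """
    m = S_FIELD.search(os.path.basename(path))
    if m is None:
        raise ValueError("no S= field in file name: " + path)
    with open(path, "rb") as f:
        data = f.read()
    on_disk = len(data)
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes
        data = gzip.decompress(data)
    return int(m.group(1)), on_disk, len(data)
```

A file whose S= value equals the decompressed size is named the way the zlib setup expects; a server that compares S= with the on-disk size instead would see a mismatch like the one in the log.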
I'm attaching the doveconf for both servers. They're mostly the same, and the only changes introduced were the zlib plugin and its options. I'm also attaching the script(s) that I used to compress the inboxes.
Am I correct? Is it an issue of the replicator not understanding that the emails are compressed? I couldn't find anything related to zlib with replication. Maybe it's something fixed in newer versions and I should go down that rabbit hole?
Thanks! :)
There have definitely been fixes in this area since 2.2.13, and most likely your issue is fixed.
---
Aki Tuomi
Dovecot oy

-------- Original message --------
From: fauno <fauno@partidopirata.com.ar>
Date: 04/08/2018 18:44 (GMT+02:00)
To: dovecot@dovecot.org
Subject: replication fails and corrupts index with zlib enabled