[Dovecot] dovecot 2.2.0 corrupts mailboxes?

manu＠netbsd.org

5 May 2013

3:56 a.m.

On April 17th, I upgraded from dovecot 2.1.13 to 2.2.0. Since then, two different users have reported a total of three incidents of messages that disappeared from their mailboxes.

The mailbox format is mbox on a local FFS filesystem (no NFS), and I use filesystem quotas (but both users are far from filling their quotas). When messages disappeared, it was always a whole range of dates. In the last incident reported, the user also saw some messages being duplicated many times.

There is something interesting in the logs:

May 4 20:16:30 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (2000 < 8063)
May 4 20:16:30 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/.imap/INBOX/dovecot.index.cache: Broken physical size for mail UID 141869
May 4 20:19:48 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (9711 < 16248)
May 4 20:19:48 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Arxiv/dovecot.index.cache: Broken physical size for mail UID 4383
May 4 21:14:35 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (1878 < 8066)
May 4 21:14:35 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/CNRS/dovecot.index.cache: Broken physical size for mail UID 290
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (17285 < 24440)
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Commandes/dovecot.index.cache: Broken physical size for mail UID 680
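
Since the problem cannot be triggered on demand, one way to keep track of it is to pull the affected cache files and UIDs out of the syslog. The following is only a minimal sketch: the regular expression is derived from the error lines quoted above, and the function names are made up for illustration. It does not rely on any Dovecot tool or API.

```python
import re
from collections import Counter

# Matches the "Corrupted index cache file" errors in the form quoted above.
# The pattern is an assumption based on those lines only; other Dovecot
# versions may word the message differently.
CORRUPT_RE = re.compile(
    r"imap\((?P<user>[^)]+)\): Error: Corrupted index cache file "
    r"(?P<path>\S+): Broken physical size for mail UID (?P<uid>\d+)"
)


def find_corrupt_cache_entries(log_text):
    """Return a list of (user, cache_path, uid) for every matching error."""
    return [
        (m.group("user"), m.group("path"), int(m.group("uid")))
        for m in CORRUPT_RE.finditer(log_text)
    ]


def count_by_user(entries):
    """Count how many corruption errors were logged for each user."""
    return Counter(user for user, _path, _uid in entries)


if __name__ == "__main__":
    sample = (
        "May 4 20:16:30 volanges dovecot: imap(jdoe): Error: Corrupted index "
        "cache file /mail/indexes/jdoe/.imap/INBOX/dovecot.index.cache: "
        "Broken physical size for mail UID 141869"
    )
    for user, path, uid in find_corrupt_cache_entries(sample):
        print(user, path, uid)
```

Because the search runs over the whole text with `finditer`, it works whether the log entries sit on separate lines or have been joined into one.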

Does that ring a bell? I am tempted to downgrade to 2.1.13. Does it make sense? Is it safe to do so?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org


Morten Stevens

6 May 2013

5:20 p.m.

On 05.05.2013 02:56, Emmanuel Dreyfus wrote:

...

May 4 20:16:30 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (2000 < 8063)
May 4 20:16:30 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/.imap/INBOX/dovecot.index.cache: Broken physical size for mail UID 141869
May 4 20:19:48 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (9711 < 16248)
May 4 20:19:48 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Arxiv/dovecot.index.cache: Broken physical size for mail UID 4383
May 4 21:14:35 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (1878 < 8066)
May 4 21:14:35 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/CNRS/dovecot.index.cache: Broken physical size for mail UID 290
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Cached message size smaller than expected (17285 < 24440)
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Commandes/dovecot.index.cache: Broken physical size for mail UID 680

Does that ring a bell? I am tempted to downgrade to 2.1.13. Does it make sense? Is it safe to do so?

This bug has been fixed with dovecot 2.1.14.

Please check:
http://hg.dovecot.org/dovecot-2.1/rev/0b0399f1b6aa
http://dovecot.org/list/dovecot/2013-February/088313.html

Best regards,

Morten

Emmanuel Dreyfus

6:55 p.m.

On Mon, May 06, 2013 at 04:20:52PM +0200, Morten Stevens wrote:

...

...
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Commandes/dovecot.index.cache: Broken physical size for mail UID 680

Does that ring a bell? I am tempted to downgrade to 2.1.13. Does it make sense? Is it safe to do so?

This bug has been fixed with dovecot 2.1.14.

But I am running 2.2.0 ...

-- Emmanuel Dreyfus manu@netbsd.org

Oscar del Rio

8:52 p.m.

On 05/ 6/13 11:55 AM, Emmanuel Dreyfus wrote:

...

On Mon, May 06, 2013 at 04:20:52PM +0200, Morten Stevens wrote:

...
...
May 4 21:15:17 volanges dovecot: imap(jdoe): Error: Corrupted index cache file /mail/indexes/jdoe/mail/.imap/Commandes/dovecot.index.cache: Broken physical size for mail UID 680

Does that ring a bell? I am tempted to downgrade to 2.1.13. Does it make sense? Is it safe to do so?

This bug has been fixed with dovecot 2.1.14.

But I am running 2.2.0 ...

Have you tried 2.2.1?

Emmanuel Dreyfus

7 May 2013

10:16 a.m.

On Mon, May 06, 2013 at 01:52:55PM -0400, Oscar del Rio wrote:

...

Have you tried 2.2.1?

Will do, but since the problem cannot be reliably reproduced, I have no way of knowing whether it is fixed. Is there anything in the 2.2.1 changelog that hints it could be fixed?

-- Emmanuel Dreyfus manu@netbsd.org

Timo Sirainen

15 May 2013

2:50 p.m.

On 5.5.2013, at 3.56, Emmanuel Dreyfus <manu@netbsd.org> wrote:

...

On April 17th, I upgraded from dovecot 2.1.13 to 2.2.0. Since then, two different users have reported a total of three incidents of messages that disappeared from their mailboxes.

The mailbox format is mbox on a local FFS filesystem (no NFS), and I use filesystem quotas (but both users are far from filling their quotas). When messages disappeared, it was always a whole range of dates. In the last incident reported, the user also saw some messages being duplicated many times.

There are some locking code changes between v2.1 and v2.2, which I guess might be buggy. But I can't reproduce any corruption with stress testing. What's your doveconf -n output? Are you delivering mails via dovecot-lda or something external?

...

Does that ring a bell? I am tempted to downgrade to 2.1.13. Does it make sense? Is it safe to do so?

When downgrading, I recommend latest v2.1.

Emmanuel Dreyfus

8:33 p.m.

On Wed, May 15, 2013 at 02:50:55PM +0300, Timo Sirainen wrote:

...

There are some locking code changes between v2.1 and v2.2, which I guess might be buggy. But I can't reproduce any corruption with stress testing. What's your doveconf -n output? Are you delivering mails via dovecot-lda or something external?

dovecot -n is below. dovecot takes care of delivery, through LMTP.

Additional thoughts on possible problems:

One of the users was using mutt locally and accessed their mailbox directly, without going through dovecot.
I experimented with dsync replication from another machine that was not accessible through POP/IMAP/SMTP. Perhaps this is what caused the chaos?

auth_mechanisms = plain login
disable_plaintext_auth = no
first_valid_uid = 400
mail_location = mbox:~/mail:INBOX=/var/mail/%u:INDEX=/mail/indexes/%u:SUBSCRIPTIONS=../.mailboxlist
mbox_very_dirty_syncs = yes
passdb {
  args = max_requests=1 cache_key=%u dovecot
  driver = pam
}
passdb {
  args = /etc/dovecot-ldap.conf
  driver = ldap
}
plugin {
  autosubscribe = INBOX
  quota = fs:User quota
  quota_warning = storage=95%% quota-warning %u
}
quota_full_tempfail = yes
service anvil {
  client_limit = 1639
}
service auth {
  client_limit = 1736
  user = root
}
service imap-login {
  chroot = login
  process_limit = 1024
}
service imap {
  process_limit = 680
}
service lmtp {
  process_min_avail = 5
  unix_listener lmtp {
    group = smmsp
    mode = 0660
  }
}
service pop3-login {
  chroot = login
  process_limit = 512
}
service pop3 {
  process_limit = 680
}
service quota-warning {
  executable = script /usr/local/sbin/morts
  unix_listener quota-warning {
    mode = 0666
  }
  user = root
}
ssl_ca = </etc/openssl/certs/caespci2006.crt
ssl_cert = </etc/openssl/certs/volanges2012tcs-bundle.crt
ssl_key = </etc/openssl/private/volanges2012.key
userdb {
  driver = passwd
}
userdb {
  args = /etc/dovecot-ldap.conf
  driver = ldap
}
protocol imap {
  imap_client_workarounds = delay-newmail tb-extra-mailbox-sep
  mail_max_userip_connections = 8
  mail_plugin_dir = /usr/pkg/lib/dovecot
  mail_plugins = quota imap_quota
}
protocol pop3 {
  mail_max_userip_connections = 2
  mail_plugin_dir = /usr/pkg/lib/dovecot
  mail_plugins = quota
  mbox_dirty_syncs = yes
  pop3_no_flag_updates = no
  pop3_uidl_format = %08Xu%08Xv
}
protocol lmtp {
  mail_plugins = quota
  postmaster_address = postmaster@example.net
}
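
To relate the index cache paths in the error log to this configuration, it can help to split the mail_location value into its parts. The sketch below is a simplified illustration, not Dovecot's own parser: it assumes that no path contains a colon and it expands only the %u variable. The helper names are hypothetical.

```python
def parse_mail_location(value):
    """Split 'format:root:KEY=value:...' into (format, root, options).

    Simplifying assumption: paths never contain ':' themselves.
    """
    fmt, _, rest = value.partition(":")
    parts = rest.split(":")
    root = parts[0]
    options = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        options[key] = val
    return fmt, root, options


def expand_user(template, user):
    """Expand only the %u variable; other Dovecot variables are ignored."""
    return template.replace("%u", user)


if __name__ == "__main__":
    location = (
        "mbox:~/mail:INBOX=/var/mail/%u:INDEX=/mail/indexes/%u"
        ":SUBSCRIPTIONS=../.mailboxlist"
    )
    fmt, root, options = parse_mail_location(location)
    print(fmt, root)
    # Index directory for the user named in the log excerpts.
    print(expand_user(options["INDEX"], "jdoe"))
```

With this setting the index files for user jdoe live under /mail/indexes/jdoe, which is the directory that the "Corrupted index cache file" errors point at, while the mbox files themselves are under ~/mail and /var/mail.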

-- Emmanuel Dreyfus manu@netbsd.org

Timo Sirainen

9:36 p.m.

On 15.5.2013, at 20.33, Emmanuel Dreyfus <manu@netbsd.org> wrote:

...

On Wed, May 15, 2013 at 02:50:55PM +0300, Timo Sirainen wrote:

...
There are some locking code changes between v2.1 and v2.2, which I guess might be buggy. But I can't reproduce any corruption with stress testing. What's your doveconf -n output? Are you delivering mails via dovecot-lda or something external?

dovecot -n is below. dovecot takes care of delivery, through LMTP.

Additional thoughts on possible problems:

One of the users was using mutt locally and accessed their mailbox directly, without going through dovecot.

That shouldn't cause problems if locking was configured the same.

...

I experimented with dsync replication from another machine that was not accessible through POP/IMAP/SMTP. Perhaps this is what caused the chaos?

That might cause trouble. I tested today and dsync was doing some strange things with mbox.

Emmanuel Dreyfus

16 May 2013

4:01 a.m.

On Wed, May 15, 2013 at 09:36:54PM +0300, Timo Sirainen wrote:

...

...

One of the users was using mutt locally and accessed their mailbox directly, without going through dovecot.

That shouldn't cause problems if locking was configured the same.

I never looked at it, but I assume they both use flock or fcntl since this is local storage. And it worked fine for a while, therefore there is no hint it could be wrong.
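
Rather than assuming, the two sides can be compared. The sketch below rests on several assumptions. Dovecot's mbox locking is taken to be controlled by the mbox_read_locks and mbox_write_locks settings, with defaults of "fcntl" and "dotlock fcntl" when they are absent from doveconf -n, as they are in the output posted in this thread. Mutt's locking is taken to be fixed at build time and reported by mutt -v as +USE_FCNTL, +USE_FLOCK and +USE_DOTLOCK. The function names are hypothetical, and the flags in the usage example are an illustration, not the actual build on this server.

```python
# Assumed Dovecot defaults, used when `doveconf -n` does not list the setting
# (doveconf -n prints only non-default values).
DEFAULT_LOCKS = {
    "mbox_read_locks": "fcntl",
    "mbox_write_locks": "dotlock fcntl",
}


def dovecot_mbox_locks(doveconf_text):
    """Return the lock methods per setting from multi-line doveconf -n output."""
    settings = dict(DEFAULT_LOCKS)
    for line in doveconf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in settings:
            settings[key] = value.strip()
    return {key: value.split() for key, value in settings.items()}


def mutt_lock_methods(mutt_v_text):
    """Return the lock methods enabled in the compile options of `mutt -v`."""
    flags = {"USE_FCNTL": "fcntl", "USE_FLOCK": "flock", "USE_DOTLOCK": "dotlock"}
    tokens = mutt_v_text.split()
    return sorted(name for flag, name in flags.items() if "+" + flag in tokens)


def shared_write_locks(doveconf_text, mutt_v_text):
    """Lock methods that both programs would use when writing the mbox."""
    dovecot = set(dovecot_mbox_locks(doveconf_text)["mbox_write_locks"])
    return sorted(dovecot & set(mutt_lock_methods(mutt_v_text)))


if __name__ == "__main__":
    # Illustrative inputs only.
    print(shared_write_locks("mbox_very_dirty_syncs = yes\n",
                             "+USE_DOTLOCK +USE_FCNTL -USE_FLOCK"))
```

An empty result would mean the two programs share no write lock method, which is the situation in which direct access with mutt could interfere with Dovecot.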

...

...

I experimented with dsync replication from another machine that was not accessible through POP/IMAP/SMTP. Perhaps this is what caused the chaos?

That might cause trouble. I tested today and dsync was doing some strange things with mbox.

What is the advised setup? Here is the additional config I tried on the inaccessible host:

mail_plugins = $mail_plugins notify replication
service replicator {
  process_min_avail = 1
}
dsync_remote_cmd = ssh -lroot %{host} doveadm dsync-server -u%u
plugin {
  mail_replica = remote:root@server1.example.net
}
service aggregator {
  fifo_listener replication-notify-fifo {
    user = dovecot
  }
  unix_listener replication-notify {
    user = dovecot
  }
}
service replicator {
  unix_listener replicator-doveadm {
    mode = 0600
  }
}
service replicator {
  unix_listener replicator-doveadm {
    mode = 0600
  }
}
service doveadm {
  inet_listener {
    port = 12345
    ssl = yes
  }
}
doveadm_port = 12345
ssl_client_ca_file = /etc/openssl/certs/tcs-chain.crt
doveadm_proxy_port = 0

-- Emmanuel Dreyfus manu@netbsd.org

Charles Marcus

1:37 p.m.

On 2013-05-15 9:01 PM, Emmanuel Dreyfus <manu@netbsd.org> wrote:

...

On Wed, May 15, 2013 at 09:36:54PM +0300, Timo Sirainen wrote:

...
...

One of the users was using mutt locally and accessed their mailbox directly, without going through dovecot.

That shouldn't cause problems if locking was configured the same.

...

I never looked at it, but I assume they both use flock or fcntl

Can't help with your actual problem, but...

What was it that 'assumption' is supposedly the mother of?

;)

Best regards,

Charles

Emmanuel Dreyfus

9:25 p.m.

On Thu, May 16, 2013 at 06:37:45AM -0400, Charles Marcus wrote:

...

...
I never looked at it, but I assume they both use flock or fcntl

Can't help with your actual problem, but... What was it that 'assumption' is supposedly the mother of?

I don't buy that explanation: everything worked fine for years.

-- Emmanuel Dreyfus manu@netbsd.org

Charles Marcus

10:05 p.m.

On 2013-05-16 2:25 PM, Emmanuel Dreyfus <manu@netbsd.org> wrote:

...

On Thu, May 16, 2013 at 06:37:45AM -0400, Charles Marcus wrote:

...
...
I never looked at it, but I assume they both use flock or fcntl

Can't help with your actual problem, but... What was it that 'assumption' is supposedly the mother of?

I don't buy that explanation: everything worked fine for years.

You miss the point entirely.

Best regards,

Charles

Timo Sirainen

1:56 p.m.

On 16.5.2013, at 4.01, Emmanuel Dreyfus <manu@netbsd.org> wrote:

...

...
...

I experimented with dsync replication from another machine that was not accessible through POP/IMAP/SMTP. Perhaps this is what caused the chaos?

That might cause trouble. I tested today and dsync was doing some strange things with mbox.

What is the advised setup?

Not using mbox, at least with dsync, at least for now.
