[Dovecot] Expunged message reappeared, giving a new UID
I am running dovecot 2.2.2 with tcp based replication, and experiencing
some duplicated emails. doveconf -n
output is below.
I have narrowed it down to the following scenario:
An email arrives, and is successfully replicated to both nodes. It is in INBOX/new/ at this point on both servers.
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
The following log entry appears on the server the client is connected to:
Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: /mail/spool/s/srf/mail/INBOX/dovecot-uidlist: Duplicate file entry at line 5: 1374831638.M830754P11591.intmail3a,S=1289,W=1321 (uid 733481 -> 733482) - retrying by re-reading from beginning Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: Maildir /mail/spool/s/srf/mail/INBOX: Expunged message reappeared, giving a new UID (old uid=733481, file=1374831638.M830754P11591.intmail3a,S=1289,W=1321)
The email reappears in the mailbox, although this time it is in INBOX/new on one node and INBOX/cur on the other. As best I can tell it's always in 'new' on the server the client was connected to and in 'cur' on the other.
If any other mailbox operation occurs before the delete (Marking the message in question as 'read', receiving a new message into the mailbox in a different lmtp session, marking as read or deleting & expunging some other message) then this prevents the duplication from happening.
What can I do to prevent this happening? Is there a synchronisation interval I can adjust, perhaps? Although the message does get synchronised initially.
Thanks in advance, Simon.
# 2.2.2: /mail/etc/dovecot/dovecot.conf # OS: Linux 3.5.0-23-generic x86_64 Ubuntu 12.04.2 LTS disable_plaintext_auth = no doveadm_password = secret first_valid_uid = 100 lda_mailbox_autocreate = yes mail_gid = dovecot mail_home = /mail/spool/%1n/%n mail_location = maildir:~/mail:INBOX=~/mail/INBOX mail_plugins = notify replication mail_uid = dovecot managesieve_notify_capability = mailto managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave namespace inbox { inbox = yes location = mailbox Drafts { special_use = \Drafts } mailbox Junk { special_use = \Junk } mailbox Sent { special_use = \Sent } mailbox "Sent Messages" { special_use = \Sent } mailbox Trash { special_use = \Trash } prefix = separator = . } passdb { args = /mail/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } plugin { mail_replica = tcps:intmail3b:12345 sieve_dir = ~/sieve } protocols = imap lmtp sieve service aggregator { fifo_listener replication-notify-fifo { mode = 0600 user = dovecot } unix_listener replication-notify { mode = 0600 user = dovecot } } service config { unix_listener config { user = dovecot } } service doveadm { inet_listener { port = 12345 ssl = yes } user = dovecot } service replicator { process_min_avail = 1 } ssl_cert = </mail/etc/certs/mail-cert.pem ssl_client_ca_dir = /etc/ssl/certs ssl_key = </mail/etc/certs/mail-key.pem userdb { args = /mail/etc/dovecot/dovecot-ldap.userdb.conf.ext driver = ldap } protocol lmtp { mail_plugins = notify replication sieve } protocol lda { mail_plugins = notify replication sieve } protocol imap { mail_max_userip_connections = 30 }
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
I've just converted the mailboxes to mdbox, so mail_location now looks like this:
mail_location = mdbox:~/mail
The log entry about messages reappearing no longer happens, but the symptoms are the same - if I delete a message, it instantly reappears.
Simon.
On Fri, 2013-07-26 at 11:28 +0100, Simon Fraser wrote:
I am running dovecot 2.2.2 with tcp based replication, and experiencing some duplicated emails.
doveconf -n
output is below.I have narrowed it down to the following scenario:
An email arrives, and is successfully replicated to both nodes. It is in INBOX/new/ at this point on both servers.
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
The following log entry appears on the server the client is connected to:
Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: /mail/spool/s/srf/mail/INBOX/dovecot-uidlist: Duplicate file entry at line 5: 1374831638.M830754P11591.intmail3a,S=1289,W=1321 (uid 733481 -> 733482) - retrying by re-reading from beginning Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: Maildir /mail/spool/s/srf/mail/INBOX: Expunged message reappeared, giving a new UID (old uid=733481, file=1374831638.M830754P11591.intmail3a,S=1289,W=1321)
The email reappears in the mailbox, although this time it is in INBOX/new on one node and INBOX/cur on the other. As best I can tell it's always in 'new' on the server the client was connected to and in 'cur' on the other.
If any other mailbox operation occurs before the delete (Marking the message in question as 'read', receiving a new message into the mailbox in a different lmtp session, marking as read or deleting & expunging some other message) then this prevents the duplication from happening.
What can I do to prevent this happening? Is there a synchronisation interval I can adjust, perhaps? Although the message does get synchronised initially.
Thanks in advance, Simon.
# 2.2.2: /mail/etc/dovecot/dovecot.conf # OS: Linux 3.5.0-23-generic x86_64 Ubuntu 12.04.2 LTS disable_plaintext_auth = no doveadm_password = secret first_valid_uid = 100 lda_mailbox_autocreate = yes mail_gid = dovecot mail_home = /mail/spool/%1n/%n mail_location = maildir:~/mail:INBOX=~/mail/INBOX mail_plugins = notify replication mail_uid = dovecot managesieve_notify_capability = mailto managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave namespace inbox { inbox = yes location = mailbox Drafts { special_use = \Drafts } mailbox Junk { special_use = \Junk } mailbox Sent { special_use = \Sent } mailbox "Sent Messages" { special_use = \Sent } mailbox Trash { special_use = \Trash } prefix = separator = . } passdb { args = /mail/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } plugin { mail_replica = tcps:intmail3b:12345 sieve_dir = ~/sieve } protocols = imap lmtp sieve service aggregator { fifo_listener replication-notify-fifo { mode = 0600 user = dovecot } unix_listener replication-notify { mode = 0600 user = dovecot } } service config { unix_listener config { user = dovecot } } service doveadm { inet_listener { port = 12345 ssl = yes } user = dovecot } service replicator { process_min_avail = 1 } ssl_cert = </mail/etc/certs/mail-cert.pem ssl_client_ca_dir = /etc/ssl/certs ssl_key = </mail/etc/certs/mail-key.pem userdb { args = /mail/etc/dovecot/dovecot-ldap.userdb.conf.ext driver = ldap } protocol lmtp { mail_plugins = notify replication sieve } protocol lda { mail_plugins = notify replication sieve } protocol imap { mail_max_userip_connections = 30 }
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Am 26.07.2013 17:41, schrieb Simon Fraser:
I've just converted the mailboxes to mdbox, so mail_location now looks like this:
mail_location = mdbox:~/mail
The log entry about messages reappearing no longer happens, but the symptoms are the same - if I delete a message, it instantly reappears
look in the thread "Duplicate Mails downloding issue"
since you said "I am running dovecot 2.2.2 with tcp based replication" it sounds like the same problem -> hint: Dovecot Director
On Fri, 2013-07-26 at 17:47 +0200, Reindl Harald wrote:
Am 26.07.2013 17:41, schrieb Simon Fraser:
I've just converted the mailboxes to mdbox, so mail_location now looks like this:
mail_location = mdbox:~/mail
The log entry about messages reappearing no longer happens, but the symptoms are the same - if I delete a message, it instantly reappears
look in the thread "Duplicate Mails downloding issue"
since you said "I am running dovecot 2.2.2 with tcp based replication" it sounds like the same problem -> hint: Dovecot Director
I've read that thread. I'm not seeing any lock file errors, I'm not running over NFS but rather using dovecot's own replication for the mail store, and I'm not using pop3. I'm also not seeing the mails _downloaded_ again but rather reappear on the server after they have been removed: I can watch the message disappear from one server and then be replaced.
I've looked at the documentation for Director and while it looks nice, I don't see how it relates to my problem, as it seems to be for shared-storage deployments. I have tested this with a mail client that's connecting to only one node - this is a test service, and so that one imap connection is the only client.
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Hi Simon,
Version 2.2.2 is not current any more. I would try to update to the latest stable version 2.2.4 first, since some dsync bugs have been fixed between 2.2.2 and 2.2.4:
http://www.dovecot.org/doc/NEWS-2.2
Regards Daniel
On Fri, 2013-07-26 at 20:26 +0200, Daniel Parthey wrote:
Hi Simon,
Version 2.2.2 is not current any more. I would try to update to the latest stable version 2.2.4 first, since some dsync bugs have been fixed between 2.2.2 and 2.2.4:
I've now upgraded to 2.2.4 (and pigeonhole 0.4.1 from 0.4.0 at the same time). I'm still experiencing the same problem, with both Maildir and mdbox format mailboxes.
My test imap connection is to only one of the servers at a time - I can reproduce this using 'telnet server imap' and issuing "store +flags (\Deleted)" and expunge as well as deleting messages with a mail client.
Thanks, Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Hi, is there anything I can do to help debug this - any tools I should run to discover where the cause might be?
Simon.
On Mon, 2013-07-29 at 10:05 +0100, Simon Fraser wrote:
On Fri, 2013-07-26 at 20:26 +0200, Daniel Parthey wrote:
Hi Simon,
Version 2.2.2 is not current any more. I would try to update to the latest stable version 2.2.4 first, since some dsync bugs have been fixed between 2.2.2 and 2.2.4:
I've now upgraded to 2.2.4 (and pigeonhole 0.4.1 from 0.4.0 at the same time). I'm still experiencing the same problem, with both Maildir and mdbox format mailboxes.
My test imap connection is to only one of the servers at a time - I can reproduce this using 'telnet server imap' and issuing "store +flags (\Deleted)" and expunge as well as deleting messages with a mail client.
Thanks, Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On Fri, 2013-07-26 at 11:28 +0100, Simon Fraser wrote:
I am running dovecot 2.2.2 with tcp based replication, and experiencing some duplicated emails.
doveconf -n
output is below.
Are both of the servers using the same mailbox format?
I have narrowed it down to the following scenario:
An email arrives, and is successfully replicated to both nodes. It is in INBOX/new/ at this point on both servers.
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
Can you reproduce this by disabling automatic replication (e.g. just remove replicator from mail_plugins) and running dsync manually? So basically delete the message, then run:
doveadm sync -r rawlog -u user@domain -d
If that reproduces it, send the rawlog to me.
The following log entry appears on the server the client is connected to:
Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: /mail/spool/s/srf/mail/INBOX/dovecot-uidlist: Duplicate file entry at line 5: 1374831638.M830754P11591.intmail3a,S=1289,W=1321 (uid 733481 -> 733482) - retrying by re-reading from beginning Jul 26 10:40:55 intmail3a dovecot: imap(srf): Warning: Maildir /mail/spool/s/srf/mail/INBOX: Expunged message reappeared, giving a new UID (old uid=733481, file=1374831638.M830754P11591.intmail3a,S=1289,W=1321)
This seems to be a bug of its own regardless of what happens with replication.
On Thu, 2013-08-01 at 15:09 +0300, Timo Sirainen wrote:
On Fri, 2013-07-26 at 11:28 +0100, Simon Fraser wrote:
I am running dovecot 2.2.2 with tcp based replication, and experiencing some duplicated emails.
doveconf -n
output is below.Are both of the servers using the same mailbox format?
Yes, they are. doveconf -n | grep mail_location
on both nodes gives:
mail_location = maildir:~/mail:INBOX=~/mail/INBOX
I've also tried with mdbox.
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
Can you reproduce this by disabling automatic replication (e.g. just remove replicator from mail_plugins) and running dsync manually? So basically delete the message, then run:
doveadm sync -r rawlog -u user@domain -d
If that reproduces it, send the rawlog to me.
I can't reproduce it with that, sorry. Message deletion appears to work that way, and the command only produces I/O leak errors:
dsync-local(srf@sanger.ac.uk): Warning: I/O leak: 0x7f00e50cc960 (line 341, fd 9)
Using mail delivery and imap connections to node 'a' only, I have tried:
- mail delivered, connecting/deleting/expunging, then sync
- mail delivered, syncing, connecting/deleting/expunging, not syncing, then checking
- mail delivered, syncing, connecting/deleting/expunging, syncing
None cause the message to reappear. I tried initiating the sync from both nodes, just in case.
Leaving the replication plugin enabled causes it to happen, though, so perhaps my timing is off.
Other things I have discovered by running 'watch' or a while loop, both grepping the mail_location for my test subject line:
A message is visible over imap before being written to mail_location. If I can manage to delete it in this time (there isn't much time!) then the message does not reappear.
When the message reappears it gets a different filename (please excuse the grep output on the end): cur/1375364237.M969208P14576.intmail3a,S=2672,W=2731:2,:Subject: dup test
is replaced with: cur/1375364242.M782761P838.intmail3b,S=2672,W=2731:2,:Subject: dup test
This occurs less than a second after the expunge. There's only a log entry on intmail3a, not 3b:
Aug 1 14:37:26 intmail3a dovecot: imap(srf): Warning: /mail/spool/s/srf/mail/INBOX/dovecot-uidlist: Duplicate file entry at line 105: 1375364237.M969208P14576.intmail3a,S=2672,W=2731 (uid 772051 -> 772052)
I'm happy to run as many diagnostics as required, just let me know what you'd like.
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Hello,
For what it's worth, still experiencing these symptoms with 2.2.5.
Thanks, Simon.
On Thu, 2013-08-01 at 15:02 +0100, Simon Fraser wrote:
On Thu, 2013-08-01 at 15:09 +0300, Timo Sirainen wrote:
On Fri, 2013-07-26 at 11:28 +0100, Simon Fraser wrote:
I am running dovecot 2.2.2 with tcp based replication, and experiencing some duplicated emails.
doveconf -n
output is below.Are both of the servers using the same mailbox format?
Yes, they are.
doveconf -n | grep mail_location
on both nodes gives: mail_location = maildir:~/mail:INBOX=~/mail/INBOXI've also tried with mdbox.
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
Can you reproduce this by disabling automatic replication (e.g. just remove replicator from mail_plugins) and running dsync manually? So basically delete the message, then run:
doveadm sync -r rawlog -u user@domain -d
If that reproduces it, send the rawlog to me.
I can't reproduce it with that, sorry. Message deletion appears to work that way, and the command only produces I/O leak errors:
dsync-local(srf@sanger.ac.uk): Warning: I/O leak: 0x7f00e50cc960 (line 341, fd 9)
Using mail delivery and imap connections to node 'a' only, I have tried:
- mail delivered, connecting/deleting/expunging, then sync
- mail delivered, syncing, connecting/deleting/expunging, not syncing, then checking
- mail delivered, syncing, connecting/deleting/expunging, syncing
None cause the message to reappear. I tried initiating the sync from both nodes, just in case.
Leaving the replication plugin enabled causes it to happen, though, so perhaps my timing is off.
Other things I have discovered by running 'watch' or a while loop, both grepping the mail_location for my test subject line:
A message is visible over imap before being written to mail_location. If I can manage to delete it in this time (there isn't much time!) then the message does not reappear.
When the message reappears it gets a different filename (please excuse the grep output on the end): cur/1375364237.M969208P14576.intmail3a,S=2672,W=2731:2,:Subject: dup test
is replaced with: cur/1375364242.M782761P838.intmail3b,S=2672,W=2731:2,:Subject: dup test
This occurs less than a second after the expunge. There's only a log entry on intmail3a, not 3b:
Aug 1 14:37:26 intmail3a dovecot: imap(srf): Warning: /mail/spool/s/srf/mail/INBOX/dovecot-uidlist: Duplicate file entry at line 105: 1375364237.M969208P14576.intmail3a,S=2672,W=2731 (uid 772051 -> 772052)
I'm happy to run as many diagnostics as required, just let me know what you'd like.
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On 1.8.2013, at 17.02, Simon Fraser <srf@sanger.ac.uk> wrote:
Connect with a mail client, and delete the message - without delayed expunge. So, for example, mutt (press 'd' then '$' to sync the mailbox), or Evolution set to immediately delete.
Can you reproduce this by disabling automatic replication (e.g. just remove replicator from mail_plugins) and running dsync manually? So basically delete the message, then run:
doveadm sync -r rawlog -u user@domain -d
If that reproduces it, send the rawlog to me.
I can't reproduce it with that, sorry. Message deletion appears to work that way, and the command only produces I/O leak errors:
Here's another idea:
Try disabling replicator plugin from only one side, so there's not possibility of two dsyncs running at the same time. That should be prevented already by locking though.
The servers have different hostnames, right?
On 6.8.2013, at 14.30, Timo Sirainen <tss@iki.fi> wrote:
Here's another idea:
Try disabling replicator plugin from only one side, so there's not possibility of two dsyncs running at the same time. That should be prevented already by locking though.
The servers have different hostnames, right?
The more I think about it, the more this makes sense. You seem to have different hostnames, but .. maybe they're not from Dovecot's point of view for some reason? I added a new dovecot --hostdomain parameter to check it: http://hg.dovecot.org/dovecot-2.2/rev/5a3821097f3c
On Tue, 2013-08-06 at 14:30 +0300, Timo Sirainen wrote:
Here's another idea:
Thank you for still looking into this
Try disabling replicator plugin from only one side, so there's not possibility of two dsyncs running at the same time. That should be prevented already by locking though.
I disabled the replication on node b, restarted both, and connected to node a to deliver and read mail, and had the same symptoms. Tried it with replication enabled on node b but disabled on node a, and naturally the message didn't get replicated at all, and so didn't reappear.
The servers have different hostnames, right?
They do. There was a record that pointed to both IP addresses, but I've removed it after reading your suggestion here, and still see the symptoms. I also have a test system which has never had that A record that can show the same symptoms.
The more I think about it, the more this makes sense. You seem to have different hostnames, but .. maybe they're not from Dovecot's point of view for some reason? I added a new dovecot --hostdomain parameter to check it: http://hg.dovecot.org/dovecot-2.2/rev/5a3821097f3c
root@intmail3a:~# /mail/sbin/dovecot --hostdomain intmail3a.internal.sanger.ac.uk root@intmail3b:~# /mail/sbin/dovecot --hostdomain intmail3b.internal.sanger.ac.uk
Each hostname points to 1 IP address, and the only PTR for each IP address is the hostname. No entry in /etc/hosts for either server name.
Inspired by this, I have also tried disabling ipv6 on both servers, in case the lack of DNS entries there was causing an issue, but it didn't fix it.
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On 6.8.2013, at 15.59, Simon Fraser <srf@sanger.ac.uk> wrote:
Try disabling replicator plugin from only one side, so there's not possibility of two dsyncs running at the same time. That should be prevented already by locking though.
I disabled the replication on node b, restarted both, and connected to node a to deliver and read mail, and had the same symptoms.
Weird, I was sure that would have worked. Well, maybe rawlogs would show something interesting. I should probably add a proper option for them, but attached a patch to enable for now. Be sure to mkdir /tmp/dsync-rawlogs with enough write permissions (e.g. 0777)
On 6.8.2013, at 16.42, Timo Sirainen <tss@iki.fi> wrote:
On 6.8.2013, at 15.59, Simon Fraser <srf@sanger.ac.uk> wrote:
Try disabling replicator plugin from only one side, so there's not possibility of two dsyncs running at the same time. That should be prevented already by locking though.
I disabled the replication on node b, restarted both, and connected to node a to deliver and read mail, and had the same symptoms.
Weird, I was sure that would have worked. Well, maybe rawlogs would show something interesting. I should probably add a proper option for them, but attached a patch to enable for now. Be sure to mkdir /tmp/dsync-rawlogs with enough write permissions (e.g. 0777)
Oh and do it on both the servers and send me all the logs from both servers. Hopefully it contains only the one deletion test case and no extra stuff. :)
Also, instead of using IMAP client to delete the mail, try something like:
doveadm expunge -u user@domain mailbox INBOX subject testmail
Because I was thinking that with IMAP it first sets the \Deleted flag, which triggers replication. Then it immediately does EXPUNGE which deletes the mail locally, but I was thinking that now remote dsync also triggered because of the flag change and notices the the mail is gone and puts it back. But locking should have avoided that. Also disabling replication on the remote side should have avoided that.
On Tue, 2013-08-06 at 16:45 +0300, Timo Sirainen wrote:
Weird, I was sure that would have worked. Well, maybe rawlogs would
show something interesting. I should probably add a proper option for them, but attached a patch to enable for now. Be sure to mkdir /tmp/dsync-rawlogs with enough write permissions (e.g. 0777)
I've added that patch and I'm getting: Aug 16:18:48 intmail3a dovecot: doveadm: Error: Don't give mail location with -d parameter Aug 6 16:18:48 intmail3a dovecot: doveadm(srf): Error: Saved sync state is invalid, falling back to full sync: Input too small
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oh and do it on both the servers and send me all the logs from both servers. Hopefully it contains only the one deletion test case and no extra stuff. :)
Also, instead of using IMAP client to delete the mail, try something like:
doveadm expunge -u user@domain mailbox INBOX subject testmail
That successfully expunges a message, without it reappearing.
Because I was thinking that with IMAP it first sets the \Deleted flag, which triggers replication. Then it immediately does EXPUNGE which deletes the mail locally, but I was thinking that now remote dsync also triggered because of the flag change and notices the the mail is gone and puts it back. But locking should have avoided that. Also disabling replication on the remote side should have avoided that.
Seems plausible. I tried again disabling replication on the second node to make sure I hadn't made an error - I removed 'replication' from mail_plugins and the plugins section of the lmtp and lda services, and removed the 'service replicator' block, but I'm afraid it's still there.
I'm currently digging around with replication over ssh, to see if it still happens that way.
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On 6.8.2013, at 18.58, Simon Fraser <srf@sanger.ac.uk> wrote:
On Tue, 2013-08-06 at 16:45 +0300, Timo Sirainen wrote:
Weird, I was sure that would have worked. Well, maybe rawlogs would
show something interesting. I should probably add a proper option for them, but attached a patch to enable for now. Be sure to mkdir /tmp/dsync-rawlogs with enough write permissions (e.g. 0777)
I've added that patch and I'm getting: Aug 16:18:48 intmail3a dovecot: doveadm: Error: Don't give mail location with -d parameter Aug 6 16:18:48 intmail3a dovecot: doveadm(srf): Error: Saved sync state is invalid, falling back to full sync: Input too small
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
On Tue, 2013-08-06 at 19:15 +0300, Timo Sirainen wrote:
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
Mail logs on both contain entries like this: Aug 6 18:04:37 dcot2a dovecot: master: Dovecot v2.2.5 starting up (core dumps disabled) Aug 6 18:04:38 dcot2a dovecot: doveadm: Error: Don't give mail location with -d parameter
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Are there any more tests I can run for this?
Thanks, Simon.
On Thu, 2013-08-08 at 09:21 +0100, Simon Fraser wrote:
On Tue, 2013-08-06 at 19:15 +0300, Timo Sirainen wrote:
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
Mail logs on both contain entries like this: Aug 6 18:04:37 dcot2a dovecot: master: Dovecot v2.2.5 starting up (core dumps disabled) Aug 6 18:04:38 dcot2a dovecot: doveadm: Error: Don't give mail location with -d parameter
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
On Mon, 2013-08-19 at 12:08 +0100, Simon Fraser wrote:
Are there any more tests I can run for this?
I've tried an ssh-based replication configuration, and still experience the same symptoms. Going through a director proxy also doesn't help (as expected, that one, since I was already only connecting to one of the servers)
Simon.
On Thu, 2013-08-08 at 09:21 +0100, Simon Fraser wrote:
On Tue, 2013-08-06 at 19:15 +0300, Timo Sirainen wrote:
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
Mail logs on both contain entries like this: Aug 6 18:04:37 dcot2a dovecot: master: Dovecot v2.2.5 starting up (core dumps disabled) Aug 6 18:04:38 dcot2a dovecot: doveadm: Error: Don't give mail location with -d parameter
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this?
Simon.
On Fri, 2013-08-30 at 09:49 +0100, Simon Fraser wrote:
On Mon, 2013-08-19 at 12:08 +0100, Simon Fraser wrote:
Are there any more tests I can run for this?
I've tried an ssh-based replication configuration, and still experience the same symptoms. Going through a director proxy also doesn't help (as expected, that one, since I was already only connecting to one of the servers)
Simon.
On Thu, 2013-08-08 at 09:21 +0100, Simon Fraser wrote:
On Tue, 2013-08-06 at 19:15 +0300, Timo Sirainen wrote:
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
Mail logs on both contain entries like this: Aug 6 18:04:37 dcot2a dovecot: master: Dovecot v2.2.5 starting up (core dumps disabled) Aug 6 18:04:38 dcot2a dovecot: doveadm: Error: Don't give mail location with -d parameter
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Am 16.09.2013 15:24, schrieb Simon Fraser:
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this?
Simon.
please search the list archives, there are many posts with clustered filesystems or nfs setups, both are possible, depends what might fit best at your place / size, i have ocfs2 and drbd cluster working with maildir on ubuntu lucid ( for better performance in special with bigger mailboxes dbox or mdbox might be better today ) behind loadbalancers, if you brave you might try ceph
perhaps have a look
http://sys4.de/de/blog/2013/06/10/loadbalancing-mit-keepalived-postfix-dovec...
http://sys4.de/de/blog/2013/06/06/postfix-dovecot-ceph-cluster-storage/
sorry only german
i have no good reports in using gfs2 for cluster, but this might got fixed recent
On Fri, 2013-08-30 at 09:49 +0100, Simon Fraser wrote:
On Mon, 2013-08-19 at 12:08 +0100, Simon Fraser wrote:
Are there any more tests I can run for this?
I've tried an ssh-based replication configuration, and still experience the same symptoms. Going through a director proxy also doesn't help (as expected, that one, since I was already only connecting to one of the servers)
Simon.
On Thu, 2013-08-08 at 09:21 +0100, Simon Fraser wrote:
On Tue, 2013-08-06 at 19:15 +0300, Timo Sirainen wrote:
Presumably it's thinking the "-r /tmp/dsync-rawlog" is a mail location? I've tried changing its location in the appends, but it doesn't make a difference.
Oops, I messed up the parameter order. It was supposed to have -s state but now it had -s -r rawlog state. New patch should work better.
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
Mail logs on both contain entries like this: Aug 6 18:04:37 dcot2a dovecot: master: Dovecot v2.2.5 starting up (core dumps disabled) Aug 6 18:04:38 dcot2a dovecot: doveadm: Error: Don't give mail location with -d parameter
Simon.
Best Regards MfG Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263 Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer Aufsichtsratsvorsitzender: Florian Kirstein
On 9/16/2013 8:24 AM, Simon Fraser wrote:
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this?
Just the basics:
- DRBD is limited to two nodes
- Twice as many disks are required vs single host, NFS share, SAN LUN
- GFS2/OCFS2 metadata exchange is slow and gets slower with load
- maildir doesn't scale well due to metadata modification rate
DRBD may or may not be your best solution.
I didn't read the previous incarnation of this thread as the subject lacked keywords in my area of interest/expertise. This subject one has both cluster and filesystem. If you'd like to further discuss storage options for your Dovecot workload I'd be glad to. Knowing your current/future mailbox count, avg mail and mailbox size, msgs/day, etc, is required for proper analysis.
-- Stan
Does anybody know about GlusterFS & Dovecot?
...Andreas
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this?
Andreas Gaiser - Berlin agaiser@multifake.net
On 9/17/2013 3:23 PM, Andreas Gaiser wrote:
Does anybody know about GlusterFS & Dovecot?
http://lmgtfy.com/?q=dovecot+glusterfs
-- Stan
Am 18.09.2013 03:40, schrieb Stan Hoeppner:
On 9/17/2013 3:23 PM, Andreas Gaiser wrote:
Does anybody know about GlusterFS & Dovecot?
wow cool trick *g
Best Regards MfG Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263 Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer Aufsichtsratsvorsitzender: Florian Kirstein
Am 17.09.2013 22:23, schrieb Andreas Gaiser:
Does anybody know about GlusterFS & Dovecot?
...Andreas
to my latest Infos all fuse based filesystems are not working very well by design, if youre brave go try out ceph, but perhaps wait for more other infos from the list about glusterfs
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this?
Best Regards MfG Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263 Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer Aufsichtsratsvorsitzender: Florian Kirstein
Does anybody know about GlusterFS & Dovecot?
...Andreas
Time marches on, and I need to continue the service migration. I'd still like to use Dovecot (we're migrating away from Cyrus). I'm assuming the only other alternative without existing shared storage is to use DRBD and a cluster file system to provide the replication, and to ensure Director is enabled. Are there any things to watch for surrounding this? We tested glusterfs 3.2 a while ago using four storage nodes, four Dovecot/Postfix machines, and a number of email client bots that generated upwards of a 180k inbound messages per hour, and upwards of 360k pop/imap connections per hour. Unfortunately we did not grab any metrics on how long it took for a POP/IMAP session to open/read/delete/close or how long SMTP transactions took, we simply wanted to see how much load would be generated which was reasonable for
On 9/17/13 3:23 PM, Andreas Gaiser wrote: the machines we used. All storage and mail machines were virtual (vmware) and consisted of 2vcpu with 8gigs mem running Centos 6.1. We tested both NFS and the gluster native client and didn't see much difference in perceived load on the system. We did not run into any of the issues that are common with running Dovecot over NFS during our testing, which we attribute to a proper configuration for NFS and solid NTP. We ran an extended test that lasted for about two months and nothing really hiccuped or failed to function so I would call it a success to that extent.
We also tested stretching the glusterfs cluster between our two data centers which are 100 miles apart as the fiber lays. Our latency is very low and stable between sites, and resulted in a small increase in load on the cluster. I would not recommend this concept over anything but the most stable and fault tolerant WAN imaginable, but it seemed to work reasonably well for the duration of the testing we did (about a day long test).
If I were to do it again obviously I would grab metrics and compare it to access times for a basic single server system on local disk and an NFS backed system using multiple servers, but alas we were just propping it up for fun and see how far we could abuse it. If one could assume that Glusterfs does scale linearly with more nodes you could continue to add capacity to the storage layer and grow the cluster, but that's another level of testing all together.
On 8.8.2013, at 11.21, Simon Fraser <srf@sanger.ac.uk> wrote:
I ran two tests: one using 'doveadm expunge' and one deleting the message using mutt. Since the hosts mentioned so far have a copy of my full mailbox on, I re-ran the tests (with the same results) on a test server with a fresh mailbox on, so there was no extra folder synchronisation in there to fill up the rawlog.
Those log entries are too big for the mailing list (70k+), so are here:
'doveadm expunge' dsync-rawlog node A http://pastebin.com/LtUnENPv
'doveadm expunge' dsync-rawlog node B http://pastebin.com/QaWLyZq2
imap expunge dsync-rawlog node A http://pastebin.com/SuFdWn0w
imap expunge dsync-rawlog node B http://pastebin.com/Ex66s7hq
I finally looked into this:
The A and B node logs are exactly the same. I think you sent the same ones for both? Anyway, one of the sides is enough. The interesting parts are:
1375808883.299424 O: NINBOX y 0314c806e3fa0052d26a0000736ac1b0 1375795939 24 1375808883.350378 I: NINBOX y 0314c806e3fa0052d26a0000736ac1b0 1375795939 23 1375808883.360216 I: Ce 22 6e2b7029b52c015258220000736ac1b0 1375808883.360972 O: Cs 23 ae42e400732d0152d3310000736ac1b0 59 1375808883 20
One side has uidnext=23 and the other side has uidnext=24. You're deleting the last message with uid=22, so the uidnext=23 is correct. The other side however thinks that the same mail's uid is 23. There must be something wrong with the mail delivery, because both sides should have uid=22 and uidnext=23 here. So replication rawlogs of a new mail delivery would be helpful..
On 22.9.2013, at 0.02, Timo Sirainen <tss@iki.fi> wrote:
The A and B node logs are exactly the same. I think you sent the same ones for both? Anyway, one of the sides is enough. The interesting parts are:
1375808883.299424 O: NINBOX y 0314c806e3fa0052d26a0000736ac1b0 1375795939 24 1375808883.350378 I: NINBOX y 0314c806e3fa0052d26a0000736ac1b0 1375795939 23 1375808883.360216 I: Ce 22 6e2b7029b52c015258220000736ac1b0 1375808883.360972 O: Cs 23 ae42e400732d0152d3310000736ac1b0 59 1375808883 20
One side has uidnext=23 and the other side has uidnext=24. You're deleting the last message with uid=22, so the uidnext=23 is correct. The other side however thinks that the same mail's uid is 23. There must be something wrong with the mail delivery, because both sides should have uid=22 and uidnext=23 here. So replication rawlogs of a new mail delivery would be helpful..
Or there are some other strange things here also: The GUIDs are different for the mails, so it's as if the same mail was saved to both sides via LMTP instead of being copied to the other side via replication? Also the logs show an extra dsync run that seems to mess things up even further. The whole deletion operation did:
- expunge uid=22
- copy uid=23 from A to B
- expunge uid=21 (the message was there twice?)
- copy uid=23 from B to back to A (??)
On Sun, 2013-09-22 at 00:13 +0300, Timo Sirainen wrote:
One side has uidnext=23 and the other side has uidnext=24. You're deleting the last message with uid=22, so the uidnext=23 is correct. The other side however thinks that the same mail's uid is 23. There must be something wrong with the mail delivery, because both sides should have uid=22 and uidnext=23 here. So replication rawlogs of a new mail delivery would be helpful..
Here's the rawlog from node A: http://pastebin.com/2hcBXNc4 (24k, hence not attached)
This records the following activity:
Shut down dovecot on both nodes removed the mailbox (so that the new message delivery would create it) Emptied /tmp/dsync-rawlog Started dovecot on both nodes Sent a test email to node A Deleted it - it didn't reappear Sent a second test email to node A Deleted the second email, which reappeared.
After testing this a few times with a different number of messages, it seems they only start reappearing after the first deletion.
Or there are some other strange things here also: The GUIDs are different for the mails, so it's as if the same mail was saved to both sides via LMTP instead of being copied to the other side via replication? Also the logs show an extra dsync run that seems to mess things up even further. The whole deletion operation did:
- expunge uid=22
- copy uid=23 from A to B
- expunge uid=21 (the message was there twice?)
- copy uid=23 from B to back to A (??)
I have double checked the configuration and node b has not taken delivery of any messages - exim has nothing in the logs.
Thank you for your perseverance with this!
Simon.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
participants (8)
-
Andreas Gaiser
-
Daniel Parthey
-
List
-
Reindl Harald
-
Robert Schetterer
-
Simon Fraser
-
Stan Hoeppner
-
Timo Sirainen