Hi there,
I am not sure whether this is a bug or an environment configuration issue.
I am running some concurrency tests under NFS.
Dovecot (version 2.0.13) is deployed on two servers with the same setup; their maildirs are on a file server via NFS. Here is the output of dovecot -n:

[root@MyMachine src]$dovecot -n
# 2.0.13: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.18-274.3.1.el5 i686 Red Hat Enterprise Linux Server release 5.7 (Tikanga) nfs
auth_anonymous_username = andy
auth_debug = yes
auth_debug_passwords = yes
auth_mechanisms = anonymous plain
auth_verbose = yes
auth_verbose_passwords = plain
disable_plaintext_auth = no
lock_method = dotlock
log_path = /tmp/log
mail_debug = yes
mail_fsync = always
mail_gid = andy
mail_location = maildir:/tmp/NFS
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = andy
mmap_disable = yes
passdb {
  driver = pam
}
ssl = no
userdb {
  driver = passwd
}
Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
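To illustrate, the two sessions look roughly like the sketch below. The hostnames (server1/server2), the password, and the use of nc as the IMAP client are only placeholders for whatever client actually drives the loops; the mailbox needs at least one message for COPY to work.

# session 1, against the first Dovecot server: COPY in a tight loop
while true; do
  printf 'a1 LOGIN andy secret\r\na2 SELECT INBOX\r\na3 COPY 1 INBOX\r\na4 LOGOUT\r\n' | nc server1 143
done

# session 2, against the second Dovecot server: SELECT in a tight loop, watching the EXISTS count
while true; do
  printf 'b1 LOGIN andy secret\r\nb2 SELECT INBOX\r\nb3 LOGOUT\r\n' | nc server2 143 | grep EXISTS
done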
After a while (the duration varies; sometimes 2 seconds, sometimes a minute), I found that the EXISTS count returned by the SELECT command was incorrect (less than the real number of messages). Then I stopped both sessions. In /tmp/log:

Feb 07 03:44:59 imap(andy): Error: Corrupted transaction log file /tmp/NFS/dovecot.index.log seq 2: Unexpected garbage at EOF (sync_offset=2204)
Feb 07 03:44:59 imap(andy): Error: Index /tmp/NFS/dovecot.index: Lost log for seq=2 offset=2204
Feb 07 03:44:59 imap(andy): Warning: fscking index file /tmp/NFS/dovecot.index
Then I tried to dump the index. Sometimes it fails because of the index corruption; when it does not, it says:

[root@MyMachine src]$doveadm dump /tmp/NFS > /tmp/dump ; vi /tmp/dump
doveadm(root): Error: Log synchronization error at seq=2,offset=744 for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: Log synchronization error at seq=2,offset=744 for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with file /tmp/NFS/dovecot.index.log: Bad file descriptor
And sometimes, when the dump succeeds, the RECORDS part of the output shows:
-- RECORDS: 5
RECORD: seq=1, uid=1, flags=0x00
RECORD: seq=2, uid=2, flags=0x00
RECORD: seq=3, uid=3, flags=0x00
RECORD: seq=4, uid=4, flags=0x00
RECORD: seq=5, uid=6, flags=0x00
uid 5 is missing, but it is still there in the uidlist file.
That is everything I have found so far. If you need additional information, please let me know.
The clocks on the three machines are synchronized.
It can also be reproduced when the two sessions run APPEND and SELECT.
If both sessions connect to the same Dovecot server, everything works fine without any errors, even though the maildir is on NFS.
On 7.2.2012, at 8.26, Andy YB Hu wrote:
I am running some concurrency tests under NFS. .. Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
I don't even attempt to support this kind of configuration anymore. Use http://wiki2.dovecot.org/Director
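Very roughly, the director-specific part of the setup looks like the sketch below; the IPs are placeholders and the wiki page has the full details.

director_servers = 10.0.0.1 10.0.0.2          # the director ring
director_mail_servers = 10.0.1.1 10.0.1.2     # the NFS mail backends
service director {
  inet_listener {
    port = 9090
  }
  unix_listener login/director {
    mode = 0666
  }
}
service imap-login {
  executable = imap-login director
}
passdb {
  driver = static
  args = proxy=y nopassword=y
}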
Thanks Timo,
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
What I actually want is for the "secondary load balancer layer" to be able to redirect requests to a random backend. How can I manage that? Only after the user's files on the previous backend have expired?
Thanks.
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 07:49 AM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 7.2.2012, at 8.26, Andy YB Hu wrote:
I am running some concurrency tests under NFS. .. Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
I don't even attempt to support this kind of configuration anymore. Use http://wiki2.dovecot.org/Director
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
OK.
One more question. The director and the backend are now running on the same servers; I set it up according to http://wiki2.dovecot.org/RunningDovecot#Running_Multiple_Invocations_of_Dove... . The question is how to use doveadm to manage the different instances. I know there is dovecot -c, but what about doveadm?
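For reference, I start the two instances roughly like this (the config file paths are from my setup):

dovecot -c /usr/local/etc/dovecot/dovecot.conf            # backend instance
dovecot -c /usr/local/etc/dovecot/dovecot_director.conf   # director instance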
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 08:55 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
[root@c-dev1ws--01-01 conf_director.d]$dovecot -n -c /usr/local/etc/dovecot/dovecot_director.conf
# 2.0.13: /usr/local/etc/dovecot/dovecot_director.conf
# OS: Linux 2.6.18-274.17.1.el5 i686 Red Hat Enterprise Linux Server release 5.7 (Tikanga)
auth_debug = yes
auth_debug_passwords = yes
auth_mechanisms = xpreauth
auth_username_chars = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@:
auth_verbose = yes
auth_verbose_passwords = plain
base_dir = /var/run/dovecot_director
director_mail_servers = 9.119.7.129
director_servers = 9.119.7.60:9090
disable_plaintext_auth = no
lock_method = dotlock
log_path = /tmp/log
mail_debug = yes
mail_fsync = always
mail_gid = mdrop
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = mdrop
mmap_disable = yes
passdb {
  args = proxy=y port=144 nopassword=y
  driver = static
}
service director {
  fifo_listener login/proxy-notify {
    mode = 0666
  }
  inet_listener {
    port = 9090
  }
  unix_listener director-userdb {
    mode = 0600
  }
  unix_listener login/director {
    mode = 0666
  }
}
service imap-login {
  executable = imap-login director
  inet_listener imap {
    address = 9.119.7.60
    port = 10143
  }
}
ssl = no
userdb {
  driver = passwd
}
By the way, I run the director and the backend on the same server.
From: Andy YB Hu/Hong Kong/IBM@IBMHK (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/10/2012 11:12 AM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
OK.
One more question. The director and the backend are now running on the same servers; I set it up according to http://wiki2.dovecot.org/RunningDovecot#Running_Multiple_Invocations_of_Dove... . The question is how to use doveadm to manage the different instances. I know there is dovecot -c, but what about doveadm?
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 08:55 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
On 10.2.2012, at 10.08, Andy YB Hu wrote:
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
You can give the -c parameter to doveadm too (and to all other Dovecot programs as well).
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
I'm guessing that it's connecting to the wrong Dovecot's auth process. Use doveadm -c instead of -a.
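For example, something like this should talk to the director instance rather than the default one (the config path is the one from your mail; director status is just one example command):

doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director status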
Another question is about director failover. In http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html you said: "The main complexity here comes from how to handle proxy server failures in different situations. Those are less interesting to describe and I haven't yet implemented all of it, so let's just assume that in future it all works perfectly." So is there currently no good way to handle director failover, nor any 3rd party solution like poolmon by Brad Davidson for health monitoring of the backend servers?
Thanks.
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/12/2012 12:06 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 10.2.2012, at 10.08, Andy YB Hu wrote:
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
You can give the -c parameter to doveadm too (and to all other Dovecot programs as well).
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
I'm guessing that it's connecting to the wrong Dovecot's auth process. Use doveadm -c instead of -a.
On 14.2.2012, at 4.39, Andy YB Hu wrote:
Another question is about director failover. In http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html you said: "The main complexity here comes from how to handle proxy server failures in different situations. Those are less interesting to describe and I haven't yet implemented all of it, so let's just assume that in future it all works perfectly." So is there currently no good way to handle director failover, nor any 3rd party solution like poolmon by Brad Davidson for health monitoring of the backend servers?
Your load balancer handles director downtimes so it connects only to directors that are up. Directors themselves figure out when one of them is down (by either explicit disconnection or timeout). That's the theory, and so far no one's told me it doesn't work that way.
For the backend servers there's still no automation though. You'll need to explicitly tell the director to stop trying to connect to a specific backend; poolmon is intended for that. It would be possible to implement this directly in the director itself, but so far it hasn't really been a priority, since the companies who have paid for it have wanted to implement it internally themselves.
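For example, taking a backend out of rotation and putting it back by hand looks roughly like this (using the backend IP and config path from your mails; probing the backends and issuing such commands automatically is the part poolmon handles):

doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director remove 9.119.7.129
doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director add 9.119.7.129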
participants (2)
- Andy YB Hu
- Timo Sirainen