Hi there,
I am not sure whether this is a bug or an environment configuration issue.
I am running some concurrency tests under NFS.
Dovecot (version 2.0.13) is deployed on two servers with the same setup; their maildirs are on a file server via NFS. Here is the output of dovecot -n:

[root@MyMachine src]$dovecot -n
# 2.0.13: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.18-274.3.1.el5 i686 Red Hat Enterprise Linux Server release 5.7 (Tikanga) nfs
auth_anonymous_username = andy
auth_debug = yes
auth_debug_passwords = yes
auth_mechanisms = anonymous plain
auth_verbose = yes
auth_verbose_passwords = plain
disable_plaintext_auth = no
lock_method = dotlock
log_path = /tmp/log
mail_debug = yes
mail_fsync = always
mail_gid = andy
mail_location = maildir:/tmp/NFS
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = andy
mmap_disable = yes
passdb {
  driver = pam
}
ssl = no
userdb {
  driver = passwd
}
Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
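To illustrate, the two sessions look roughly like the sketch below. The hostnames (server1/server2), the password, and the use of nc as the IMAP client are only placeholders for whatever client actually drives the loops; the mailbox needs at least one message for COPY to work.

# session 1, against the first Dovecot server: COPY in a tight loop
while true; do
  printf 'a1 LOGIN andy secret\r\na2 SELECT INBOX\r\na3 COPY 1 INBOX\r\na4 LOGOUT\r\n' | nc server1 143
done

# session 2, against the second Dovecot server: SELECT in a tight loop, watching the EXISTS count
while true; do
  printf 'b1 LOGIN andy secret\r\nb2 SELECT INBOX\r\nb3 LOGOUT\r\n' | nc server2 143 | grep EXISTS
done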
After a while (the duration varies; sometimes 2 seconds, sometimes a minute), I found that the EXISTS count returned by the SELECT command was incorrect (less than the real number of messages). Then I stopped both sessions. In /tmp/log:

Feb 07 03:44:59 imap(andy): Error: Corrupted transaction log file /tmp/NFS/dovecot.index.log seq 2: Unexpected garbage at EOF (sync_offset=2204)
Feb 07 03:44:59 imap(andy): Error: Index /tmp/NFS/dovecot.index: Lost log for seq=2 offset=2204
Feb 07 03:44:59 imap(andy): Warning: fscking index file /tmp/NFS/dovecot.index
Then I tried to dump the index. Sometimes it fails because of the index corruption; when it does not, it says:

[root@MyMachine src]$doveadm dump /tmp/NFS > /tmp/dump ; vi /tmp/dump
doveadm(root): Error: Log synchronization error at seq=2,offset=744 for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: Log synchronization error at seq=2,offset=744 for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with file /tmp/NFS/dovecot.index.log: Bad file descriptor
And sometimes, when the dump succeeds, the RECORDS part of the output shows:
-- RECORDS: 5
RECORD: seq=1, uid=1, flags=0x00
RECORD: seq=2, uid=2, flags=0x00
RECORD: seq=3, uid=3, flags=0x00
RECORD: seq=4, uid=4, flags=0x00
RECORD: seq=5, uid=6, flags=0x00
uid 5 is missing, but it is still there in the uidlist file.
That is everything I have found so far. If you need additional information, please let me know.
The clocks on the three machines are synchronized.
It can also be reproduced when the two sessions run APPEND and SELECT.
If both sessions connect to the same Dovecot server, everything works fine without any errors, even though the maildir is on NFS.
On 7.2.2012, at 8.26, Andy YB Hu wrote:
I am running some concurrency tests under NFS. .. Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
I don't even attempt to support this kind of configuration anymore. Use http://wiki2.dovecot.org/Director
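Very roughly, the director-specific part of the setup looks like the sketch below; the IPs are placeholders and the wiki page has the full details.

director_servers = 10.0.0.1 10.0.0.2          # the director ring
director_mail_servers = 10.0.1.1 10.0.1.2     # the NFS mail backends
service director {
  inet_listener {
    port = 9090
  }
  unix_listener login/director {
    mode = 0666
  }
}
service imap-login {
  executable = imap-login director
}
passdb {
  driver = static
  args = proxy=y nopassword=y
}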
Thanks Timo,
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
What I actually want is for the "secondary load balancer layer" to be able to redirect requests to a random backend. How can I manage that? Only after the user's files on the previous backend have expired?
Thanks.
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 07:49 AM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 7.2.2012, at 8.26, Andy YB Hu wrote:
I am running some concurrency tests under NFS. .. Here is what I am doing: one session running a loop of COPY commands (while(1) COPY...) connects to one Dovecot server; the other session running a loop of SELECT commands (while(1) SELECT...) connects to the other Dovecot server. Both access the same mailbox (/tmp/NFS).
I don't even attempt to support this kind of configuration anymore. Use http://wiki2.dovecot.org/Director
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
OK.
One more question. The director and the backend are now running on the same servers; I set it up according to http://wiki2.dovecot.org/RunningDovecot#Running_Multiple_Invocations_of_Dove... . The question is how to use doveadm to manage the different instances. I know there is dovecot -c, but what about doveadm?
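For reference, I start the two instances roughly like this (the config file paths are from my setup):

dovecot -c /usr/local/etc/dovecot/dovecot.conf            # backend instance
dovecot -c /usr/local/etc/dovecot/dovecot_director.conf   # director instance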
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 08:55 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
[root@c-dev1ws--01-01 conf_director.d]$dovecot -n -c /usr/local/etc/dovecot/dovecot_director.conf
# 2.0.13: /usr/local/etc/dovecot/dovecot_director.conf
# OS: Linux 2.6.18-274.17.1.el5 i686 Red Hat Enterprise Linux Server release 5.7 (Tikanga)
auth_debug = yes
auth_debug_passwords = yes
auth_mechanisms = xpreauth
auth_username_chars = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@:
auth_verbose = yes
auth_verbose_passwords = plain
base_dir = /var/run/dovecot_director
director_mail_servers = 9.119.7.129
director_servers = 9.119.7.60:9090
disable_plaintext_auth = no
lock_method = dotlock
log_path = /tmp/log
mail_debug = yes
mail_fsync = always
mail_gid = mdrop
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = mdrop
mmap_disable = yes
passdb {
  args = proxy=y port=144 nopassword=y
  driver = static
}
service director {
  fifo_listener login/proxy-notify {
    mode = 0666
  }
  inet_listener {
    port = 9090
  }
  unix_listener director-userdb {
    mode = 0600
  }
  unix_listener login/director {
    mode = 0666
  }
}
service imap-login {
  executable = imap-login director
  inet_listener imap {
    address = 9.119.7.60
    port = 10143
  }
}
ssl = no
userdb {
  driver = passwd
}
By the way, I run the director and the backend on the same server.
From: Andy YB Hu/Hong Kong/IBM@IBMHK (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/10/2012 11:12 AM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
OK.
One more question. The director and the backend are now running on the same servers; I set it up according to http://wiki2.dovecot.org/RunningDovecot#Running_Multiple_Invocations_of_Dove... . The question is how to use doveadm to manage the different instances. I know there is dovecot -c, but what about doveadm?
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/09/2012 08:55 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 9.2.2012, at 10.36, Andy YB Hu wrote:
I just tried out the director. One question is about re-redirection. I know the director redirects all simultaneous requests from the same user to a single backend server at any given time. The question is: how do I control the time period after the last connection, after which the director re-decides which machine to redirect to? Is it director_user_expire? It doesn't look like it.
I did one test: I set director_user_expire = 1 min and then kept sending requests to the director at 2 minute intervals. The result is that it keeps redirecting to the same backend server.
In normal operation the user is always redirected to the same server. http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html has some more details.
If you have enough connections, it shouldn't matter that the connections aren't constantly going to random backends. In practice they get distributed well enough.
On 10.2.2012, at 10.08, Andy YB Hu wrote:
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
You can give the -c parameter to doveadm too (and to all other Dovecot programs as well).
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
I'm guessing that it's connecting to the wrong Dovecot's auth process. Use doveadm -c instead of -a.
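For example, something like this should talk to the director instance rather than the default one (the config path is the one from your mail; director status is just one example command):

doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director status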
Another question is about director failover. In http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html you said: "The main complexity here comes from how to handle proxy server failures in different situations. Those are less interesting to describe and I haven't yet implemented all of it, so let's just assume that in future it all works perfectly." So is there currently no good way to handle director failover, nor any 3rd party solution like poolmon by Brad Davidson for health monitoring of the backend servers?
Thanks.
From: Timo Sirainen <tss@iki.fi> (sent by dovecot-bounces@dovecot.org)
To: Dovecot Mailing List <dovecot@dovecot.org>
Date: 02/12/2012 12:06 PM
Subject: Re: [Dovecot] Synchronization error in NFS
Reply-To: Dovecot Mailing List <dovecot@dovecot.org>
On 10.2.2012, at 10.08, Andy YB Hu wrote:
I searched the archive and found doveadm -a director-admin for local doveadm access, and -a host:port for remote doveadm access.
You can give the -c parameter to doveadm too (and to all other Dovecot programs as well).
And you said: http://dovecot.org/list/dovecot/2010-July/050731.html
Now in my director I have configured userdb passwd, but the same errors occur:
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
And in the log:
Feb 10 07:45:25 auth: Error: Trying to iterate users, but userdbs don't support it
I'm guessing that it's connecting to the wrong Dovecot's auth process. Use doveadm -c instead of -a.
On 14.2.2012, at 4.39, Andy YB Hu wrote:
Another question is about director failover. In http://blog.dovecot.org/2010/05/new-director-service-in-v20-for-nfs.html you said: "The main complexity here comes from how to handle proxy server failures in different situations. Those are less interesting to describe and I haven't yet implemented all of it, so let's just assume that in future it all works perfectly." So is there currently no good way to handle director failover, nor any 3rd party solution like poolmon by Brad Davidson for health monitoring of the backend servers?
Your load balancer handles director downtimes so it connects only to directors that are up. Directors themselves figure out when one of them is down (by either explicit disconnection or timeout). That's the theory, and so far no one's told me it doesn't work that way.
For the backend servers there's still no automation though. You'll need to explicitly tell the director to stop trying to connect to a specific backend; poolmon is intended for that. It would be possible to implement this directly in the director itself, but so far it hasn't really been a priority, since the companies who have paid for it have wanted to implement it internally themselves.
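For example, taking a backend out of rotation and putting it back by hand looks roughly like this (using the backend IP and config path from your mails; probing the backends and issuing such commands automatically is the part poolmon handles):

doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director remove 9.119.7.129
doveadm -c /usr/local/etc/dovecot/dovecot_director.conf director add 9.119.7.129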
participants (2)
- Andy YB Hu
- Timo Sirainen