[Dovecot] Connection refused with auth-master after upgrading to Dovecot 1.0 rc 28
Fedora pushed out an update to Dovecot 1.0 rc 28 today. After upgrading, mail isn't delivered to local recipients. My log file is full of error messages similar to the following:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
I'm including some snippets from my dovecot.conf file that pertain to the auth-master file:
protocol lda { # Address to use when sending rejection mails. postmaster_address = postmaster@brashers.com
mail = maildir:/home/vmail/mail/%n
# Hostname to use in various parts of sent mails, eg. in Message-Id. # Default is the system's real hostname. #hostname =
# Support for dynamically loadable plugins. mail_plugins is a space separated # list of plugins to load. #mail_plugins = #mail_plugin_dir = /usr/lib/dovecot/lda mail_plugins = cmusieve mail_plugin_dir = /usr/lib/dovecot/lda
# Binary to use for sending mails. #sendmail_path = /usr/lib/sendmail
# UNIX socket path to master authentication server to find users. auth_socket_path = /var/run/dovecot/auth-master #auth_socket_path = /home/vmail/mail/auth-master }
socket listen { master { # Master socket provides access to userdb information. It's typically # used to give Dovecot's local delivery agent access to userdb so it # can find mailbox locations. path = /var/run/dovecot/auth-master #path = /home/vmail/mail/auth-master mode = 0600 # Default user/group is the one who started dovecot-auth (root) user = vmail group = mail } #client { # The client socket is generally safe to export to everyone. Typical use # is to export it to your SMTP server so it can do SMTP AUTH lookups # using it. #path = /var/run/dovecot/auth-client #mode = 0660 #} }
I have been beating my head against this trying to change the permissions to 0777, changing the directory of the socket, changing the auth user, and just about anything else I can find in any thread that relates to this error. Any help would be greatly appreciated.
On Mon, 2007-04-02 at 13:10 -0600, Jason Warner wrote:
Fedora pushed out an update to Dovecot 1.0 rc 28 today. After upgrading, mail isn't delivered to local recipients. My log file is full of error messages similar to the following:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
Delete this file. Restart Dovecot. Does it get recreated?
I'm including some snippets from my dovecot.conf file that pertain to the auth-master file:
dovecot -n shows what Dovecot really uses. It might show something different than what you thought you had.
Timo,
Thanks for your time.
On 4/2/07, Timo Sirainen tss@iki.fi wrote:
On Mon, 2007-04-02 at 13:10 -0600, Jason Warner wrote:
Fedora pushed out an update to Dovecot 1.0 rc 28 today. After upgrading, mail isn't delivered to local recipients. My log file is full of error messages similar to the following:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
Delete this file. Restart Dovecot. Does it get recreated?
When I delete the file and restart Dovecot it is recreated.
I'm including some snippets from my dovecot.conf file that pertain to the auth-master file:
dovecot -n shows what Dovecot really uses. It might show something different than what you thought you had.
Here is the output from dovecot -n:
[root@mail dovecot]# /usr/local/sbin/dovecot -c /etc/dovecot.conf -n # /etc/dovecot.conf protocols: imap imaps pop3 pop3s ssl_cert_file: /etc/pki/dovecot/certs/pop3.pem ssl_key_file: /etc/pki/dovecot/private/pop3.pem login_dir: /usr/local/var/run/dovecot/login login_executable(default): /usr/local/libexec/dovecot/imap-login login_executable(imap): /usr/local/libexec/dovecot/imap-login login_executable(pop3): /usr/local/libexec/dovecot/pop3-login default_mail_env: maildir:/home/vmail/mail/%n mail_location: maildir:/home/vmail/mail/%n mail_executable(default): /usr/local/libexec/dovecot/imap mail_executable(imap): /usr/local/libexec/dovecot/imap mail_executable(pop3): /usr/local/libexec/dovecot/pop3 mail_plugin_dir(default): /usr/local/lib/dovecot/imap mail_plugin_dir(imap): /usr/local/lib/dovecot/imap mail_plugin_dir(pop3): /usr/local/lib/dovecot/pop3 pop3_uidl_format(default): pop3_uidl_format(imap): pop3_uidl_format(pop3): %08Xu%08Xv auth default: passdb: driver: ldap args: /etc/dovecot-ldap.conf userdb: driver: static args: uid=502 gid=502 home=/home/vmail/mail/%n mail=maildir:/home/vmail/mail/%n/ socket: type: listen client: master: path: /var/run/dovecot/auth-master mode: 384 user: vmail group: mail
Some other information that I've learned might be helpful in helping to troubleshoot this problem:
- I'm using Postfix and delivering mail to virtual users.
- The problem presents itself when using the Dovecot LDA. If I go back to my old virtual settings (not using the Dovecot LDA) then mail is delivered again.
- The Dovecot LDA is added with this line in my master.cf for Postfix:
Dovecot LDA
dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d ${recipient}
On 3.4.2007, at 17.38, Jason Warner wrote:
Fedora pushed out an update to Dovecot 1.0 rc 28 today. After upgrading, mail isn't delivered to local recipients. My log file is full of error messages similar to the following:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection
refusedDelete this file. Restart Dovecot. Does it get recreated?
When I delete the file and restart Dovecot it is recreated.
Does this then happen every time when running deliver? Does netstat
show that dovecot-auth is listening in that path?
Connection refused means either that no-one is listening on the other
side, or that its listen queue got full.
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Friday, April 06, 2007 1:43 AM To: Jason Warner Cc: jasonw@brashers.com; dovecot@dovecot.org Subject: Re: [Dovecot] Connection refused with auth-master after upgrading to Dovecot 1.0 rc 28
On 3.4.2007, at 17.38, Jason Warner wrote:
Fedora pushed out an update to Dovecot 1.0 rc 28 today. After upgrading, mail isn't delivered to local recipients. My log file is full of error messages similar to the following:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
Delete this file. Restart Dovecot. Does it get recreated?
When I delete the file and restart Dovecot it is recreated.
Does this then happen every time when running deliver? Does netstat show that dovecot-auth is listening in that path?
When running deliver from the command line, I do not get the error. The error only appears when deliver is run from Postfix.
Netstat shows the following when running.
unix 2 [ ACC ] STREAM LISTENING 255526 private/dovecot unix 2 [ ACC ] STREAM LISTENING 1176428 /var/run/dovecot/auth-master unix 2 [ ACC ] STREAM LISTENING 1176411 /var/run/dovecot/login/default unix 2 [ ACC ] STREAM LISTENING 1176416 /var/run/dovecot/auth-worker.27311 unix 2 [ ACC ] STREAM LISTENING 1176409 /var/run/dovecot/dict-server unix 3 [ ] STREAM CONNECTED 7888432 /var/run/dovecot/login/default unix 3 [ ] STREAM CONNECTED 7888429
Connection refused means either that no-one is listening on the other side, or that its listen queue got full.
It appears that the auth-master is running. How would I check to make sure that it is listening and responding properly to requests?
On Mon, 2007-04-09 at 08:45 -0600, Jason Warner wrote:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
Delete this file. Restart Dovecot. Does it get recreated?
When I delete the file and restart Dovecot it is recreated.
Does this then happen every time when running deliver? Does netstat show that dovecot-auth is listening in that path?
When running deliver from the command line, I do not get the error. The error only appears when deliver is run from Postfix.
And you gave deliver -d parameter when running from command line? If it works from there but not from Postfix, then the problem has something to do with how deliver is started. Did you try running as the same user as what Postfix runs it with?
Netstat shows the following when running.
unix 2 [ ACC ] STREAM LISTENING 255526 private/dovecot unix 2 [ ACC ] STREAM LISTENING 1176428 /var/run/dovecot/auth-master
So it should be accepting connections.
It appears that the auth-master is running. How would I check to make sure that it is listening and responding properly to requests?
There really isn't any other way than to run deliver. auth_debug=yes will also log something whenever deliver is run and dovecot-auth sees something.
Timo,
I really want to thank you for taking the time to look at this. I'm a newbie with a little Linux experience, so I'm in charge of the mail server. It makes it nice to deal with someone who really knows what they are doing.
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Tuesday, April 10, 2007 8:46 AM To: Jason Warner Cc: 'Jason Warner'; dovecot@dovecot.org Subject: Re: [Dovecot] Connection refused with auth-master after upgradingto Dovecot 1.0 rc 28
On Mon, 2007-04-09 at 08:45 -0600, Jason Warner wrote:
Apr 2 12:56:32 mail deliver(user@domain.com): net_connect(/var/run/dovecot/auth-master) failed: Connection refused
Delete this file. Restart Dovecot. Does it get recreated?
When I delete the file and restart Dovecot it is recreated.
Does this then happen every time when running deliver? Does netstat show that dovecot-auth is listening in that path?
When running deliver from the command line, I do not get the error. The error only appears when deliver is run from Postfix.
And you gave deliver -d parameter when running from command line? If it works from there but not from Postfix, then the problem has something to do with how deliver is started. Did you try running as the same user as what Postfix runs it with?
When I run deliver with the -d parameter, it does deliver a message to the intended user. I've tried just about everything I know how to do with the Dovecot LDA in the master.cf config file for Postfix. Here is the line as it stands now:
# Dovecot LDA dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d ${recipient}
That line worked just fine up until the update. After some looking, we were running 1.0 rc 15 so this step up to 28 was quite a move. Fortunately, the mail server is working with our virtual setup, but our vacation message setup relied on the sieve portions of the Dovecot LDA. Thanks again for taking a look at this.
Netstat shows the following when running.
unix 2 [ ACC ] STREAM LISTENING 255526 private/dovecot unix 2 [ ACC ] STREAM LISTENING 1176428 /var/run/dovecot/auth-master
So it should be accepting connections.
It appears that the auth-master is running. How would I check to make sure that it is listening and responding properly to requests?
There really isn't any other way than to run deliver. auth_debug=yes will also log something whenever deliver is run and dovecot-auth sees something.
On Tue, 2007-04-10 at 09:11 -0600, Jason Warner wrote:
When I run deliver with the -d parameter, it does deliver a message to the intended user. I've tried just about everything I know how to do with the Dovecot LDA in the master.cf config file for Postfix. Here is the line as it stands now:
# Dovecot LDA dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d ${recipient}
So what about running deliver as vmail?
sudo su vmail -s /usr/libexec/dovecot/deliver -d some@one
What OS do you use? Do you use SELinux or anything like that?
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Tuesday, April 10, 2007 9:24 AM To: Jason Warner Cc: 'Jason Warner'; dovecot@dovecot.org Subject: RE: [Dovecot] Connection refused with auth-master after upgradingto Dovecot 1.0 rc 28
On Tue, 2007-04-10 at 09:11 -0600, Jason Warner wrote:
When I run deliver with the -d parameter, it does deliver a message to the intended user. I've tried just about everything I know how to do with the Dovecot LDA in the master.cf config file for Postfix. Here is the line as it stands now:
# Dovecot LDA dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d ${recipient}
So what about running deliver as vmail?
sudo su vmail -s /usr/libexec/dovecot/deliver -d some@one
What OS do you use? Do you use SELinux or anything like that?
Sorry, I forgot to mention that when I tried the -d parameter, I was running as the vmail user and everything went just fine.
We're running Fedora Core 6. I've disabled SELinux on our mail box. There have been SELinux updates recently. I'm not sure if they would affect our mail server with SELinux disabled.
On Tue, 2007-04-10 at 09:32 -0600, Jason Warner wrote:
Sorry, I forgot to mention that when I tried the -d parameter, I was running as the vmail user and everything went just fine.
I'm running out of ideas. strace could be helpful. Run:
strace -o /tmp/deliver.log deliver -d ..
Both as standalone and in Postfix. What's different in the log file?
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Tuesday, April 10, 2007 9:36 AM To: Jason Warner Cc: 'Jason Warner'; dovecot@dovecot.org Subject: Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28
On Tue, 2007-04-10 at 09:32 -0600, Jason Warner wrote:
Sorry, I forgot to mention that when I tried the -d parameter, I was running as the vmail user and everything went just fine.
I'm running out of ideas. strace could be helpful. Run:
strace -o /tmp/deliver.log deliver -d ..
Both as standalone and in Postfix. What's different in the log file?
Here's a diff of the two log files:
1,3c1,3 < execve("/usr/libexec/dovecot/deliver", ["/usr/libexec/dovecot/deliver", "-d", "user1@domain.com"], [/* 26 vars */]) = 0 < brk(0) = 0x9cac000 < mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f31000
execve("/usr/libexec/dovecot/deliver", ["/usr/libexec/dovecot/deliver", "-d", "user2@domain.com"], [/* 3 vars */]) = 0 brk(0) = 0x8192000 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f4b000 7c7 < mmap2(NULL, 102341, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f18000
mmap2(NULL, 102341, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f32000 22,23c22,23 < mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f17000 < set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f176c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f31000 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f316c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 27,30c27,30 < munmap(0xb7f18000, 102341) = 0 < time(NULL) = 1176220958 < brk(0) = 0x9cac000 < brk(0x9cd5000) = 0x9cd5000
munmap(0xb7f32000, 102341) = 0 time(NULL) = 1176221679 brk(0) = 0x8192000 brk(0x81bb000) = 0x81bb000 32,33c32,33 < getpid() = 31531 < gettimeofday({1176220958, 974801}, {360, 0}) = 0
getpid() = 1143 gettimeofday({1176221679, 229181}, {360, 0}) = 0 96,106c96,106 < _llseek(6, 0, 0xbfa70aa0, SEEK_CUR) = -1 ESPIPE (Illegal seek) < getsockname(6, {sa_family=AF_FILE, path="Ã "}, [2]) = 0 < write(6, "VERSION\t1\t0\nUSER\t1\tuser1@domain"..., 55) = 55
At this line, before changing the domain, it was a partial of my domain name
< gettimeofday({1176220958, 988500}, NULL) = 0 < poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN}], 2, 59986) = 1 < gettimeofday({1176220958, 988718}, {360, 0}) = 0 < read(6, "VERSION\t1\t0\nSPID\t28471\n", 4096) = 23 < gettimeofday({1176220958, 988913}, NULL) = 0 < poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN|POLLHUP}], 2, 59986) = 1 < gettimeofday({1176220959, 11220}, {360, 0}) = 0 < read(6, "", 4073) = 0
_llseek(6, 0, 0xbf9345b0, SEEK_CUR) = -1 ESPIPE (Illegal seek) getsockname(6, {sa_family=AF_FILE, path="^Y^H"}, [2]) = 0 write(6, "VERSION\t1\t0\nUSER\t1\tuser2@domain"..., 54) = 54
Again, it was a partial before changing the domain.
- = 1
gettimeofday({1176221679, 242218}, {360, 0}) = 0 read(6, "", 4074) = 0
gettimeofday({1176221679, 234529}, NULL) = 0 poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN}], 2, 59995) = 1 gettimeofday({1176221679, 234669}, {360, 0}) = 0 read(6, "VERSION\t1\t0\nSPID\t1141\n", 4096) = 22 gettimeofday({1176221679, 234825}, NULL) = 0 poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=6, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL, revents=POLLIN|POLLHUP}], 2,
The first file was run as the vmail user from the command line with the -d. The second was run from Postfix.
An interesting thing about the Postfix bug is that I now get these lines in the log:
Apr 10 10:12:14 mail postfix/smtpd[562]: A7F1B168734: client=mail.domain.com[127.0.0.1] Apr 10 10:12:14 mail postfix/cleanup[496]: A7F1B168734: message-id=20070410161207.E4B351686E4@mail.domain.com Apr 10 10:12:14 mail postfix/qmgr[452]: A7F1B168734: from=user@domain.com, size=2600, nrcpt=1 (queue active) Apr 10 10:12:14 mail postfix/smtp[464]: E4B351686E4: to=user@domain.com, relay=127.0.0.1[127.0.0.1]:10024, delay=7, delays=0.22/0/0.01/6.8, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=31657-06, from MTA: 250 2.0.0 Ok: queued as A7F1B168734) Apr 10 10:12:15 mail postfix/pipe[563]: A7F1B168734: to=user@domain.com, relay=dovecot, delay=0.75, delays=0.24/0.06/0/0.45, dsn=4.3.0, status=deferred (temporary failure) Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: from=user@domain.com, size=2600, nrcpt=1 (queue active) Apr 10 10:13:55 mail postfix/pipe[890]: A7F1B168734: to=user@domain.com, relay=dovecot, delay=101, delays=101/0.02/0/0.11, dsn=2.0.0, status=sent (delivered via dovecot service) Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: removed
The weird thing is that after the mail claims to have a temporary failure, it then says it is delivered, but I can't seem to find the mail on my server. It just disappears.
On 10.4.2007, at 19.30, Jason Warner wrote:
< write(6, "VERSION\t1\t0\nUSER\t1\tuser1@domain"..., 55) = 55 ..
write(6, "VERSION\t1\t0\nUSER\t1\tuser2@domain"..., 54) = 54
Your original error was: "net_connect(/var/run/dovecot/auth-master)
failed: Connection refused". However here both of them were
successfully connected to, and this shows that the authentication
handshake was successfully sent. So I think it wrote a different
error message to Dovecot's logs this time.
On 10.4.2007, at 19.30, Jason Warner wrote:
< write(6, "VERSION\t1\t0\nUSER\t1\tuser1@domain"..., 55) = 55 ..
write(6, "VERSION\t1\t0\nUSER\t1\tuser2@domain"..., 54) = 54
Your original error was: "net_connect(/var/run/dovecot/auth-master) failed: Connection refused". However here both of them were successfully connected to, and this shows that the authentication handshake was successfully sent. So I think it wrote a different error message to Dovecot's logs this time.
I'm afraid that I've tampered with stuff so much on this server that things have changed from when I originally posted.
Apr 10 10:12:14 mail postfix/smtpd[562]: A7F1B168734: client=mail.domain.com[127.0.0.1] Apr 10 10:12:14 mail postfix/cleanup[496]: A7F1B168734: message-id=20070410161207.E4B351686E4@mail.domain.com Apr 10 10:12:14 mail postfix/qmgr[452]: A7F1B168734: from=user@domain.com, size=2600, nrcpt=1 (queue active) Apr 10 10:12:14 mail postfix/smtp[464]: E4B351686E4: to=user@domain.com, relay=127.0.0.1[127.0.0.1]:10024, delay=7, delays=0.22/0/0.01/6.8, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=31657-06, from MTA: 250 2.0.0 Ok: queued as A7F1B168734) Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c: line 474 (auth_request_lookup_credentials_callback): assertion failed: (request->state == AUTH_REQUEST_STATE_PASSDB) Apr 10 10:12:15 mail dovecot: auth(default): Raw backtrace: dovecot-auth [0x806bb91] -> dovecot-auth [0x806baac] -> dovecot-auth [0x80552ce] -> dovecot-auth [0x805f2bd] -> dovecot-auth [0x805f684] -> dovecot-auth [0x805906b] -> dovecot-auth(io_loop_handler_run+0x110) [0x806ef30] -> dovecot-auth(io_loop_run+0x1c) [0x806e27c] -> dovecot-auth(main+0x2fe) [0x805a51e] -> /lib/libc.so.6(__libc_start_main+0xdc) [0x673f2c] -> dovecot-auth {0x8050ba1] Apr 10 10:12:15 mail dovecot: child 31532 (auth) killed with signal 6 Apr 10 10:12:15 mail postfix/pipe[563]: A7F1B168734: to=user@domain.com, relay=dovecot, delay=0.75, delays=0.24/0.06/0/0.45, dsn=4.3.0, status=deferred (temporary failure)
This looks like the auth fails due to some error, but I'm not sure what.
Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: from=user@domain.com, size=2600, nrcpt=1 (queue active) Apr 10 10:13:55 mail postfix/pipe[890]: A7F1B168734: to=user@domain.com, relay=dovecot, delay=101, delays=101/0.02/0/0.11, dsn=2.0.0, status=sent (delivered via dovecot service) Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: removed
At this point, it looks like there was a retry and the dovecot service delivers the mail, but the mail disappears into thin air. The problem is that I have a ton of mail flooding the server and errors are overwritten before I can take a look at them. I wasn't even able to compare the same mail being sent in the delivery logs because by the time I stopped the logging more mail had come into the system.
On 10.4.2007, at 20.22, Jason Warner wrote:
Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c:
line 474 (auth_request_lookup_credentials_callback): assertion failed: (request->state == AUTH_REQUEST_STATE_PASSDB)
Oh. So the connection refused happens because dovecot-auth crashes.
Now this is beginning to make sense. The easiest way to fix this is
to add allow_all_users to userdb static's args. I'll try to figure
out why this crash is happening.
At this point, it looks like there was a retry and the dovecot service delivers the mail, but the mail disappears into thin air. The
problem is that I have a ton of mail flooding the server and errors are
overwritten before I can take a look at them. I wasn't even able to compare the
same mail being sent in the delivery logs because by the time I stopped the logging more mail had come into the system.
A test system would be good. :)
On 10.4.2007, at 20.35, Timo Sirainen wrote:
On 10.4.2007, at 20.22, Jason Warner wrote:
Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c:
line 474 (auth_request_lookup_credentials_callback): assertion failed: (request->state == AUTH_REQUEST_STATE_PASSDB)Oh. So the connection refused happens because dovecot-auth crashes.
Now this is beginning to make sense. The easiest way to fix this is
to add allow_all_users to userdb static's args. I'll try to figure
out why this crash is happening.
Do you have core file in Dovecot's base_dir (/var/run/dovecot/ or / usr/local/var/run/dovecot/ probably)? If so, could you do:
gdb /path/to/dovecot-auth /path/to/core bt full fr 4 p *request
(I think fr 4 is correct to give a usable output for p *request, if
it just says "No symbol" try with different fr numbers. It's anyway
the one that bt full shows auth_request_lookup_credentials_callback()
being in)
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Tuesday, April 10, 2007 11:45 AM To: Dovecot Mailing List Cc: Jason Warner; 'Jason Warner' Subject: Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28
On 10.4.2007, at 20.35, Timo Sirainen wrote:
On 10.4.2007, at 20.22, Jason Warner wrote:
Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c: line 474 (auth_request_lookup_credentials_callback): assertion failed: (request->state == AUTH_REQUEST_STATE_PASSDB)
Oh. So the connection refused happens because dovecot-auth crashes. Now this is beginning to make sense. The easiest way to fix this is to add allow_all_users to userdb static's args. I'll try to figure out why this crash is happening.
Adding allow_all_users to my userdb static args gets rid of the previous error message, but now the logs contain this message:
Apr 10 11:48:28 mail deliver(user@domain.com): setgid(999) failed: Operation not permitted Apr 10 11:48:28 mail postfix/pipe[17576]: 95FC916872D: to=user@domain.com, relay=dovecot, delay=0.31, delays=0.26/0.02/0/0.02, dsn=2.0.0, status=sent (delivered via dovecot service)
999 is the gid of my vmail user.
Do you have core file in Dovecot's base_dir (/var/run/dovecot/ or / usr/local/var/run/dovecot/ probably)? If so, could you do:
gdb /path/to/dovecot-auth /path/to/core bt full fr 4 p *request
(I think fr 4 is correct to give a usable output for p *request, if it just says "No symbol" try with different fr numbers. It's anyway the one that bt full shows auth_request_lookup_credentials_callback() being in)
I couldn't find a core dump file in the base_dir (/var/run/dovecot). Is there a way to force a core dump?
On 10.4.2007, at 21.00, Jason Warner wrote:
Apr 10 11:48:28 mail deliver(user@domain.com): setgid(999) failed:
Operation not permitted .. 999 is the gid of my vmail user. .. dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d
So then you should use vmail:vmail here.
I couldn't find a core dump file in the base_dir (/var/run/ dovecot). Is there a way to force a core dump?
ulimit -c unlimited
in the same shell before running dovecot should do it.
Apr 10 11:48:28 mail deliver(user@domain.com): setgid(999) failed: Operation not permitted .. 999 is the gid of my vmail user. .. dovecot unix - n n - - pipe flags=DRhu user=vmail:mail argv=/usr/libexec/dovecot/deliver -d
So then you should use vmail:vmail here.
That did it. Thank you so much for your help.
I couldn't find a core dump file in the base_dir (/var/run/ dovecot). Is there a way to force a core dump?
ulimit -c unlimited
in the same shell before running dovecot should do it.
I still didn't get a core dump in /var/run/dovecot. Should I be looking somewhere else? That is the directory that the base_dir variable is set to in my dovecot.conf file.
I tried this with both the allow_all_users set and without it set. That was the fix though. By adding allow_all_users to my userdb static args, everything is running just fine.
On 10.4.2007, at 21.33, Jason Warner wrote:
I still didn't get a core dump in /var/run/dovecot. Should I be
looking somewhere else? That is the directory that the base_dir variable is
set to in my dovecot.conf file.
Well, another way would be to attach gdb directly into dovecot-auth
while it's still running:
gdb attach pidof dovecot-auth
cont
(wait for crash)
bt full
fr 4
p *request
Anyway I should be able to figure out the assert by looking at the
code too. Just a bit more difficult, since a couple of minutes wasn't
enough. :)
I tried this with both the allow_all_users set and without it set.
That was the fix though. By adding allow_all_users to my userdb static args, everything is running just fine.
Only problem with that is that it doesn't know if the user exists or
not. So depending on how you've set up permissions, it's possible
that it delivers mails to unknown users as well, creating the
directories if needed..
On 10.4.2007, at 21.33, Jason Warner wrote:
I still didn't get a core dump in /var/run/dovecot. Should I be looking somewhere else? That is the directory that the base_dir variable is set to in my dovecot.conf file.
Well, another way would be to attach gdb directly into dovecot-auth while it's still running:
gdb attach
pidof dovecot-auth
cont (wait for crash) bt full fr 4 p *requestAnyway I should be able to figure out the assert by looking at the code too. Just a bit more difficult, since a couple of minutes wasn't enough. :)
Here's the best I could do:
(gdb) bt full #0 0x004b9402 in __kernel_vsyscall () No symbol table info available. #1 0x00686d40 in raise () from /lib/libc.so.6 No symbol table info available. #2 0x00688591 in abort () from /lib/libc.so.6 No symbol table info available. #3 0x0806bb9a in i_error () No symbol table info available. #4 0x0806baac in i_panic () No symbol table info available. #5 0x080552ce in auth_request_lookup_credentials_callback () No symbol table info available. #6 0x0805f2bd in passdb_cache_init () No symbol table info available. #7 0x0805f684 in passdb_cache_init () No symbol table info available. #8 0x0805906b in db_ldap_connect () No symbol table info available. #9 0x0806ef30 in io_loop_handler_run () No symbol table info available. #10 0x0806e27c in io_loop_run () No symbol table info available. #11 0x0805a51e in main () No symbol table info available.
It looks like a stack trace, but the p *request just gives "No symbol table info available" for all 11 integers.
I tried this with both the allow_all_users set and without it set. That was the fix though. By adding allow_all_users to my userdb static args, everything is running just fine.
Only problem with that is that it doesn't know if the user exists or not. So depending on how you've set up permissions, it's possible that it delivers mails to unknown users as well, creating the directories if needed..
That is a problem. I'll go back to Postfix's virtual delivery method until I don't have to use the allow_all_users flag.
On Tue, 2007-04-10 at 13:12 -0600, Jason Warner wrote:
Anyway I should be able to figure out the assert by looking at the code too. Just a bit more difficult, since a couple of minutes wasn't enough. :)
This should fix it: http://dovecot.org/list/dovecot-cvs/2007-April/008653.html
Thank you Timo. You have been extremely helpful in this.
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi] Sent: Thursday, April 12, 2007 10:39 AM To: Jason Warner Cc: 'Jason Warner'; 'Dovecot Mailing List' Subject: Re: [Dovecot] Connection refused with auth-master afterupgradingtoDovecot 1.0 rc 28
On Tue, 2007-04-10 at 13:12 -0600, Jason Warner wrote:
Anyway I should be able to figure out the assert by looking at the code too. Just a bit more difficult, since a couple of minutes wasn't enough. :)
This should fix it: http://dovecot.org/list/dovecot-cvs/2007-April/008653.html
participants (3)
-
Jason Warner
-
Jason Warner
-
Timo Sirainen