My auth process is dumping core. This happens several times per day but dovecot can operate normally for hours between errors.
The crash occurs in src/auth/auth-policy.c, line 356:
t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
Current function is auth_policy_parse_response
  356           context->request->policy_refusal = FALSE;
...context->request is NULL. I added markers to the code...
    }
    if (context->request == NULL) fprintf(stderr,
        "2222222222222222222222222222 context->request == NULL\n");
    i_stream_unref(&context->payload);
    if (context->request == NULL) fprintf(stderr,
        "1111111111111111111111111111 context->request == NULL\n");
    if (context->parse_error) {
        context->result = (context->set->policy_reject_on_fail ? -1 : 0);
    }
    if (context->request == NULL) fprintf(stderr,
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL\n");
    context->request->policy_refusal = FALSE;
    if (context->result < 0) {
...gives the following at the time of the crash...
Aug 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: 1111111111111111111111111111 context->request == NULL
Aug 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL
...so context->request is not NULL before the call to i_stream_unref (no 222 in the log) but is NULL after it.
dovecot.conf has:
auth_policy_server_url = http://policyserver.lan/
auth_policy_server_timeout_msecs = 3000
auth_policy_hash_nonce = Ohr9phaeSeip2Pahaez2raiGohxoo5Ia
auth_policy_request_attributes = remote=%{rip}
auth_policy_check_before_auth = yes
auth_policy_check_after_auth = yes
auth_policy_report_after_auth = yes
To simplify the problem I used a dummy policy server; in nginx.conf:
location / {
default_type application/json;
return 200 "{\"status\":0,\"msg\":\"accepted\"}";
}
However, no matter what rubbish a policy server sends back, it should not cause dovecot to crash.
I've tried 32-bit and 64-bit builds and two compilers (gcc and SunStudio); all result in crashes. Adding keepalive_timeout 0; to nginx.conf appears to reduce the crashes. It happens with a variety of users, and with debug output I see no pattern.
James.
On 02/08/2019 11:45, James via dovecot wrote:
My auth process is dumping core. This happens several times per day but dovecot can operate normally for hours between errors.
The crash occurs in src/auth/auth-policy.c, line 356:
t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
Current function is auth_policy_parse_response
  356           context->request->policy_refusal = FALSE;
Further tracking shows this sets context->request to NULL:
"src/lib/iostream.c" line 54
array_foreach(&stream->destroy_callbacks, dc)
dc->callback(dc->context);
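To make that failure mode concrete, here is a minimal standalone sketch of the pattern: an unref drops the last reference, a destroy callback clears a pointer, and the caller then dereferences it. The types and names below are invented for illustration only; they are not Dovecot's real code.

#include <stdio.h>
#include <stdlib.h>

/* Invented stand-ins -- not Dovecot's real types or functions. */
struct fake_request {
    int policy_refusal;
};

struct fake_stream {
    int refcount;
    void (*destroy_callback)(void *context);
    void *destroy_context;
};

struct fake_context {
    struct fake_stream *payload;
    struct fake_request *request; /* what the response parser dereferences last */
};

/* Analogue of a destroy callback run from i_stream_unref() via the stream's
 * destroy_callbacks array: it detaches the request from the context. */
static void payload_destroyed(void *ctx)
{
    struct fake_context *context = ctx;
    context->request = NULL;
}

static void fake_stream_unref(struct fake_stream **_stream)
{
    struct fake_stream *stream = *_stream;
    *_stream = NULL;
    if (--stream->refcount > 0)
        return;
    /* Last reference gone: the destroy callback fires here. */
    if (stream->destroy_callback != NULL)
        stream->destroy_callback(stream->destroy_context);
    free(stream);
}

int main(void)
{
    struct fake_request request = { .policy_refusal = 1 };
    struct fake_context context = { .request = &request };
    struct fake_stream *payload = calloc(1, sizeof(*payload));

    payload->refcount = 1; /* the parser holds the only reference */
    payload->destroy_callback = payload_destroyed;
    payload->destroy_context = &context;
    context.payload = payload;

    /* Mirrors the marker output: the request is still set before the unref... */
    printf("before unref: context.request = %p\n", (void *)context.request);
    fake_stream_unref(&context.payload);
    /* ...and NULL afterwards, so the next statement in the real code,
     * context->request->policy_refusal = FALSE, dereferences NULL. */
    printf("after unref:  context.request = %p\n", (void *)context.request);
    if (context.request == NULL)
        printf("dereferencing context->request here would crash with SIGSEGV\n");
    return 0;
}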
Very occasionally I see:
Aug 3 11:00:35 mailhost dovecot: [ID 702911 mail.crit] auth: Panic: file http-client-request.c: line 283 (http_client_request_unref): assertion failed: (req->refcount > 0)
Swapping keep-alive on/off changes the crash rate from very approximately once per day to several per hour. I guess there is some fundamental thread clash or keep-alive timeout clean-up failure.
James.
On 2.8.2019 13.45, James via dovecot wrote:
My auth process is dumping core. This happens several times per day but dovecot can operate normally for hours between errors.
...
Hi!
There is an easy fix for this, attached.
Aki
On 06/08/2019 06:46, Aki Tuomi via dovecot wrote:
On 2.8.2019 13.45, James via dovecot wrote:
My auth process is dumping core. This happens several times per day
...
There is an easy fix for this, attached.
Patch applied; no core dump in 24 hours.
This appears to have fixed the problem. I found that it crashed when the policy server responded too quickly. As the before-auth and after-auth command=allow requests are the same, I cache the first, leading to a fast second response. Removing the cache (nginx proxy_cache ...) must have changed the timings and circumvented the crash.

Why use both check before and after auth? Roundcube webmail reports an error with only auth_policy_check_before_auth; I cannot see why. The simple and lazy solution is to enable both auth_policy_check_ settings!
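Roughly, a caching front end of that kind looks like this (a simplified sketch only; the upstream address, cache zone name and cache lifetime below are placeholders, not the real values):

proxy_cache_path /var/cache/nginx/policy keys_zone=policy:1m max_size=10m;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;   # the real policy server (placeholder address)
        proxy_cache policy;
        proxy_cache_methods GET HEAD POST;  # the policy lookups are HTTP POSTs
        proxy_cache_valid 200 10s;          # a cached reply makes the second lookup very fast
    }
}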
Thank you Aki for looking at this and finding a solution so quickly.
On 7.8.2019 11.51, James via dovecot wrote:
On 06/08/2019 06:46, Aki Tuomi via dovecot wrote:
On 2.8.2019 13.45, James via dovecot wrote:
My auth process is dumping core. This happens several times per day
...
There is an easy fix for this, attached.
Patch applied; no core dump in 24 hours.
This appears to have fixed the problem. I found that it crashed when the policy server responded too quickly. As the before-auth and after-auth command=allow requests are the same, I cache the first, leading to a fast second response. Removing the cache (nginx proxy_cache ...) must have changed the timings and circumvented the crash.

Why use both check before and after auth? Roundcube webmail reports an error with only auth_policy_check_before_auth; I cannot see why. The simple and lazy solution is to enable both auth_policy_check_ settings!
Thank you Aki for looking at this and finding a solution so quickly.
The double check is for deployments that want to implement something like COS, or that want to perform validations in the policy server *after* we know the user identity. The first check is done before we even know whether the user or the credential(s) are valid.
Aki
On 07/08/2019 11:02, Aki Tuomi via dovecot wrote:
before and after auth? Roundcube webmail reports an error with only auth_policy_check_before_auth; I cannot see why. The simple and lazy solution is to enable both auth_policy_check_ settings! ...
The double check is for deployments that want to implement something like COS, or that want to perform validations in the policy server *after* we know the user identity. The first check is done before we even know whether the user or the credential(s) are valid.
I can see why both before and after are options. My simpler policy does not need both: I perform whitelist, blacklist, geo and greylist checks and do not cross-reference these with the user. I can't see why roundcubemail fails without both; the IMAP exchange with roundcubemail should not be aware of the policy server. I was spending [wasting] too much time looking for an answer and gave up.
On 07/08/2019 11:19, James via dovecot wrote:
My more simplistic policy does not need both. I perform whitelist, blacklist, geo and greylist
...and DNSBL, which is where I started with the policy server: "Can dovecot do DNSBL?" Only indirectly, via a policy server. This is better, as most connections pass the whitelist or fail the geo checks (both local) before the external DNS lookup is done.
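For illustration only (none of these helper names come from dovecot or from the policy server described here), the ordering amounts to short-circuiting the cheap local checks before the one external DNS lookup:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stub checks; a real policy server would consult its own data
 * (and would also include greylisting, omitted here for brevity). */
static bool in_whitelist(const char *ip) { (void)ip; return false; }
static bool geo_allowed(const char *ip)  { (void)ip; return true; }
static bool listed_in_dnsbl(const char *ip) { (void)ip; return false; } /* the only external DNS lookup */

/* Cheap local checks first; the DNSBL query is only paid for when needed. */
static bool allow_client(const char *ip)
{
    if (in_whitelist(ip))
        return true;               /* passes locally, no DNS lookup */
    if (!geo_allowed(ip))
        return false;              /* fails locally, no DNS lookup */
    return !listed_in_dnsbl(ip);   /* only now do the external lookup */
}

int main(void)
{
    const char *ip = "192.0.2.1";
    printf("%s -> %s\n", ip, allow_client(ip) ? "allow" : "reject");
    return 0;
}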