Indexer error after upgrade to 2.3.11.3 [trial patch]

Scott Q. qmail at top-consulting.net
Sat Oct 31 01:18:44 EET 2020


I have implemented this change and the core dumps went away.

It appears so far that it's indeed the right fix.


Has this change been introduced in 2.3.11.3 ? 


Thanks!

On Tuesday, 27/10/2020 at 08:20 John Fawcett wrote:


 On 22/10/2020 10:23, John Fawcett wrote:



 On 21/10/2020 19:00, John Fawcett wrote:



 On 21/10/2020 16:44, Patrik Peng wrote:



 On 16.10.20 18:34, Patrik Peng wrote:



 On 16.10.20 18:00, Scott Q. wrote:


 This reminds me, the way I was able to reproduce this consistently
was by having large headers ( 100+ lines ). 


On Friday, 16/10/2020 at 11:49 Patrik Peng wrote:


 

On 19.08.20 17:37, Josef 'Jeff' Sipek wrote:




 

On Wed, Aug 19, 2020 at 17:03:57 +0200, Alessio Cecchi wrote:  

 Hi, after the upgrade to Dovecot 2.3.11.3, from 2.3.10.1, I see
frequently these errors from different users:  

 It looks like this has been around for a while and you just got
unlucky and started seeing this now. Here's a quick & dirty patch that
should fix this. If you can try it, let us know how it went. Jeff.
diff --git a/src/plugins/fts-solr/solr-connection.c
b/src/plugins/fts-solr/solr-connection.c index
ae720b5e2870a852c1b6c440939e3c7c0fa72b5c..9d364f93e2cd1b716b9ab61bd39656a6c5b1ea04
100644 --- a/src/plugins/fts-solr/solr-connection.c +++
b/src/plugins/fts-solr/solr-connection.c @@ -103,7 +103,7 @@ int
solr_connection_init(const struct fts_solr_settings *solr_set,
http_set.ssl = ssl_client_set; http_set.debug = solr_set->debug;
http_set.rawlog_dir = solr_set->rawlog_dir; -solr_http_client =
http_client_init(&http_set); +solr_http_client =
http_client_init_private(&http_set); } *conn_r = conn; diff --git
a/src/plugins/fts/fts-parser-tika.c
b/src/plugins/fts/fts-parser-tika.c index
a4b8b5c3034f57e22e77caa759c090da6b62f8ba..b8b57a350b9a710d101ac7ccbcc14560d415d905
100644 --- a/src/plugins/fts/fts-parser-tika.c +++
b/src/plugins/fts/fts-parser-tika.c @@ -77,7 +77,7 @@
tika_get_http_client_url(struct mail_user *user, struct http_url
**http_url_r) http_set.request_timeout_msecs = 60*1000; http_set.ssl =
&ssl_set; http_set.debug = user->mail_debug; -tika_http_client =
http_client_init(&http_set); +tika_http_client =
http_client_init_private(&http_set); } *http_url_r = tuser->http_url;
return 0;  

  Greetings



I'm also experiencing these issues while running Dovecot 2.3.11.3 with
Solr 8.6.3 on FreeBSD 11.4. As mentioned in a previous mail, the above
patch is already applied to Dovecot's FreeBSD Port, confirmed by the
patches being present in the portstree
(https://svnweb.freebsd.org/ports/branches/2020Q3/mail/dovecot/files/).



In a FreeBSD VM with the official image
(https://download.freebsd.org/ftp/releases/VM-IMAGES/12.1-RELEASE/amd64/Latest/)
I compiled dovecot from git and was able to reproduce the error with
the patch mentioned above applied and also without any patches at all.
>From these results i conclude, that neither the patches applied in
FreeBSDs portstree or the patch above have any influence.



I also managed to reproduce the same results on a Debian 10 machine
(also with and without the patch):



doveadm(some.user at example.com [1]): Panic: file http-client-request.c:
line 1232 (http_client_request_send_more): assertion failed:
(req->payload_input != NULL) doveadm(some.user at example.com [1]):
Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so
[2].0(backtrace_append+0x42) [0x7f093f7fc3c2] ->
/usr/local/lib/dovecot/libdovecot.so [2].0(backtrace_get+0x1e)
[0x7f093f7fc4ce] -> /usr/local/lib/dovecot/libdovecot.so
[2].0(+0xea341) [0x7f093f807341] ->
/usr/local/lib/dovecot/libdovecot.so [2].0(+0xea381) [0x7f093f807381]
-> /usr/local/lib/dovecot/libdovecot.so [2].0(i_fatal+0)
[0x7f093f75c074] -> /usr/local/lib/dovecot/libdovecot.so
[2].0(http_client_request_send_more+0x378) [0x7f093f7a47a8] ->
/usr/local/lib/dovecot/libdovecot.so
[2].0(http_client_connection_output+0xe4) [0x7f093f7a90f4] ->
/usr/local/lib/dovecot/libssl_iostream_openssl.so [3](+0x8bff)
[0x7f093ec71bff] -> /usr/local/lib/dovecot/libdovecot.so
[2].0(+0x1148b0) [0x7f093f8318b0] ->
/usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_call_io+0x69)
[0x7f093f820259] -> /usr/local/lib/dovecot/libdovecot.so
[2].0(io_loop_handler_run_internal+0x11b) [0x7f093f821b6b] ->
/usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_handler_run+0x59)
[0x7f093f820369] -> /usr/local/lib/dovecot/libdovecot.so
[2].0(io_loop_run+0x38) [0x7f093f820598] ->
/usr/local/lib/dovecot/libdovecot.so [2].0(+0x86d1e) [0x7f093f7a3d1e]
-> /usr/local/lib/dovecot/libdovecot.so
[2].0(http_client_request_finish_payload+0x2e) [0x7f093f7a407e] ->
/usr/local/lib/dovecot/lib21_fts_solr_plugin.so
[4](solr_connection_post_end+0x32) [0x7f093b8492c2] ->
/usr/local/lib/dovecot/lib21_fts_solr_plugin.so [4](+0x3a45)
[0x7f093b844a45] -> /usr/local/lib/dovecot/lib20_fts_plugin.so
[4](+0x94cc) [0x7f093e1104cc] ->
/usr/local/lib/dovecot/lib20_fts_plugin.so
[4](fts_backend_update_deinit+0x23) [0x7f093e110503] ->
/usr/local/lib/dovecot/lib20_fts_plugin.so [4](+0x10a9b)
[0x7f093e117a9b] -> /usr/local/lib/dovecot/lib20_fts_plugin.so
[4](+0x119ca) [0x7f093e1189ca] ->
/usr/local/lib/dovecot/libdovecot-storage.so
[5].0(mailbox_transaction_commit_get_changes+0x56) [0x7f093fb16076] ->
/usr/local/lib/dovecot/libdovecot-storage.so
[5].0(mailbox_transaction_commit+0x1e) [0x7f093fb1615e] ->
doveadm(+0x31370) [0x5607cfa1f370] -> doveadm(+0x2b2a8)
[0x5607cfa192a8] -> doveadm(+0x2bfb2) [0x5607cfa19fb2] ->
doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x215) [0x5607cfa1ae05]
-> doveadm(doveadm_cmd_run_ver2+0x57c) [0x5607cfa2bbec] ->
doveadm(doveadm_cmd_try_run_ver2+0x37) [0x5607cfa2bc37] ->
doveadm(main+0x1d2) [0x5607cfa09492] Aborted  During my tests I also
did notice, that the error appears more often depending of mail size
and amount of mails in a folder:




Tested with: doveadm -v fts rescan -u some.user at example.com && doveadm
-v index -u some.user at example.com '*' 1 Mail in INBOX with 9KB ->
Error appeared 0 out of 20 times 1 Mail in INBOX with 136KB -> Error
appeared 17 out of 20 times 3 Mails in INBOX with 408KB -> Error
appeared 12 out of 20 times 20 Mails in INBOX with ~2MB -> Error
appeared 0 out of 20 times  Maybe this info helps anyone.

Patrik




 


 

Yeah, I read your mail and that's why I tested with different mail
sizes.
I did some more tests, one with large headers (around 700 lines of a
long header line) but small body:




1 Mail in INBOX with 583KB -> Error appeared 5 out of 20 times   and
another with a large mail body but normal headers:



1 Mail in INBOX with 585KB -> Error appeared 0 out of 20 times  I
guess this kinda confirms your guess, but interestingly shows less
errors than my previous test with a large header 136KB Mail.

Patrik




 

I did notice that changing the batch_size in




fts_solr = url=http://solr.example.org:8983/solr/ soft_commit=yes
batch_size=1000  does have an influence in how often the error occurs.
Setting it to 1 or some huge number like 10000 reduces the chances
quite a bit but not completely and also causes lots of small , or few
but quite large requests to Solr, so its not a practical workaround.

Whats the current state of this bug? A fix for it would be very
welcome, as it causes some trouble in our setup.


Regards




 

Hi



would someone care to post (preferably a link) to a test email with
which the problem can be reproduced?



thanks



Joh




 

I can confirm that this is an error that was introduced in 2.3.11.3.
It did not happen on 2.3.10. I see it occurring on a Centos 7
installation compiled from source. Between these two releases I didn't
see changes on the fts-solr code, but I do see some changes on the
lib-http code. I suspect that the changes done on lib-http have
introduced this issue. 




One temporary solution might be to roll back the relevant commits that
impacted lib-http and see if the problem resolves. Given the info in
this thread about the records arriving on solr, I'm wondering if this
is just a problem where the data has been split into more than one
request, but the last one has no data left in it.




John




 

Hi



I did a bit of tracing. From the albeit limited examples that I saw
the panic and assert segfault always happens when 
req->payload_finished is true. If that observation is correct in
general, then the patch is simple, just put the test for
payload_finished before the assert for payload_input and
payload_output. Would anyone with the problem care to try it and give
feedback?




diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c
dovecot-2.3.11.3/src/lib-http/http-client-request.c
--- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c   
2020-08-12 14:20:41.000000000 +0200
+++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27
13:06:09.352973130 +0100
@@ -1229,12 +1229,12 @@
        const char *error;
        uoff_t offset;

-       i_assert(req->payload_input != NULL);
-       i_assert(req->payload_output != NULL);
-
        if (req->payload_finished)
                return
http_client_request_finish_payload_out(req);

+       i_assert(req->payload_input != NULL);
+       i_assert(req->payload_output != NULL);
+
        io_remove(&conn->io_req_payload);

        /* chunked ostream needs to write to the parent
stream's buffer */








Links:
------
[1] mailto:some.user2 at example.com
[2] http://libdovecot.so
[3] http://openssl.so
[4] http://plugin.so
[5] http://libdovecot-storage.so
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://dovecot.org/pipermail/dovecot/attachments/20201030/5591abb5/attachment-0001.html>


More information about the dovecot mailing list