Incorrect value sent to solr
Hello,
I've recently stumbled upon an error in fts-solr plugin. Dovecot sends "rows" parameter to solr when someone is searching in mailbox, but value of rows can be bigger than int in java. If I understand correctly, rows is equal to the last message id on current imap folder. I have mailbox where last uid value is 2206267083 and when I tried to search in this mailbox I got error 400 - Bad Request.
On a Solr side I got an error: java.lang.NumberFormatException: For input string: "2206267083" NumberFormat expect an int which max value is 2147483647.
Quick test reviles that rows parameter isn't required for correct response from solr. I think that dovecot should check if the value of rows is a correct int, and if not, just skip this argument.
Dovecot version: 2.3.11.3 (502c39af9) Solr version: 8.7.0
Łukasz Szczepański +48 58 3509284 www.webd.pl [1] Globtel Internet
Links:
On 4/2/2021 7:11 AM, Łukasz Szczepański wrote:
On a Solr side I got an error: java.lang.NumberFormatException: For input string: "2206267083" NumberFormat expect an int which max value is 2147483647.
Quick test reviles that rows parameter isn't required for correct response from solr. I think that dovecot should check if the value of rows is a correct int, and if not, just skip this argument.
That error comes from Solr. You can fix it on the Solr side by changing the type on the field from int to long in the schema.
When I look at the provided example schema for fts_solr, I do not even see a definition for an "int" type, so if you have one, it's wrong.
https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0....
Thanks, Shawn
I already have uid as long in schema: <field name="id" type="string" indexed="true" required="true" stored="true"/> <field name="uid" type="long" indexed="true" required="true" stored="true"/> <field name="box" type="string" indexed="true" required="true" stored="true"/> <field name="user" type="string" indexed="true" required="true" stored="true"/>
<field name="hdr" type="text" indexed="true" stored="false"/> <field name="body" type="text" indexed="true" stored="false"/>
<field name="from" type="text" indexed="true" stored="false"/> <field name="to" type="text" indexed="true" stored="false"/> <field name="cc" type="text" indexed="true" stored="false"/> <field name="bcc" type="text" indexed="true" stored="false"/> <field name="subject" type="text" indexed="true" stored="false"/>
Rows isn't part of manage-schema for solr collection its build in common query parameters in solr, and type of this field cannot be changed. It controls the number of returned results (by default 10). I still think that an issue on dovecot side, but I was also wrong, this parameter is important and have to be sent. Dovecot should send a count of messages in mailbox (or imap folder) not the highest uid in folder.
W dniu 2021-04-03 15:48, Shawn Heisey napisał(a):
On 4/2/2021 7:11 AM, Łukasz Szczepański wrote:
On a Solr side I got an error: java.lang.NumberFormatException: For input string: "2206267083" NumberFormat expect an int which max value is 2147483647.
Quick test reviles that rows parameter isn't required for correct response from solr. I think that dovecot should check if the value of rows is a correct int, and if not, just skip this argument.
That error comes from Solr. You can fix it on the Solr side by changing the type on the field from int to long in the schema.
When I look at the provided example schema for fts_solr, I do not even see a definition for an "int" type, so if you have one, it's wrong.
https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0....
Thanks, Shawn
On 4/3/2021 12:12 PM, Łukasz Szczepański wrote:
Rows isn't part of manage-schema for solr collection its build in common query parameters in solr, and type of this field cannot be changed. It controls the number of returned results (by default 10). I still think that an issue on dovecot side, but I was also wrong, this parameter is important and have to be sent. Dovecot should send a count of messages in mailbox (or imap folder) not the highest uid in folder.
I just tried a query on a stock Solr example with rows set to 3000000000 and got an error similar to the one you reported.
org.apache.solr.common.SolrException: For input string: "3000000000"
No mention of "int" in this error. I'm wondering if that part was something you added to the email, not text that was actually in the error.
I was using Solr 8.5.1, the newest version I have downloaded right now.
This error also occurs in cloud mode. With distributed indexes, it could be perfectly acceptable to use a rows parameter that exceeds what a signed int can hold. Performance would probably suck when using a value that high, but it would be acceptable. Which smells like a bug in Solr.
I can tell you that using a value from uid in the rows parameter is almost certainly not the right thing to do. That field would have no connection to rows.
Thanks, Shawn
Error java.lang.NumberFormatException: For input string: "2206267083" I got from logs section in solr, it's visible when you expand an error. NumberFormatException is raised when passed value isn't correct number value. Quick tests shows that in this place solr expects signed int.
I know that using uid is not the correct way, but Dovecot seems to be doing just that: https://github.com/dovecot/core/blob/57069b23e6515875796473bdd4ec4bf90343fb2...
W dniu 2021-04-03 23:15, Shawn Heisey napisał(a):
On 4/3/2021 12:12 PM, Łukasz Szczepański wrote:
Rows isn't part of manage-schema for solr collection its build in common query parameters in solr, and type of this field cannot be changed. It controls the number of returned results (by default 10). I still think that an issue on dovecot side, but I was also wrong, this parameter is important and have to be sent. Dovecot should send a count of messages in mailbox (or imap folder) not the highest uid in folder.
I just tried a query on a stock Solr example with rows set to 3000000000 and got an error similar to the one you reported.
org.apache.solr.common.SolrException: For input string: "3000000000"
No mention of "int" in this error. I'm wondering if that part was something you added to the email, not text that was actually in the error.
I was using Solr 8.5.1, the newest version I have downloaded right now.
This error also occurs in cloud mode. With distributed indexes, it could be perfectly acceptable to use a rows parameter that exceeds what a signed int can hold. Performance would probably suck when using a value that high, but it would be acceptable. Which smells like a bug in Solr.
I can tell you that using a value from uid in the rows parameter is almost certainly not the right thing to do. That field would have no connection to rows.
Thanks, Shawn
I've prepared pull request with fixed rows parameter: https://github.com/dovecot/core/pull/160 I've tested this fix on mailbox which caused a problem earlier, and its works fine.
W dniu 2021-04-03 23:53, Łukasz Szczepański napisał(a):
Error java.lang.NumberFormatException: For input string: "2206267083" I got from logs section in solr, it's visible when you expand an error. NumberFormatException is raised when passed value isn't correct number value. Quick tests shows that in this place solr expects signed int.
I know that using uid is not the correct way, but Dovecot seems to be doing just that: https://github.com/dovecot/core/blob/57069b23e6515875796473bdd4ec4bf90343fb2...
W dniu 2021-04-03 23:15, Shawn Heisey napisał(a):
On 4/3/2021 12:12 PM, Łukasz Szczepański wrote:
Rows isn't part of manage-schema for solr collection its build in common query parameters in solr, and type of this field cannot be changed. It controls the number of returned results (by default 10). I still think that an issue on dovecot side, but I was also wrong, this parameter is important and have to be sent. Dovecot should send a count of messages in mailbox (or imap folder) not the highest uid in folder.
I just tried a query on a stock Solr example with rows set to 3000000000 and got an error similar to the one you reported.
org.apache.solr.common.SolrException: For input string: "3000000000"
No mention of "int" in this error. I'm wondering if that part was something you added to the email, not text that was actually in the error.
I was using Solr 8.5.1, the newest version I have downloaded right now.
This error also occurs in cloud mode. With distributed indexes, it could be perfectly acceptable to use a rows parameter that exceeds what a signed int can hold. Performance would probably suck when using a value that high, but it would be acceptable. Which smells like a bug in Solr.
I can tell you that using a value from uid in the rows parameter is almost certainly not the right thing to do. That field would have no connection to rows.
Thanks, Shawn
Łukasz Szczepański +48 58 3509284 www.webd.pl Globtel Internet
On 4/7/2021 4:13 AM, Łukasz Szczepański wrote:
I've prepared pull request with fixed rows parameter: https://github.com/dovecot/core/pull/160 I've tested this fix on mailbox which caused a problem earlier, and its works fine.
I admit to not being very familiar with dovecot source, but I did a little digging and that fix looks good to my untrained eye.
If somebody has 2.2 billion or more messages in one place, the query will still fail, but it should be pretty rare to ever run into that use case in the wild. And I think it might cause performance problems in dovecot too.
Does the solr backend code ever use pagination on the search results by sending a fixed rows value and increasing start value on subsequent queries?
I think that I and my fellow Solr committers need to treat this as a bug on our end, since it would be perfectly valid to request that many rows on a distributed index. A query like that where there really are that many rows would be embarrassingly slow to execute and require a LOT of memory, but it's still valid. I am asking on the solr users list whether they think I should file a bug report.
Thanks, Shawn
I'm not as familiar with C, but I don't see in solr backed in dovecot any clue of subsequent queries for single mailbox lookup (which most mail client uses). There is a hard limit of 100000 rows for multiple mailbox lookup.
W dniu 2021-04-07 12:48, Shawn Heisey napisał(a):
On 4/7/2021 4:13 AM, Łukasz Szczepański wrote:
I've prepared pull request with fixed rows parameter: https://github.com/dovecot/core/pull/160 I've tested this fix on mailbox which caused a problem earlier, and its works fine.
I admit to not being very familiar with dovecot source, but I did a little digging and that fix looks good to my untrained eye.
If somebody has 2.2 billion or more messages in one place, the query will still fail, but it should be pretty rare to ever run into that use case in the wild. And I think it might cause performance problems in dovecot too.
Does the solr backend code ever use pagination on the search results by sending a fixed rows value and increasing start value on subsequent queries?
I think that I and my fellow Solr committers need to treat this as a bug on our end, since it would be perfectly valid to request that many rows on a distributed index. A query like that where there really are that many rows would be embarrassingly slow to execute and require a LOT of memory, but it's still valid. I am asking on the solr users list whether they think I should file a bug report.
Thanks, Shawn
Łukasz Szczepański +48 58 3509284 www.webd.pl Globtel Internet
On 07/04/2021 13:36, Łukasz Szczepański wrote:
I'm not as familiar with C, but I don't see in solr backed in dovecot any clue of subsequent queries for single mailbox lookup (which most mail client uses). There is a hard limit of 100000 rows for multiple mailbox lookup.
This was reported back in December 2020. I submitted this fix to the list on 31/12/2020 to give an upper bound for single mailbox queries as there is in place for multiple mailbox queries. diff -ur dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c --- dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c 2020-08-12 14:20:41.000000000 +0200 +++ dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c 2020-12-31 09:05:07.681897716 +0100 @@ -838,7 +838,7 @@ str = t_str_new(256); str_printfa(str, "wt=xml&fl=uid,score&rows=%u&sort=uid+asc&q=%%7b!lucene+q.op%%3dAND%%7d", - status.uidnext); + I_MIN(status.uidnext,SOLR_MAX_MULTI_ROWS)); prefix_len = str_len(str); if (solr_add_definite_query_args(str, args, and_args)) {
On 07/04/2021 22:58 John Fawcett
wrote: On 07/04/2021 13:36, Łukasz Szczepański wrote:
I'm not as familiar with C, but I don't see in solr backed in dovecot any clue of subsequent queries for single mailbox lookup (which most mail client uses). There is a hard limit of 100000 rows for multiple mailbox lookup.
This was reported back in December 2020. I submitted this fix to the list on 31/12/2020 to give an upper bound for single mailbox queries as there is in place for multiple mailbox queries.
diff -ur dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c
--- dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c 2020-08-12 14:20:41.000000000 +0200 +++ dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c 2020-12-31 09:05:07.681897716 +0100 @@ -838,7 +838,7 @@
str = t_str_new(256); str_printfa(str, "wt=xml&fl=uid,score&rows=%u&sort=uid+asc&q=%%7b!lucene+q.op%%3dAND%%7d", - status.uidnext); + I_MIN(status.uidnext,SOLR_MAX_MULTI_ROWS)); prefix_len = str_len(str);
if (solr_add_definite_query_args(str, args, and_args)) {
Hi! Thanks for reminding us, I'll make a ticket about this to avoid forgetting it again. Aki
On 08/04/2021 08:02, Aki Tuomi wrote:
Hi! Thanks for reminding us, I'll make a ticket about this to avoid forgetting it again.
Aki
Thanks to you and the team for taking it into consideration!
PS is this list the best way to post patch proposals or is it easier for you to get code from pull requests?
John
participants (4)
-
Aki Tuomi
-
John Fawcett
-
Shawn Heisey
-
Łukasz Szczepański