dovecot-2.2: lib-fts: Fix tr29 tokenizer apostrophe handling.
dovecot at dovecot.org
Thu May 21 10:39:12 UTC 2015
details: http://hg.dovecot.org/dovecot-2.2/rev/5ca59cffbf2f
changeset: 18731:5ca59cffbf2f
user: Teemu Huovila <teemu.huovila at dovecot.fi>
date: Thu May 21 06:17:32 2015 -0400
description:
lib-fts: Fix tr29 tokenizer apostrophe handling.
U+0027, which is called Single Quote in tr29, was not properly
handled as a word boundary.
diffstat:
src/lib-fts/fts-tokenizer-generic.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diffs (26 lines):
diff -r 45013c8cf69c -r 5ca59cffbf2f src/lib-fts/fts-tokenizer-generic.c
--- a/src/lib-fts/fts-tokenizer-generic.c Mon May 18 14:53:52 2015 +0300
+++ b/src/lib-fts/fts-tokenizer-generic.c Thu May 21 06:17:32 2015 -0400
@@ -464,8 +464,8 @@
if (lt == LETTER_TYPE_REGIONAL_INDICATOR || lt == LETTER_TYPE_KATAKANA ||
lt == LETTER_TYPE_HEBREW_LETTER || lt == LETTER_TYPE_ALETTER ||
- lt == LETTER_TYPE_SINGLE_QUOTE || lt == LETTER_TYPE_NUMERIC)
- return FALSE; /* TODO: Include LETTER_TYPE_DOUBLE_QUOTE? */
+ lt == LETTER_TYPE_NUMERIC)
+ return FALSE;
return TRUE;
}
@@ -535,8 +535,9 @@
http://www.unicode.org/reports/tr29/
Adaptions: No word boundary at Start-Of-Text or End-of-Text (Wb1 and
- WB2). Break just once, not before and after. Other things also, not
- really pure tr29. Meant to assist in finding individual words.
+ WB2). Break just once, not before and after. Other things also
+ (e.g. is_nonword(), not really pure tr29. Meant to assist in finding
+ individual words.
TODO: If this letter_fns based approach is too kludgy, do a FSM with function
pointers and transition tables.
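
For illustration, the effect of the first hunk can be sketched as a standalone function. This is only a minimal sketch, assuming the hunk belongs to is_nonword() (the helper named in the updated comment). The enum below is a hypothetical subset: its members reuse names that appear in the diff, but its ordering, its values and LETTER_TYPE_OTHER are not taken from fts-tokenizer-generic.c, and C99 bool stands in for Dovecot's TRUE/FALSE macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the tokenizer's letter types; the real enum in
   fts-tokenizer-generic.c may differ in members, order and values. */
enum letter_type {
	LETTER_TYPE_REGIONAL_INDICATOR,
	LETTER_TYPE_KATAKANA,
	LETTER_TYPE_HEBREW_LETTER,
	LETTER_TYPE_ALETTER,
	LETTER_TYPE_SINGLE_QUOTE,
	LETTER_TYPE_DOUBLE_QUOTE,
	LETTER_TYPE_NUMERIC,
	LETTER_TYPE_OTHER /* placeholder for everything not listed above */
};

/* Mirrors the condition after the change: only the listed letter types
   count as word characters. LETTER_TYPE_SINGLE_QUOTE is no longer in the
   list, so U+0027 falls through and is reported as a non-word character. */
bool is_nonword(enum letter_type lt)
{
	if (lt == LETTER_TYPE_REGIONAL_INDICATOR || lt == LETTER_TYPE_KATAKANA ||
	    lt == LETTER_TYPE_HEBREW_LETTER || lt == LETTER_TYPE_ALETTER ||
	    lt == LETTER_TYPE_NUMERIC)
		return false;
	return true;
}

/* Usage example: checks the behaviour described in the commit message. */
void is_nonword_example(void)
{
	assert(is_nonword(LETTER_TYPE_SINGLE_QUOTE));  /* changed by this commit */
	assert(!is_nonword(LETTER_TYPE_ALETTER));      /* unchanged */
	assert(!is_nonword(LETTER_TYPE_NUMERIC));      /* unchanged */
}
```

Before the change, the same call with LETTER_TYPE_SINGLE_QUOTE would have returned false, which matches the commit description's statement that U+0027 was not handled as a word boundary.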