dovecot-2.2: lib-fts: Fix tr29 tokenizer apostrophe handling.

dovecot at dovecot.org
Thu May 21 10:39:12 UTC 2015


details:   http://hg.dovecot.org/dovecot-2.2/rev/5ca59cffbf2f
changeset: 18731:5ca59cffbf2f
user:      Teemu Huovila <teemu.huovila at dovecot.fi>
date:      Thu May 21 06:17:32 2015 -0400
description:
lib-fts: Fix tr29 tokenizer apostrophe handling.
U+0027, which is called Single Quote in tr29, was not properly
handled as a word boundary.

diffstat:

 src/lib-fts/fts-tokenizer-generic.c |  9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diffs (26 lines):

diff -r 45013c8cf69c -r 5ca59cffbf2f src/lib-fts/fts-tokenizer-generic.c
--- a/src/lib-fts/fts-tokenizer-generic.c	Mon May 18 14:53:52 2015 +0300
+++ b/src/lib-fts/fts-tokenizer-generic.c	Thu May 21 06:17:32 2015 -0400
@@ -464,8 +464,8 @@
 
 	if (lt == LETTER_TYPE_REGIONAL_INDICATOR || lt == LETTER_TYPE_KATAKANA ||
 	    lt == LETTER_TYPE_HEBREW_LETTER || lt == LETTER_TYPE_ALETTER ||
-	    lt == LETTER_TYPE_SINGLE_QUOTE || lt == LETTER_TYPE_NUMERIC)
-		return FALSE; /* TODO: Include LETTER_TYPE_DOUBLE_QUOTE? */
+	    lt == LETTER_TYPE_NUMERIC)
+		return FALSE;
 
 	return TRUE;
 }
@@ -535,8 +535,9 @@
   http://www.unicode.org/reports/tr29/
 
   Adaptions: No word boundary at Start-Of-Text or End-of-Text (Wb1 and
-  WB2). Break just once, not before and after.  Other things also, not
-  really pure tr29. Meant to assist in finding individual words.
+  WB2). Break just once, not before and after.  Other things also
+  (e.g. is_nonword(), not really pure tr29. Meant to assist in finding
+  individual words.
 
   TODO: If this letter_fns based approach is too kludgy, do a FSM with function
   pointers and transition tables.
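
The effect of the first hunk can be sketched in isolation. The following is a minimal, hypothetical reconstruction, not the actual Dovecot source: it assumes the changed condition lives in is_nonword() (as the updated comment suggests), uses a cut-down enum with only the letter types visible in the diff plus a placeholder LETTER_TYPE_OTHER, and names the two variants is_nonword_old() and is_nonword_new() purely for comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the letter types; the real enum in
   fts-tokenizer-generic.c has more members and may order them
   differently. */
enum letter_type {
	LETTER_TYPE_REGIONAL_INDICATOR,
	LETTER_TYPE_KATAKANA,
	LETTER_TYPE_HEBREW_LETTER,
	LETTER_TYPE_ALETTER,
	LETTER_TYPE_SINGLE_QUOTE,
	LETTER_TYPE_NUMERIC,
	LETTER_TYPE_OTHER /* placeholder for everything else */
};

/* Before the change: U+0027 counted as a word character. */
bool is_nonword_old(enum letter_type lt)
{
	if (lt == LETTER_TYPE_REGIONAL_INDICATOR || lt == LETTER_TYPE_KATAKANA ||
	    lt == LETTER_TYPE_HEBREW_LETTER || lt == LETTER_TYPE_ALETTER ||
	    lt == LETTER_TYPE_SINGLE_QUOTE || lt == LETTER_TYPE_NUMERIC)
		return false;
	return true;
}

/* After the change: U+0027 is a non-word character, so it can act as
   a word boundary. */
bool is_nonword_new(enum letter_type lt)
{
	if (lt == LETTER_TYPE_REGIONAL_INDICATOR || lt == LETTER_TYPE_KATAKANA ||
	    lt == LETTER_TYPE_HEBREW_LETTER || lt == LETTER_TYPE_ALETTER ||
	    lt == LETTER_TYPE_NUMERIC)
		return false;
	return true;
}

/* Small self-check contrasting the two variants. */
void sketch_self_check(void)
{
	assert(!is_nonword_old(LETTER_TYPE_SINGLE_QUOTE));
	assert(is_nonword_new(LETTER_TYPE_SINGLE_QUOTE));
	/* Letters and digits are unaffected by the change. */
	assert(!is_nonword_new(LETTER_TYPE_ALETTER));
	assert(!is_nonword_new(LETTER_TYPE_NUMERIC));
}
```

Only the classification of LETTER_TYPE_SINGLE_QUOTE differs between the two variants; how the tokenizer then uses that result to split tokens is outside what this diff shows.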

