[Dovecot] squat also not working in beta11

Timo Sirainen tss at iki.fi
Mon Dec 10 17:42:04 EET 2007


On Mon, 2007-12-10 at 23:29 +0800, Joe Wong wrote:
> Hi Timo,
> 
> I just took your suggestion. I have another collection of emails, and running 
> full text search on it did not hit any problems, whether the mail is on 
> NFS or on local disk.
> 
> You mentioned that full text search only works for English-only 
> mailboxes. What is its current limitation? Is there any plan to support 
> non-English email (conversion to UTF-8?)

It should work with any UTF-8 input, and I've tested that it works with
some mails containing non-ASCII characters. There's nothing in the design
that prevents it, so I guess some bug is causing these problems. If you
could send me a test mailbox where this happens, I could take a look at
fixing it.

Although now that you mention it, I wonder if the current design could
be optimized to work a bit differently with Chinese, Japanese, etc.
Currently it works by indexing 4-character blocks, so with non-ASCII
UTF-8 input it may end up indexing more than 4 bytes per block. How many
bytes does a typical Chinese UTF-8 character take? How many characters
does a typical Chinese word take? How many characters are in your
typical search word?
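
For a rough sense of the sizes involved, here is a minimal Python sketch.
The sample strings and the helper names `blocks` and `block_sizes` are made
up for illustration, and this is not the squat indexing code itself.
Characters in the common CJK range U+4E00 to U+9FFF encode to 3 bytes each
in UTF-8, so a 4-character block of them is 12 bytes instead of 4.

```python
def blocks(text, n=4):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def block_sizes(text, n=4):
    """Map each n-character block to its size in UTF-8 bytes."""
    return {b: len(b.encode("utf-8")) for b in blocks(text, n)}


# Hypothetical sample inputs, not taken from any real mailbox.
ascii_sample = "dovecot"
cjk_sample = "全文搜索引擎"  # 6 characters, each 3 bytes in UTF-8

print(block_sizes(ascii_sample))
print(block_sizes(cjk_sample))
```

The ASCII blocks come out at 4 bytes each, the CJK blocks at 12 bytes each.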

I was just wondering: if there are a lot of 1-3 character words, maybe the
indexing could limit each block to something like the minimum of
(4 characters, ~8 bytes). That would then take less space and memory.
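
A minimal sketch of that limit in Python, under some assumptions: the
"~8 bytes" is treated as a hard cap of 8, a block never splits a character,
and `capped_block` is a hypothetical helper name rather than anything in the
existing code.

```python
def capped_block(text, start, max_chars=4, max_bytes=8):
    """Return the block beginning at `start`, limited to max_chars
    characters and max_bytes UTF-8 bytes, whichever is reached first."""
    out = []
    used = 0
    for ch in text[start:start + max_chars]:
        size = len(ch.encode("utf-8"))
        # Assumption: stop before a character that would exceed the byte cap.
        if used + size > max_bytes:
            break
        out.append(ch)
        used += size
    return "".join(out)


print(capped_block("dovecot", 0))       # ASCII: the character cap applies
print(capped_block("全文搜索引擎", 0))  # 3-byte characters: the byte cap applies
```

With 3-byte characters the byte cap is reached after 2 characters (6 bytes),
so such blocks would be 2 characters long, while ASCII input still gets full
4-character blocks.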