On Tue, 2007-12-11 at 00:14 +0800, Joe Wong wrote:
I've found that the problem may not be related to the non-English content. I just tried deleting dovecot.index.* from the non-working folder. After that, full text search also works on that folder, with no more "Corrupted squat uidlist file" errors. Why are the two related to each other?
I don't really know. They shouldn't have anything to do with each other. Although if you also deleted dovecot-uidlist, Dovecot assigns new UIDs to messages, and that might have helped. But if dovecot-uidlist was still there, then I have no idea. If you can reproduce this somehow, I'd like to know.
Can the system auto-heal under such conditions?
Dovecot should always auto-heal itself, so it's a bug if it doesn't.
By the way, for Chinese, each BIG5 character is two bytes long, and it is 3 bytes in UTF-8 encoding. For Chinese, a search word can contain 1 "character" or more. I think the indexer should convert the text to UTF-8 and cut words at UTF-8 character boundaries, not at byte boundaries.
That's how it works currently. With the byte count I meant that it would cut at the previous (or the next) character boundary after that many bytes. So for example "abcd" and "åäöå" would both be indexed as 4 characters, because the first takes 4 bytes and the second takes 8 bytes, but 4 Chinese characters each taking 3 bytes would be cut after 2 or 3 characters.
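To make the arithmetic above concrete, here is a rough Python sketch of cutting at a character boundary near a byte limit. It is only an illustration, not Dovecot's actual code: the function name cut_at_char_boundary, the round_up flag and the 8-byte limit are all assumptions (the limit is simply the value that makes the numbers in the example work out).

```python
def cut_at_char_boundary(text: str, max_bytes: int, round_up: bool = False) -> str:
    """Cut text near max_bytes of UTF-8 without splitting a character.

    round_up=False cuts at the previous character boundary,
    round_up=True cuts at the next one. Hypothetical helper for illustration.
    """
    used = 0
    kept = []
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            # This character would cross the limit. Keep it only when
            # rounding up and the limit has not been reached exactly.
            if round_up and used < max_bytes:
                kept.append(ch)
            break
        kept.append(ch)
        used += size
    return "".join(kept)


LIMIT = 8  # assumed byte limit, inferred from the example only

for word in ("abcd", "åäöå", "中文搜索"):
    prev_cut = cut_at_char_boundary(word, LIMIT)
    next_cut = cut_at_char_boundary(word, LIMIT, round_up=True)
    print(word, len(word.encode("utf-8")), len(prev_cut), len(next_cut))
```

With that assumed limit, "abcd" (4 bytes) and "åäöå" (8 bytes) are both kept as 4 characters, while the 12-byte Chinese word is cut after 2 characters (previous boundary) or 3 characters (next boundary).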
None of this affects the actual search results. Only how much disk space, memory and disk I/O is used when searching.