[Dovecot] squat also not working in beta11
Hi Timo,
Just tried squat in beta11. I also get the same error in the log:
dovecot: Dec 10 14:21:17 Error: IMAP(joewong99:joew.outblaze.com): Corrupted squat uidlist file /mailfs/4/22/3/joewong99:joew.outblaze.com@joew_outblaze_com/Maildir/dovecot.index.search.uids: Broken uidlists
There is no change in my configuration file.
# 1.1.beta11: /usr/local/etc/dovecot.conf
log_path: /var/log/dovecot-1.1.log
info_log_path: /var/log/dovecot-1.1.log
protocols: imap
ssl_disable: yes
login_dir: /usr/local/var/run/dovecot/login
login_executable: /usr/local/libexec/dovecot/imap-login
login_user: nobody
verbose_proctitle: yes
mail_location: maildir:~/Maildir
mail_debug: yes
mmap_disable: yes
mail_nfs_storage: yes
mail_nfs_index: yes
mail_drop_priv_before_exec: yes
mail_plugins: fts fts_squat
auth default:
  user: mdrop
  username_chars: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@:
  username_translation: @:
  verbose: yes
  debug: yes
  debug_passwords: yes
  passdb:
    driver: sql
    args: /usr/local/etc/dovecot-sql.conf
  userdb:
    driver: sql
    args: /usr/local/etc/dovecot-sql.conf
plugin:
  fts: squat
--
On Mon, 2007-12-10 at 18:48 +0800, Joe Wong wrote:
> Hi Timo,
> Just tried squat in beta11. I also get the same error in the log:
> dovecot: Dec 10 14:21:17 Error: IMAP(joewong99:joew.outblaze.com): Corrupted squat uidlist file /mailfs/4/22/3/joewong99:joew.outblaze.com@joew_outblaze_com/Maildir/dovecot.index.search.uids: Broken uidlists
This happens immediately when searching? Wonder if the problem has to do with emails inside the maildir, because I can't break it even with stress testing.
Have you tried with english-only mailboxes? Or having the mailbox located on local disk? It's pretty easy to test by running manually:
FTS=squat MAIL_PLUGINS=fts,fts_squat MAIL=~/Maildir /usr/local/libexec/imap
1 select inbox
2 search text "hello"
Hi Timo,
I just took your suggestion. I have another collection of emails, and running full text search on it did not hit any problem, whether the mailbox is on NFS or on local disk.
You mentioned that full text search only works for English-only mailboxes. What is its current limitation? Is there any plan to support non-English email (conversion to UTF-8?)
Thanks,
- Joe
----- Original Message -----
From: "Timo Sirainen" <tss@iki.fi>
To: "Joe Wong" <joewong@tkodog.no-ip.com>
Cc: <dovecot@dovecot.org>
Sent: Monday, December 10, 2007 7:15 PM
Subject: Re: [Dovecot] squat also not working in beta11
On Mon, 2007-12-10 at 23:29 +0800, Joe Wong wrote:
> Hi Timo,
> I just took your suggestion. I have another collection of emails, and running full text search on it did not hit any problem, whether the mailbox is on NFS or on local disk.
> You mentioned that full text search only works for English-only mailboxes. What is its current limitation? Is there any plan to support non-English email (conversion to UTF-8?)
It should work with any UTF-8 input, and I've tested that it works with some mails containing non-ASCII characters. There's nothing in the design that prevents it. But I guess there is some bug that causes these problems. If you could send me a test mailbox where this happens, I could take a look at fixing it.
Although now that you mention it, I wonder if the current design could be optimized to work a bit differently with Chinese/Japanese/etc. Currently it works by indexing 4-character blocks, so with non-ASCII UTF-8 input it may end up indexing more than 4 bytes per block. How many bytes does a typical Chinese UTF-8 character take? How many characters does a typical Chinese word take? How many characters are in your typical search word?
I was just wondering: if there are a lot of 1-3 character words, maybe the indexing could limit itself to something like min(4 characters, ~8 bytes). That would then take less space and memory.
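To make the "more than 4 bytes per block" point concrete, here is a small Python sketch that measures how many UTF-8 bytes a 4-character block occupies. It is only an illustration, not Squat's actual code: the function name and the sample strings are made up for this example, and it simply looks at the first 4 characters of each string.

```python
def block_size_in_bytes(text, chars_per_block=4):
    """Return the UTF-8 byte size of the first chars_per_block characters.

    Hypothetical helper for illustration only; it does not reflect how
    the Squat indexer is actually implemented.
    """
    block = text[:chars_per_block]
    return len(block.encode("utf-8"))


# Sample inputs chosen for this sketch: ASCII, Latin letters with
# diacritics, and Chinese characters.
for sample in ("hello world", "åäöå", "中文郵件"):
    print(repr(sample[:4]), block_size_in_bytes(sample), "bytes")
```

With these samples, the ASCII block takes 4 bytes, the block of accented Latin letters takes 8 bytes, and the block of Chinese characters takes 12 bytes.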
Hi Timo,
I've found that the problem may not be related to the non-English content. I just tried deleting dovecot.index.* from the folder that was not working. After that, full text search works on that folder too, with no more "Corrupted squat uidlist file" errors. Why are the two related to each other? Can the system auto-heal in such a condition?
By the way, for Chinese, each BIG5 character is two bytes long, and it is 3 bytes in UTF-8 encoding. A Chinese search word can contain one "character" or more. I think the indexer should convert the text to UTF-8 and cut words on UTF-8 character boundaries, not byte boundaries.
What do you think?
- Joe
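The byte counts above can be checked with a short Python sketch. It assumes Python's built-in "big5" codec and a two-character sample string picked for this example; the helper name is hypothetical. It converts BIG5 bytes to UTF-8 and then splits on character boundaries rather than byte boundaries.

```python
def big5_to_utf8(raw):
    """Convert BIG5-encoded bytes to UTF-8 bytes (illustrative helper)."""
    return raw.decode("big5").encode("utf-8")


sample = "中文"                        # two Chinese characters, chosen for this sketch
big5_bytes = sample.encode("big5")     # 2 bytes per character
utf8_bytes = big5_to_utf8(big5_bytes)  # 3 bytes per character

# Decoding first and then iterating gives whole characters,
# so a cut can never land in the middle of a multi-byte sequence.
characters = list(utf8_bytes.decode("utf-8"))

print(len(big5_bytes), len(utf8_bytes), characters)
```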
----- Original Message -----
From: "Timo Sirainen" <tss@iki.fi>
To: "Joe Wong" <joewong@tkodog.no-ip.com>
Cc: "Dovecot Mailing List" <dovecot@dovecot.org>
Sent: Monday, December 10, 2007 11:42 PM
Subject: Re: [Dovecot] squat also not working in beta11
On Tue, 2007-12-11 at 00:14 +0800, Joe Wong wrote:
> I've found that the problem may not be related to the non-English content. I just tried deleting dovecot.index.* from the folder that was not working. After that, full text search works on that folder too, with no more "Corrupted squat uidlist file" errors. Why are the two related to each other?
I don't really know. They shouldn't have anything to do with each other. Although if you also deleted dovecot-uidlist, Dovecot assigns new UIDs to messages, and that might have helped. But if dovecot-uidlist was still there, then I have no idea. If you can reproduce this somehow, I'd like to know.
> Can the system auto-heal in such a condition?
Dovecot should always auto-heal itself, so it's a bug if it doesn't.
> By the way, for Chinese, each BIG5 character is two bytes long, and it is 3 bytes in UTF-8 encoding. A Chinese search word can contain one "character" or more. I think the indexer should convert the text to UTF-8 and cut words on UTF-8 character boundaries, not byte boundaries.
That's how it works currently. With the byte count I meant that it would cut at the previous (or the next) character after that many bytes. So for example "abcd" and "åäöå" would be indexed as 4 characters, because the first takes 4 bytes and the second takes 8 bytes, but then 4 chinese characters each taking 3 bytes would be cut after 2 or 3 characters.
None of this affects the actual search results. Only how much disk space, memory and disk I/O is used when searching.
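The cutting rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Squat implementation: the function name, the exact 8-byte limit, and the round_up switch (cut at the previous character versus the next one) are all hypothetical choices made for this example.

```python
def cut_block(text, max_chars=4, max_bytes=8, round_up=False):
    """Return the leading block of text, limited to max_chars characters
    and roughly max_bytes UTF-8 bytes, always cut on a character boundary.

    round_up=False cuts at the previous character (stays within max_bytes);
    round_up=True cuts at the next character (may exceed max_bytes by
    part of one character).
    """
    block = []
    used = 0
    for ch in text[:max_chars]:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            if round_up:
                block.append(ch)
            break
        block.append(ch)
        used += size
    return "".join(block)


print(cut_block("abcd"))                     # 4 characters, 4 bytes
print(cut_block("åäöå"))                     # 4 characters, 8 bytes
print(cut_block("中文郵件"))                  # cut at the previous character
print(cut_block("中文郵件", round_up=True))   # cut at the next character
```

With 3-byte Chinese characters, the block ends after 2 characters when cutting at the previous character and after 3 when cutting at the next one, which matches the "2 or 3 characters" described above.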