Hi,
I tried the FTS (and FTS Squat) plugin today, and it works as advertised.
However, on the maildir I use for testing (13,000 folders, 160,000 mails), the speed increase is not as big as one would wish: a search still takes several minutes to complete.
Is my assumption correct that there is no way to search a big IMAP folder hierarchy in a reasonable amount of time, because each folder has to be 'selected' and only one folder can be selected at a time?
Patrick.
-- STAR Software (Shanghai) Co., Ltd. http://www.star-group.net/ Phone: +86 (21) 3462 7688 x 826 Fax: +86 (21) 3462 7779
PGP key: https://stshacom1.star-china.net/keys/patrick_nagel.asc Fingerprint: E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005
On Tuesday 24 June 2008, Patrick Nagel wrote:
> Hi,
> I tried the FTS (and FTS Squat) plugin today, and it works as advertised.
> But: On my 13000 folders with 160000 mails maildir I use for testing, the speed increase is not as big as one would wish (it still takes several minutes to complete a search).
> Is my assumption correct, that there is no way to do a search over a big IMAP folder hierarchy in a reasonable amount of time, because each folder has to be 'selected', and only one folder can be selected at once?
> Patrick.
I think this subject is more suitable for the mail's content. At first I focussed on the FTS index, and changed the mail later on, after which I forgot to change the subject...
Patrick.
On Tue, 2008-06-24 at 15:29 +0800, Patrick Nagel wrote:
> On Tuesday 24 June 2008, Patrick Nagel wrote:
> > Hi,
> > I tried the FTS (and FTS Squat) plugin today, and it works as advertised.
> > But: On my 13000 folders with 160000 mails maildir I use for testing, the speed increase is not as big as one would wish (it still takes several minutes to complete a search).
> > Is my assumption correct, that there is no way to do a search over a big IMAP folder hierarchy in a reasonable amount of time, because each folder has to be 'selected', and only one folder can be selected at once?
Yes, they have to be selected. There isn't any way currently in IMAP to search from multiple mailboxes using a single command, so even if Dovecot implemented a Squat index that indexed mails from all mailboxes, you'd still have to implement a non-standard extension to use that.
Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual mailbox from all your other mailboxes and then search it. I think if Squat is enabled it'll create a single index from all the mails. I'm not sure if I want to leave it like that though..
I have also been thinking about making Squat indexes global for all mailboxes. If done well it should reduce disk space as well as enable fast multi-mailbox searches, but I'm a bit worried about memory usage and other slowness when updating the index. The Squat building/updating could use more work, but I haven't yet figured out a great solution for it.
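To make the limitation concrete, here is a minimal client-side sketch of what a multi-folder search looks like today: one SELECT followed by one SEARCH per folder. It assumes a connection object with the `list`, `select` and `search` methods of Python's `imaplib.IMAP4`; the function name `search_all_folders` and the LIST-parsing regex are illustrative assumptions, not part of Dovecot or of any IMAP standard.

```python
import re

# Assumed shape of a LIST response line, e.g.
#   b'(\\HasNoChildren) "." "INBOX.Archive"'
_LIST_RE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def search_all_folders(conn, *criteria):
    """Run the same SEARCH in every selectable folder, one SELECT at a time.

    `conn` is expected to behave like imaplib.IMAP4 (list/select/search).
    Returns {folder name: [message sequence numbers as bytes]} for folders
    with at least one hit.
    """
    status, lines = conn.list()
    if status != "OK":
        raise RuntimeError("LIST failed")

    hits = {}
    for line in lines:
        m = _LIST_RE.match(line)
        if m is None or b"\\noselect" in m.group("flags").lower():
            continue  # unparsable line, or a folder that cannot be selected
        raw_name = m.group("name").decode()  # kept quoted, as the server sent it

        # Only one folder can be selected at a time, so this loop is serial.
        status, _ = conn.select(raw_name, readonly=True)
        if status != "OK":
            continue
        status, data = conn.search(None, *criteria)
        if status == "OK" and data and data[0]:
            hits[raw_name.strip('"')] = data[0].split()
    return hits
```

With 13000 folders this means 13000 SELECT/SEARCH round trips, which is why a per-folder index alone cannot make the overall search fast. A hypothetical call would be `search_all_folders(conn, "TEXT", '"squat"')` on an authenticated `imaplib.IMAP4_SSL` connection.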
On Tuesday 24 June 2008, Timo Sirainen wrote:
> Yes, they have to be selected. There isn't any way currently in IMAP to search from multiple mailboxes using a single command, so even if Dovecot implemented a Squat index that indexed mails from all mailboxes, you'd still have to implement a non-standard extension to use that.
I see. That's what I thought. :(
> Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual mailbox from all your other mailboxes and then search it. I think if Squat is enabled it'll create a single index from all the mails. I'm not sure if I want to leave it like that though..
How about making it configurable? I'm sure there are scenarios where it's not desirable to have an index for each virtual mailbox (which sounds like a very cool concept, by the way) - but like in my case, it would be a great workaround :)
> I have also been thinking about making Squat indexes global for all mailboxes. If done well it should reduce disk space as well as enable fast multi-mailbox searches, but I'm a bit worried about memory usage and other slowness when updating the index. The Squat building/updating could use more work, but I haven't yet figured out a great solution for it.
I'm not sure if it would reduce disk space usage... I'm thinking of the following:
Now (fictitious, I don't know what dovecot.index.search really looks like):
mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder/dovecot.index.search:
    INDEX   UID
    word    12345
    ord     12345
    rd      12345
    d       12345
    ...
Then (of course also fictitious):
dovecot.global.index.search:
    INDEX   MAILBOX                                                 UID
    word    mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder   12345
    ord     mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder   12345
    rd      mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder   12345
    d       mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder   12345
    ...
Of course this would be very compressible, but in uncompressed form it would probably be much bigger than all the current dovecot.index.search files together. This would create a need for mailbox UIDs, so that each path name only needs to be stored once, in a map... or something along those lines.
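The mailbox-ID idea can be sketched as follows. This is only an illustration under loud assumptions: it uses a plain in-memory inverted index (term to postings), which is not how Squat actually stores its data, and the class name `GlobalIndex` and its methods are made up for this example. The point is just that each long mailbox path is stored once and every posting carries a small integer instead.

```python
from collections import defaultdict


class GlobalIndex:
    """Toy global index: each mailbox path is stored once, postings use IDs."""

    def __init__(self):
        self._mailbox_ids = {}               # mailbox path -> small integer ID
        self._mailbox_names = []             # integer ID -> mailbox path
        self._postings = defaultdict(list)   # term -> [(mailbox_id, uid), ...]

    def _mailbox_id(self, path):
        # Assign the next free integer the first time a path is seen.
        mid = self._mailbox_ids.get(path)
        if mid is None:
            mid = len(self._mailbox_names)
            self._mailbox_ids[path] = mid
            self._mailbox_names.append(path)
        return mid

    def add(self, path, uid, terms):
        """Record that message `uid` in mailbox `path` contains `terms`."""
        mid = self._mailbox_id(path)
        for term in set(terms):
            self._postings[term].append((mid, uid))

    def lookup(self, term):
        """Return [(mailbox path, uid), ...] for one term, across all mailboxes."""
        return [(self._mailbox_names[mid], uid)
                for mid, uid in self._postings.get(term, [])]
```

A lookup then answers a multi-mailbox query in one pass, and the per-posting overhead compared with a per-mailbox index is one integer rather than a full path.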
Anyway, I think improved (= faster) search capabilities are a huge plus for an IMAP server, because the possibility to search in old mails is what makes people keep their mails (available, on the server) in the first place...
Patrick.
On Tue, 24 Jun 2008, Timo Sirainen wrote:
> Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual mailbox from all your other mailboxes and then search it. I think if Squat is enabled it'll create a single index from all the mails. I'm not sure if I want to leave it like that though..
I hope that the index is shared - that you "index the index" by inode number, not filename or message UID in a mailbox, since that way you can avoid duplicate storage of index data between virtual mailboxes and normal ones.
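As a rough sketch of what keying by inode could mean: if the same message file is reachable under several paths via hardlinks (an assumption; nothing in this thread says Dovecot's virtual mailboxes work that way), grouping paths by device and inode number shows which ones would share a single index entry. The helper name `group_by_inode` is hypothetical.

```python
import os
from collections import defaultdict


def group_by_inode(paths):
    """Group file paths that refer to the same underlying file.

    The key is (st_dev, st_ino), since inode numbers are only unique
    within one filesystem.
    """
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)
```

Each key would then need to be indexed only once, no matter how many mailboxes expose the file.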
> I have also been thinking about making Squat indexes global for all mailboxes. If done well it should reduce disk space as well as enable fast multi-mailbox searches, but I'm a bit worried about memory usage and other slowness when updating the index. The Squat building/updating could use more work, but I haven't yet figured out a great solution for it.
Well, I think it would be okay - deploy it and we'll all tell you. (-;
(As always, thanks for this amazing software.)
-- Asheesh.
-- The wonderful thing about a dancing bear is not how well he dances, but that he dances at all.
> Is my assumption correct, that there is no way to do a search over a big IMAP folder hierarchy in a reasonable amount of time, because each folder has to be 'selected', and only one folder can be selected at once?
This isn't 100% what you're looking for, but consider looking at "mairix". It does full-text search of mbox and maildir; it is a command-line tool. Give it a search, and it populates a mail folder (using hardlinks if possible) with the results.
I use it when I want to search my archives; the speed makes up for it not being native to the imap client.
participants (4)
- Asheesh Laroia
- Jason Fesler
- Patrick Nagel
- Timo Sirainen