Re: [Dovecot] **OFF LIST** Re: body search very slow since upgrade from 1.0.15 to 1.2.10
Stan Hoeppner put forth on 2/24/2010 12:17 AM:
Timo Sirainen put forth on 2/23/2010 11:06 PM:
No, body searches aren't indexed (without fts plugins).
I wish you'd have mentioned the Squat plugin earlier Timo. I feel I've made a fool of myself. I just setup squat and after a long first search on that 50MB, 11K+ messages mbox file, every search after the first has taken less than 5 seconds, most only 2 seconds. It's performing exactly as I'd like. It's amazing.
I didn't ever setup any fts plugins on 1.0.15. I'm left wondering how body searches were so fast if I didn't have an fts plugin configured.
Anyway, setting up Squat on 1.2.10 sure solved the long search time problem. :)
Thanks for the help Timo!
-- Stan
Ok, the previous couple of emails had to be off list, but not now.
On Wed, 2010-02-24 at 01:21 -0600, Stan Hoeppner wrote:
Stan Hoeppner put forth on 2/24/2010 12:17 AM:
Timo Sirainen put forth on 2/23/2010 11:06 PM:
No, body searches aren't indexed (without fts plugins).
I wish you'd have mentioned the Squat plugin earlier Timo. I feel I've made a fool of myself. I just setup squat and after a long first search on that 50MB, 11K+ messages mbox file, every search after the first has taken less than 5 seconds, most only 2 seconds. It's performing exactly as I'd like. It's amazing.
Well, when index is up-to-date it's fast. But after you've received a few mails, at least with me it seemed to spend more time updating the index than just doing the regular search.
I didn't ever setup any fts plugins on 1.0.15. I'm left wondering how body searches were so fast if I didn't have an fts plugin configured.
Looks like there's something very wrong with mbox with v1.2+. It's doing a *lot* of message header parsing work that doesn't happen with v1.1 or with other mailbox formats. Probably because I fixed some bugs where it was wrongly caching some state, but now it's not caching it long enough.
Timo Sirainen put forth on 2/24/2010 12:27 PM:
Well, when index is up-to-date it's fast. But after you've received a few mails, at least with me it seemed to spend more time updating the index than just doing the regular search.
Ahh, I see what you mean. It's slower after new mail arrives, 20 seconds or so, but it's still much much faster than without squat which was multiple minutes.
Looks like there's something very wrong with mbox with v1.2+. It's doing a *lot* of message header parsing work that doesn't happen with v1.1 or with other mailbox formats. Probably because I fixed some bugs where it was wrongly caching some state, but now it's not caching it long enough.
I guess you didn't have enough (any?) people testing mbox body search before the v1.2 release. Is everyone but me using maildir? Makes me wish I had an extra box so I could do dovecot devel version testing against mbox.
-- Stan
Quoting Stan Hoeppner <stan@hardwarefreak.com>:
I guess you didn't have enough (any?) people testing mbox body search before the v1.2 release. Is everyone but me using maildir? Makes me wish I had an extra box so I could do dovecot devel version testing against mbox.
I'm running mbox with 1.2 and not seeing any problems... But that may be because I threw a lot of hardware at it?
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
Eric Rostetter put forth on 2/24/2010 9:13 PM:
Quoting Stan Hoeppner <stan@hardwarefreak.com>:
I guess you didn't have enough (any?) people testing mbox body search before the v1.2 release. Is everyone but me using maildir? Makes me wish I had an extra box so I could do dovecot devel version testing against mbox.
I'm running mbox with 1.2 and not seeing any problems... But that may be because I threw a lot of hardware at it?
Hi Eric. Not sure if even fast hardware searching 11,000+ message count mbox'en without an FTS plugin would give speedy results, given Timo's discovery of earlier today.
Timo Sirainen put forth on 2/24/2010 12:27 PM:
Looks like there's something very wrong with mbox with v1.2+. It's doing a *lot* of message header parsing work that doesn't happen with v1.1 or with other mailbox formats. Probably because I fixed some bugs where it was wrongly caching some state, but now it's not caching it long enough.
-- Stan
Quoting Stan Hoeppner <stan@hardwarefreak.com>:
I'm running mbox with 1.2 and not seeing any problems... But that may be because I threw a lot of hardware at it?
Hi Eric. Not sure if even fast hardware searching 11,000+ message count mbox'en without an FTS plugin would give speedy results, given Timo's discovery of earlier today.
Can't say, as I don't personally have a 11K+ message mbox handy that I can run the test on. I just don't keep that much mail around of my own...
But it works okay on my 4K to 5K message mbox files, which are the largest I have... Usually takes about 1 second per 1K messages, so about 4 seconds for the 4K mbox, 5 seconds for the 5K mbox, etc. Of course, a bit slower when the server is overly busy... I've had it take 10 or 12 seconds before...
I do have users with large mbox files (12K+) who have never complained, but that doesn't mean much... I don't know which ones do body searches, or how often they do them, etc. And not all users complain even when there is a slowness or outright problem...
So, no scientific data... But no complaints here... Then again, I only have a small number of users with mboxes that large... We discourage that kind of thing here...
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
Eric Rostetter put forth on 2/24/2010 11:04 PM:
But it works okay on my 4K to 5K message mbox files, which are the largest I have... Usually takes about 1 second per 1K messages, so about 4 seconds for the 4K mbox, 5 seconds for the 5K mbox, etc. Of course, a bit slower when the server is overly busy... I've had it take 10 or 12 seconds before...
Are you using any FTS plugins? Squat? And are you sure you're doing full body searches, not just headers only? I'm not questioning your abilities or vigilance, just trying to make sure we're on the same page.
-- Stan
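One way to check the header-versus-body question is to time both kinds of IMAP SEARCH from a client. The following is only a minimal sketch using Python's standard imaplib; the helper name `timed_search` is made up for illustration, and the connection object is assumed to be an already logged-in `imaplib.IMAP4` with a mailbox selected.

```python
import time


def timed_search(conn, *criteria):
    """Run an IMAP SEARCH on conn and return (match_count, elapsed_seconds).

    conn is assumed to behave like imaplib.IMAP4: search(charset, *criteria)
    returns a (status, data) pair.
    """
    start = time.perf_counter()
    status, data = conn.search(None, *criteria)
    elapsed = time.perf_counter() - start
    if status != "OK":
        raise RuntimeError(f"SEARCH failed with status {status!r}")
    # imaplib hands back a list holding one space-separated bytes string
    # of message sequence numbers (empty when nothing matched).
    ids = data[0].split() if data and data[0] else []
    return len(ids), elapsed
```

With a real connection, comparing `timed_search(conn, "SUBJECT", '"squat"')` against `timed_search(conn, "BODY", '"squat"')` on the same mailbox would show whether the slow case is really the full body search.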
Quoting Stan Hoeppner <stan@hardwarefreak.com>:
Are you using any FTS plugins? Squat?
Nope, not as far as I know. Dovecot -n lists the following plugins:
mail_plugins(default): zlib acl imap_acl
mail_plugins(imap): zlib acl imap_acl
mail_plugins(pop3): zlib
mail_plugin_dir(default): /usr/lib64/dovecot/imap
mail_plugin_dir(imap): /usr/lib64/dovecot/imap
mail_plugin_dir(pop3): /usr/lib64/dovecot/pop3
plugin:
  acl: vfile:/var/dovecot/acls
  acl_shared_dict: file:/var/dovecot/indexes/shared_mailboxes
And are you sure you're doing full body searches, not just headers only?
Yes. Header searches are much faster. :)
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
On 02/24/2010 07:27 PM Timo Sirainen wrote:
Well, when index is up-to-date it's fast. But after you've received a few mails, at least with me it seemed to spend more time updating the index than just doing the regular search.
I've never setup any of the three FTS plugins. I've only seen, that the plugins are used as mail_plugin in the protocol imap/pop3 sections.
When the index should be up-to-date all the time (which is very important (IMHO)), I'm asking myself: Why are there no fts plugins for the lda and lmtp section? When the index would be updated on delivery, it should be always up-to-date. Or did I overlook something?
Just a quick idea: How about a Xapian <http://xapian.org/> fts plugin?
Regards, Pascal
The trapper recommends today: fabaceae.1005606@localdomain.org
On 25.2.2010, at 7.47, Pascal Volk wrote:
On 02/24/2010 07:27 PM Timo Sirainen wrote:
Well, when index is up-to-date it's fast. But after you've received a few mails, at least with me it seemed to spend more time updating the index than just doing the regular search.
I've never setup any of the three FTS plugins. I've only seen, that the plugins are used as mail_plugin in the protocol imap/pop3 sections.
When the index should be up-to-date all the time (which is very important (IMHO)), I'm asking myself: Why are there no fts plugins for the lda and lmtp section? When the index would be updated on delivery, it should be always up-to-date. Or did I overlook something?
It seemed a bit difficult to implement at first. Probably easier now.
Updating FTS indexes when user doesn't use them is a waste of CPU and space. Of course, this could be done in a similar way as caching. Update FTS for mailboxes where it's actually used.
Squat's updates would still be slow..
Just a quick idea: How about a Xapian <http://xapian.org/> fts plugin?
Go ahead ;)
On 02/25/2010 06:55 AM Timo Sirainen wrote:
On 25.2.2010, at 7.47, Pascal Volk wrote:
When the index should be up-to-date all the time (which is very important (IMHO)), I'm asking myself: Why are there no fts plugins for the lda and lmtp section? When the index would be updated on delivery, it should be always up-to-date. Or did I overlook something?
- Updating FTS indexes when user doesn't use them is a waste of CPU and space. Of course, this could be done in a similar way as caching. Update FTS for mailboxes where it's actually used.
Sure. But update the index only when lda/lmtp finds a ".fts_NAME" directory in a user's home directory. If there are no search indexes in the user's home directory, there is no need to update or create them.
- Squat's updates would still be slow..
Drop the fts_squat plugin if it is too slow. :-P
Just a quick idea: How about a Xapian <http://xapian.org/> fts plugin?
Go ahead ;)
Give me some time and a lot of C-knowledge … But eh, where are all the C programmers? :-)
Regards, Pascal
The trapper recommends today: cafebabe.1005607@localdomain.org
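The rule Pascal proposes above (update on delivery only if the user already has a ".fts_NAME" directory) can be sketched in a few lines. This is purely illustrative: the function name `should_update_fts` is hypothetical, the ".fts_NAME" layout is Pascal's suggestion rather than anything Dovecot is known to use, and nothing here reflects how deliver/lmtp actually work.

```python
from pathlib import Path


def should_update_fts(home, backend):
    """Return True only if the user's home already holds an index directory
    for the given FTS backend, e.g. ~/.fts_squat (hypothetical layout)."""
    return (Path(home) / f".fts_{backend}").is_dir()
```

A delivery agent following this idea would call the check once per delivery and skip indexing entirely for users who have never run a body search.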
On 24.2.2010, at 20.27, Timo Sirainen wrote:
Looks like there's something very wrong with mbox with v1.2+. It's doing a *lot* of message header parsing work that doesn't happen with v1.1 or with other mailbox formats. Probably because I fixed some bugs where it was wrongly caching some state, but now it's not caching it long enough.
Looks like some input stream seeking optimizations are broken (when one input stream reads from another, which reads from another, ...). I already managed to fix the performance problem, but now it's corrupting saved mails sometimes. So a while longer to get it fully fixed :) And since it's a pretty big change, I'm not sure if I want to risk breaking v1.2 by changing it, so maybe it's v2.0 only.
Timo Sirainen put forth on 2/25/2010 1:04 PM:
On 24.2.2010, at 20.27, Timo Sirainen wrote:
Looks like there's something very wrong with mbox with v1.2+. It's doing a *lot* of message header parsing work that doesn't happen with v1.1 or with other mailbox formats. Probably because I fixed some bugs where it was wrongly caching some state, but now it's not caching it long enough.
Looks like some input stream seeking optimizations are broken (when one input stream reads from another, which reads from another, ...). I already managed to fix the performance problem, but now it's corrupting saved mails sometimes. So a while longer to get it fully fixed :) And since it's a pretty big change, I'm not sure if I want to risk breaking v1.2 by changing it, so maybe it's v2.0 only.
I tend to agree with your judgment Timo. While I've found that Squat is less than optimal, for the same reasons you've mentioned, it is adequate for now, and much better than without it!
I can't say for sure without side-by-side testing, but my recollection is that standard search in 1.0.15 was, overall, faster than Squat in 1.2.10, specifically for the case where Squat has to re-index after a few new mails have arrived in the mbox file.
If the fix must wait for 2.0, then it must wait. I must say, if Squat didn't exist, I'd probably be screaming like a baby demanding a fix right now. ;) I need my body search capability--use it almost daily.
Thanks again for your dedication to Dovecot Timo. Without you we'd all be running vastly inferior IMAP servers and would probably be frowning much of the time instead of smiling. :)
-- Stan
On Thu, 2010-02-25 at 21:04 +0200, Timo Sirainen wrote:
Looks like some input stream seeking optimizations are broken (when one input stream reads from another, which reads from another, ...). I already managed to fix the performance problem, but now it's corrupting saved mails sometimes. So a while longer to get it fully fixed :) And since it's a pretty big change, I'm not sure if I want to risk breaking v1.2 by changing it, so maybe it's v2.0 only.
Fixed now in v2.0 and optimized also conversion to uppercase:
http://hg.dovecot.org/dovecot-2.0/rev/de2798fbbae6 http://hg.dovecot.org/dovecot-2.0/rev/23858ce6422e
After this, searching your spam mailbox goes from:
v1.2 OK Search completed (26.084 secs).
to
v2.0 OK Search completed (2.172 secs).
It's still slower than v1.0's 1.83 secs, but the increased slowness is most likely because v1.0 didn't support case-insensitive unicode searches.
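For anyone who wants to compare such timings across versions, the "OK Search completed (N secs)" lines quoted above are easy to parse. The following is a small sketch; the function names are made up here, and the pattern only assumes the line format shown in this message.

```python
import re

# Matches the timing part of lines such as
# "v1.2 OK Search completed (26.084 secs)."
_TIMING = re.compile(r"OK Search completed \((\d+(?:\.\d+)?) secs\)")


def search_seconds(line):
    """Extract the elapsed seconds from a 'Search completed' line."""
    match = _TIMING.search(line)
    if match is None:
        raise ValueError(f"no search timing found in {line!r}")
    return float(match.group(1))


def speedup(before_line, after_line):
    """Return how many times faster the second search was than the first."""
    return search_seconds(before_line) / search_seconds(after_line)
```

Feeding the v1.2 and v2.0 lines from this message to `speedup()` gives the ratio between the two runs.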
Timo Sirainen put forth on 2/28/2010 6:21 AM:
On Thu, 2010-02-25 at 21:04 +0200, Timo Sirainen wrote:
Looks like some input stream seeking optimizations are broken (when one input stream reads from another, which reads from another, ...). I already managed to fix the performance problem, but now it's corrupting saved mails sometimes. So a while longer to get it fully fixed :) And since it's a pretty big change, I'm not sure if I want to risk breaking v1.2 by changing it, so maybe it's v2.0 only.
Fixed now in v2.0 and optimized also conversion to uppercase:
http://hg.dovecot.org/dovecot-2.0/rev/de2798fbbae6 http://hg.dovecot.org/dovecot-2.0/rev/23858ce6422e
After this, searching your spam mailbox goes from:
v1.2 OK Search completed (26.084 secs).
to
v2.0 OK Search completed (2.172 secs).
It's still slower than v1.0's 1.83 secs, but the increased slowness is most likely because v1.0 didn't support case-insensitive unicode searches.
Awesome-- 13x increase in speed. Nice work Timo. I'll definitely appreciate it when I move to 2.0. Maybe it'll be fast enough I can get rid of Squat.
Any chance these changes will make it as a bug fix into 1.2.11? How extensive were the changes required to fix this bug?
-- Stan
On Sun, 2010-02-28 at 08:34 -0600, Stan Hoeppner wrote:
Awesome-- 13x increase in speed. Nice work Timo. I'll definitely appreciate it when I move to 2.0. Maybe it'll be fast enough I can get rid of Squat.
Any chance these changes will make it as a bug fix into 1.2.11? How extensive were the changes required to fix this bug?
The changes were big enough that it could have had potential to break stuff. Also I tried backporting those changes to v1.2 and noticed that it did actually randomly make mbox saving assert-crash, but I couldn't really figure out why. But I didn't spend much time on figuring out why. Instead I noticed that there's a simple workaround for this: http://hg.dovecot.org/dovecot-1.2/rev/6c9f2ed821df
participants (4): Eric Rostetter, Pascal Volk, Stan Hoeppner, Timo Sirainen