[Dovecot] fts squat -> webmail, fastcgi timeout: SOLR the answer?
Hi
I turned on fts squat to speed up full-text searches over webmail. This works fantastically once the mailboxes are indexed.
But when a mailbox is indexed for the first time I get a FastCGI timeout after 30 seconds, resulting in an internal server error on the web server. In the background the indexing job finishes after 80 seconds for a 25k-email, ~300 MB mailbox.
Does Solr allow user-offline indexing (no user interaction needed to kick off indexing)? And the other question: is there any mail user agent that supports server-side full-text indexing and does not create its own indexes?
Thanks, Philipp
On Wed, 10 Feb 2010, mailinglists@belfin.ch wrote:
Does Solr allow user-offline indexing (no user interaction needed to kick off indexing)?
You can talk IMAP directly as user:
setuid user:group
export USER=user
export HOME=/home/path/user
/usr/sbin/dovecot --exec-mail imap
With system users you can do:
sudo -u user -H /usr/sbin/dovecot --exec-mail imap
Then issue whatever IMAP command you like.
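As a minimal sketch of "issue whatever IMAP command you like": the helper below only prints the commands for one mailbox, so it can be inspected before it is piped into the pre-authenticated session. The function name `imap_warmup_commands` and the default search term are made up for illustration, and it assumes (as the original report suggests) that the first text search is what causes the mailbox to be indexed.

```shell
# Print the IMAP commands that make the server search one mailbox.
# $1 = mailbox name, $2 = search term (optional, any word will do).
# Assumption: mailbox names contain no double quotes or backslashes.
imap_warmup_commands() {
    mailbox=$1
    term=${2:-foo}
    printf '. select "%s"\n' "$mailbox"
    printf '. search text "%s"\n' "$term"
    printf '. logout\n'
}
```

With a system user this could be used as `imap_warmup_commands INBOX | sudo -u user -H /usr/sbin/dovecot --exec-mail imap > /dev/null`.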
Regards,
Steffen Kaiser
- Steffen Kaiser <skdovecot@smail.inf.fh-brs.de>:
On Wed, 10 Feb 2010, mailinglists@belfin.ch wrote:
Does Solr allow user-offline indexing (no user interaction needed to kick off indexing)?
You can talk IMAP directly as user:
setuid user:group
export USER=user
export HOME=/home/path/user
/usr/sbin/dovecot --exec-mail imap
w00t, that's neat! So, if "some_command" provided me with a list of usernames and their respective home directories, I could issue something like:
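some_command | while read -r USER HOME; do
    export USER
    export HOME
    # List all mailboxes of this user (empty reference, wildcard pattern);
    # the mailbox name is the 5th field of each LIST response.
    echo '. list "" "*"' | /usr/sbin/dovecot --exec-mail imap |
    awk '($2 == "LIST"){print $5}' | while read -r mailbox
    do
        # Searching each mailbox once makes the server index it.
        cat << EOF | /usr/sbin/dovecot --exec-mail imap
. select $mailbox
. search text foo
. search body foo
EOF
    done > /dev/null
done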
Well, perhaps not exactly that (hi @ fork(2)!), but something along these lines. Could I? (Oh, please please please say "yes!")
Stefan
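One fragile spot in a loop like the one above is taking the 5th whitespace-separated field of each LIST response: it breaks when a mailbox has more than one flag or a space in its name. A rough sketch of a more tolerant parser follows; the function name `parse_list_mailboxes` is made up for illustration, and it deliberately does not handle mailbox names sent as IMAP literals (`{n}` syntax).

```shell
# Read IMAP LIST responses on stdin and print one mailbox name per line.
# Handles several flags and quoted names with spaces; ignores other lines.
parse_list_mailboxes() {
    tr -d '\r' |
    sed -nE 's/^\* LIST \([^)]*\) ("[^"]*"|NIL) (.*)$/\2/p' |
    sed -E 's/^"(.*)"$/\1/'
}
```

It would replace the awk stage, for example `... --exec-mail imap | parse_list_mailboxes | while read -r mailbox`, with the mailbox name then quoted in the `select` command.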
mailinglists@belfin.ch put forth on 2/10/2010 3:01 AM:
And the other question: is there any mail user agent that supports server-side full-text indexing and does not create its own indexes?
Thunderbird. Maybe all MUAs? Usually with smartly designed MUAs body searches are left to the IMAP server to perform. On standard TB, in the absence of about:config hacks, one cannot run a body search on an IMAP folder unless you check the "Run search on server" box. Anything else is a header search, and headers are already indexed by TB for pretty much everything. TB doesn't index IMAP message bodies, _unless_ you have IMAP folders synchronized locally and are running in offline mode. In the offline case your search will utilize a local index of message bodies.
-- Stan
On 2010-02-10 5:17 AM, Stan Hoeppner wrote:
mailinglists@belfin.ch put forth on 2/10/2010 3:01 AM:
And the other question: is there any mail user agent that supports server-side full-text indexing and does not create its own indexes?
Thunderbird. Maybe all MUAs? Usually with smartly designed MUAs body searches are left to the IMAP server to perform. On standard TB, in the absence of about:config hacks, one cannot run a body search on an IMAP folder unless you check the "Run search on server" box.
Interesting, I never noticed that checkbox - probably because I rarely use the advanced search...
But I'm confused by your answer...
First you said 'Thunderbird. Maybe all MUAs?' - then seemed to totally contradict yourself when you said 'On standard TB ... one cannot run a body search on an IMAP folder unless ...' ?
So - is there a user pref that can be set in user.js to enable full body server-side searches (ie for servers using dovecot with fts squat enabled)?
Anything else is a header search, and headers are already indexed by TB for pretty much everything.
My understanding is that not *all* headers are indexed, only 'Normal' headers - which is why I opened bug 543956...
--
Best regards,
Charles
Charles Marcus put forth on 2/10/2010 8:04 AM:
On 2010-02-10 5:17 AM, Stan Hoeppner wrote:
mailinglists@belfin.ch put forth on 2/10/2010 3:01 AM:
And the other question: is there any mail user agent that supports server-side full-text indexing and does not create its own indexes?
Thunderbird. Maybe all MUAs? Usually with smartly designed MUAs body searches are left to the IMAP server to perform. On standard TB, in the absence of about:config hacks, one cannot run a body search on an IMAP folder unless you check the "Run search on server" box.
Interesting, I never noticed that checkbox - probably because I rarely use the advanced search...
But I'm confused by your answer...
First you said 'Thunderbird. Maybe all MUAs?' - then seemed to totally contradict yourself when you said 'On standard TB ... one cannot run a body search on an IMAP folder unless ...' ?
I haven't used any other MUA for years, so I don't know how they do their searches. However, it doesn't make technical sense to perform a full body search in the MUA since all the body data is typically left on the IMAP server. This would make the search horribly slow over dsl/cable and generate a ton of useless network traffic. On a local network it would be much faster, but again, it would generate a ton of useless traffic. Body searches should always be performed by the IMAP server itself.
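To put a rough number on "horribly slow": a client-side body search has to pull every message body across the link first. The sketch below uses the ~300 MB mailbox from the original report; the link speeds of 1 and 6 Mbit/s are assumptions picked only for illustration, and protocol overhead is ignored.

```shell
# Seconds needed to move size_mb megabytes over a link of link_mbit Mbit/s.
# Integer arithmetic; protocol overhead and latency are ignored.
transfer_seconds() {
    size_mb=$1
    link_mbit=$2
    echo $(( size_mb * 8 / link_mbit ))
}

transfer_seconds 300 1    # prints 2400 (40 minutes)
transfer_seconds 300 6    # prints 400 (just under 7 minutes)
```

A server-side search, by contrast, only sends the command and the list of matching message numbers over the wire.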
So - is there a user pref that can be set in user.js to enable full body server-side searches (ie for servers using dovecot with fts squat enabled)?
You misread what I sent. My comment relating to about:config hacking was a disclaimer. I don't know if there are any config hacks to eliminate the need for that check box or not. I stated the disclaimer so someone wouldn't chime in with "Yes you can! You just edit *this* in about:config". Got it? Ok, good. Again, TB can _ONLY_ do server-side body searches, UNLESS you've enabled offline mode and have synchronized the IMAP folders. To "synchronize" in TB terminology means to download a complete copy of the selected IMAP folders to the local hard disk. Once the IMAP folders are on the local hard disk, and you enable offline mode, any body searches of your IMAP folders will occur on the local machine. This isn't difficult to understand, is it?
Anything else is a header search, and headers are already indexed by TB for pretty much everything.
My understanding is that not *all* headers are indexed, only 'Normal' headers - which is why I opened bug 543956...
Ok, I'll go with that. RFC 822 headers are indexed but MIME headers are not. And MIME headers probably shouldn't be anyway, as there would be no benefit except maybe in extremely rare cases.
-- Stan
On 2010-02-10 2:28 PM, Stan Hoeppner wrote:
Charles Marcus put forth on 2/10/2010 8:04 AM:
I haven't used any other MUA for years,
Me neither...
so I don't know how they do their searches. However, it doesn't make technical sense to perform a full body search in the MUA since all the body data is typically left on the IMAP server.
Agreed... but this is exactly what TB2 did for so long that used to drive me crazy. That and download the same attachment over and over and over again.
This would make the search horribly slow over dsl/cable and generate a ton of useless network traffic.
Yep - and again, this is exactly what TB2 did all the time - and is why I generally didn't do full body searches.
So - is there a user pref that can be set in user.js to enable full body server-side searches (ie for servers using dovecot with fts squat enabled)?
Again, TB can _ONLY_ do server side body searches. UNLESS you've enabled off line mode and have synchronized the IMAP folders.
Ok, I was apparently thinking about this wrong.
To "synchronize" in TB terminology means to download a complete copy of the selected IMAP folders to the local hard disk. Once the IMAP folders are on the local hard disk, and you enable off line mode, any body searches of your IMAP folders will occur on the local machine. This isn't difficult to understand is it?
No, and I'm not sure what's up with the attitude, but please lose it.
My understanding is that not *all* headers are indexed, only 'Normal' headers - which is why I opened bug 543956...
Ok, I'll go with that. RFC 822 headers are indexed but MIME headers are not.
I don't think this is accurate. More than just the MIME headers are *not* currently downloaded. Custom headers are not downloaded, and neither are the 'Received' date header or the X-Delivered-To header to name two but apparently there are more.
And MIME headers probably shouldn't be anyway, as there would be no benefit except maybe in extremely rare cases.
How about so that an intelligent MUA could decide which MIME parts to download and when? Unless it takes up a lot of extra disk space, imo there is simply no good reason not to download the FULL headers regardless of how offline/sync settings are defined.
--
Best regards,
Charles
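At the IMAP level, the difference between downloading "normal" headers and full headers comes down to which FETCH item the client asks for. The sketch below prints both forms; the function name `imap_fetch_headers` and the example field list are made up for illustration and are not the request Thunderbird actually sends.

```shell
# Print an IMAP FETCH command for message headers.
# No arguments: request the complete header block of every message.
# With arguments: request only the named header fields.
imap_fetch_headers() {
    if [ "$#" -eq 0 ]; then
        printf '. fetch 1:* (BODY.PEEK[HEADER])\n'
    else
        printf '. fetch 1:* (BODY.PEEK[HEADER.FIELDS (%s)])\n' "$*"
    fi
}
```

For example, `imap_fetch_headers From Subject Date` asks for three fields only, so a search on `Received` or a custom header could not be answered from that local copy.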
Sorry about the triplicates... I just hit a TB bug, so I now have to document how to reproduce it and go report it... fun, fun...
participants (5)
- Charles Marcus
- mailinglists@belfin.ch
- Stan Hoeppner
- Stefan Foerster
- Steffen Kaiser