[Dovecot] compressed mboxes very slow
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Creating index files takes about 10 minutes for ~20M file with 560 messages for bzipped mbox, for gzipped is little better but still unusable :(
Stracing dovecot process shows that every ~ 20 messages it rereads complete mbox file.
Am I doing something wrong? KJ
-- http://modnebzdury.wordpress.com/2009/10/01/niewiarygodny-list-prof-majewski...
On 5/6/2011 3:07 PM, Kamil Jońca wrote:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Creating index files takes about 10 minutes for ~20M file with 560 messages for bzipped mbox, for gzipped is little better but still unusable :(
What other software, if any, was also upgraded/changed when you upgraded to Dovecot 2.0.12? Libraries? Filesystem? Daemons? What OS/version? Was the OS upgraded? Is this a new machine as well as new software? If so how did you copy the files to the new system? Could they have been mildly corrupted along the way? Did this bad behavior start directly after the upgrade or did 2.0.12 run the zipped mbox files at acceptable speed for a while? Did you add/enable any new Dovecot plugins that you weren't running in 1.2.x?
Stracing dovecot process shows that every ~ 20 messages it rereads complete mbox file.
Can you be a bit more specific here? What do you mean by "rereads complete mbox file"? I'm not a dev, but that sounds suspiciously like an error handling mechanism. I.e. an error occurred while processing, or the file may have changed while processing, so we start over. Could you have a buggy inotify/dnotify or something along those lines? Do you now have something else running say, at the filesystem level, that that is making Dovecot think the file has changed even though it hasn't? Are you zipping these mbox files via a cron job that is running every few seconds instead of every few hours or days?
Something is apparently causing Dovecot to reread these files regularly, and I'd guess it's probably not a Dovecot bug. Did you run strace when accessing a non-compressed mbox file for comparison?
-- Stan
Stan Hoeppner stan@hardwarefreak.com writes:
On 5/6/2011 3:07 PM, Kamil Jońca wrote:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Creating index files takes about 10 minutes for ~20M file with 560 messages for bzipped mbox, for gzipped is little better but still unusable :(
What other software, if any, was also upgraded/changed when you upgraded to Dovecot 2.0.12? Libraries? Filesystem? Daemons? What
IIRC only dovecot - I simply upgrade debian package via aptitude.
they have been mildly corrupted along the way? Did this bad behavior start directly after the upgrade or did 2.0.12 run the zipped mbox Yes, immediately after upgrade.
files at acceptable speed for a while? Did you add/enable any new Dovecot plugins that you weren't running in 1.2.x?
No, the only thing was converting old config to dovecot-2 config.
Stracing dovecot process shows that every ~ 20 messages it rereads complete mbox file.
Can you be a bit more specific here? What do you mean by "rereads complete mbox file"? I'm not a dev, but that sounds suspiciously like
Sorry, my fault, more correctly is to say: "regularly".
an error handling mechanism. I.e. an error occurred while processing, or the file may have changed while processing, so we start over.
I'm almost sure that file is not changed.
Could you have a buggy inotify/dnotify or something along those lines? How to check it?
Do you now have something else running say, at the filesystem level, that that is making Dovecot think the file has changed even though it hasn't? Are you zipping these mbox files via a cron job that is running every few seconds instead of every few hours or days? No. These files were compressed once by mutt, and then only read as archive via dovecot.
Something is apparently causing Dovecot to reread these files regularly, and I'd guess it's probably not a Dovecot bug. Did you run strace when accessing a non-compressed mbox file for comparison?
http://strony.aster.pl/kjonca/dovecot.log.gz <- uncompressed mbox http://strony.aster.pl/kjonca/dovecot.gz.log.gz <- gzipped mbox
KJ
-- http://blogdebart.pl/2009/12/22/mamy-chorych-dzieci/ KRETYNIZM - ułomność predysponująca często do wampiryzmu (J.Collin de Plancy "Słownik wiedzy tajemnej")
On 5/8/2011 5:21 AM, Kamil Jońca wrote:
Stan Hoeppnerstan@hardwarefreak.com writes:
On 5/6/2011 3:07 PM, Kamil Jońca wrote:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Creating index files takes about 10 minutes for ~20M file with 560 messages for bzipped mbox, for gzipped is little better but still unusable :(
What other software, if any, was also upgraded/changed when you upgraded to Dovecot 2.0.12? Libraries? Filesystem? Daemons? What
IIRC only dovecot - I simply upgrade debian package via aptitude.
The latest Debian stable dovecot package is 1.2.15-4. If 'aptitude upgrade' pulled 2.0.12 then you are running either testing or unstable, or you're using non-official mirrors. Either way, you can expect to have some problems. Also, you probably need to be asking on debian-user or asking the maintainers directly. And you need to be able to give them an actual bug report. I'm guessing the problem is Debian specific and not vanilla Dovecot 2.0.12 specific. Timo hasn't responded to you yet, which may be a good indication of this.
they have been mildly corrupted along the way? Did this bad behavior start directly after the upgrade or did 2.0.12 run the zipped mbox Yes, immediately after upgrade.
Look at your aptitude and/or dkpg logs to see what other packages, if any, got upgraded/replaced when you installed dovecot.
files at acceptable speed for a while? Did you add/enable any new Dovecot plugins that you weren't running in 1.2.x?
No, the only thing was converting old config to dovecot-2 config.
Stracing dovecot process shows that every ~ 20 messages it rereads complete mbox file.
Can you be a bit more specific here? What do you mean by "rereads complete mbox file"? I'm not a dev, but that sounds suspiciously like
Sorry, my fault, more correctly is to say: "regularly".
an error handling mechanism. I.e. an error occurred while processing, or the file may have changed while processing, so we start over.
I'm almost sure that file is not changed.
It probably didn't, given the fact that Dovecot won't write to zipped mbox files, period. But if you have a broken inotify/dnotify setup it may appear to Dovecot that the file has changed. Such things are common with testing/unstable distros. Changes to the kernel, APIs, and apps occur rapidly. Such distros are meant for developers and end users with the knowledge and ability to file concise bug reports after identifying problems.
Inotify may not be the problem at all, but it seems a possibility given that Dovecot is apparently stopping decompression and rereading the file multiple times until finished. I've not looked at the Dovecot source, but this seems a likely cause of the reread.
Could you have a buggy inotify/dnotify or something along those lines? How to check it?
If you're running testing/unstable you should already know how to check this. Inotify is a kernel API. For Debian Dovecot to use inotify it must be compiled with the build option 'notify=inotify'. You'll need to see the package maintainer's build script. You'll also need to look at the kernel .config used to build your kernel as inotify must be built into your kernel.
Do you now have something else running say, at the filesystem level, that that is making Dovecot think the file has changed even though it hasn't? Are you zipping these mbox files via a cron job that is running every few seconds instead of every few hours or days? No. These files were compressed once by mutt, and then only read as archive via dovecot.
Was mutt upgraded along with dovecot when you ran 'aptitude --safe upgrade'?
Have you tested any other IMAP client such as Thunderbird to eliminate mutt as the cause of the problem?
Something is apparently causing Dovecot to reread these files regularly, and I'd guess it's probably not a Dovecot bug. Did you run strace when accessing a non-compressed mbox file for comparison?
http://strony.aster.pl/kjonca/dovecot.log.gz<- uncompressed mbox http://strony.aster.pl/kjonca/dovecot.gz.log.gz<- gzipped mbox
I didn't ask for the files but the results of your analysis. This is your system and it's your job to troubleshoot it. I'm simply trying to assist you in your efforts.
If this is a production system I suggest you downgrade to your previous Dovecot version that was working properly, then build a test rig to troubleshoot this. If that's not in your cards, I suggest sticking with Debian Stable and newer Dovecot backports as they become available.
-- Stan
Stan Hoeppner stan@hardwarefreak.com writes:
On 5/8/2011 5:21 AM, Kamil Jońca wrote:
Stan Hoeppnerstan@hardwarefreak.com writes:
On 5/6/2011 3:07 PM, Kamil Jońca wrote:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Creating index files takes about 10 minutes for ~20M file with 560 messages for bzipped mbox, for gzipped is little better but still unusable :(
What other software, if any, was also upgraded/changed when you upgraded to Dovecot 2.0.12? Libraries? Filesystem? Daemons? What
IIRC only dovecot - I simply upgrade debian package via aptitude.
The latest Debian stable dovecot package is 1.2.15-4. If 'aptitude upgrade' pulled 2.0.12 then you are running either testing or
Yes, its testing (sid) dist.
[...]
If you're running testing/unstable you should already know how to check this. Inotify is a kernel API. For Debian Dovecot to use
I know what is inotify, but I haven't to test it so far. :|
inotify it must be compiled with the build option 'notify=inotify'. You'll need to see the package maintainer's build script. You'll also need to look at the kernel .config used to build your kernel as inotify must be built into your kernel.
I have uninstalled debian's package
configure with --notify=inotify, and it looks that inotify exists and is used[1], make make install
then run with --8<---------------cut here---------------start------------->8--- #strace -o ~/tmp/dovecot/log -ff dovecot -c /etc/dovecot/dovecot.conf --8<---------------cut here---------------end--------------->8---
connect with mutt and select one of gzipped folders
connect with fetchmail and select another gzipped folder (with the same contents as in 5)
grepping strace logs shows that in both cases mboxes are reread regularly :( moreover there's no inotify_init (...) call by any dovecot process.
Any ideas?
[1] moreover inotifywatch works as expected.
KJ
-- http://modnebzdury.wordpress.com/2009/10/01/niewiarygodny-list-prof-majewski... "Sęk w tym, że człowiek rozumny jest statystycznie głupi" - Stanisław Lem.
On 5/9/2011 6:20 PM, Kamil Jońca wrote:
- connect with mutt and select one of gzipped folders
- connect with fetchmail and select another gzipped folder (with the same contents as in 5)
Does the problem occur with only one client, or are two or more clients required to reproduce the problem? If the former, you're simply unnecessarily confusing things by mentioning two clients in the problem reproduction case here.
grepping strace logs shows that in both cases mboxes are reread regularly :( moreover there's no inotify_init (...) call by any dovecot process.
Any ideas?
The first time the mbox file is read its filesystem blocks will be cached. The actual file read will take 1 second max on 10 year old hardware, 0.2 seconds on modern hardware. Thus, each subsequent time Dovecot 're-reads' the file, it will occur at RAM speed, something between 3.2 and 12.8GB/s depending on system age.
Thus, if it's taking 10 minutes to build the indexes for this gzipped mbox file, the cause of that elapsed 10 minutes isn't going to be the re-reading of the file, as this operation will be completing in microseconds.
If you go back to 1.2.15 and strace Dovecot, do you see the same re-reading of the mbox files after each 20 or so messages? This behavior may be normal. I'm not familiar with the code or I'd have told you if this is the case. Regardless, I doubt re-reading is the cause of the slowness. There's got to be something else in the loop besides the file read eating up the CPU time. Given it's 10 minutes for a 20MB file it would seem logical that a large wait is being inserted or Dovecot is waiting for a lock to expire--something along these lines. I'd doubt Dovecot is actually doing anything most of this 10 minutes.
I think you need to find out what Dovecot 2.x is waiting on that 1.2.15 wasn't.
-- Stan
Stan Hoeppner stan@hardwarefreak.com writes:
On 5/9/2011 6:20 PM, Kamil Jońca wrote:
- connect with mutt and select one of gzipped folders
- connect with fetchmail and select another gzipped folder (with the same contents as in 5)
Does the problem occur with only one client, or are two or more clients required to reproduce the problem? If the former, you're simply unnecessarily confusing things by mentioning two clients in the problem reproduction case here.
From one of your mail: --8<---------------cut here---------------start------------->8--- Have you tested any other IMAP client such as Thunderbird to eliminate mutt as the cause of the problem? --8<---------------cut here---------------end--------------->8--- So now I can say, this problem is not only with mutt
If you go back to 1.2.15 and strace Dovecot, do you see the same re-reading of the mbox files after each 20 or so messages? This behavior may be normal. I'm not familiar with the code or I'd have
I will try. KJ
-- http://modnebzdury.wordpress.com/2009/10/01/niewiarygodny-list-prof-majewski... Zuzanka najwyraźniej pełni obowiązki Albina Siwaka w tej ekipie... (C) Slawek Kotynski w dyskusj nt. pl.whatewer.SETI@home...
Timo Sirainen tss@iki.fi writes:
Here are some fixes:
http://hg.dovecot.org/dovecot-2.0/rev/15a0687ec9d0 http://hg.dovecot.org/dovecot-2.0/rev/66ec075a49d3
Hard to say, but I can't see any significant improvement after applying them :|
KJ
-- http://blogdebart.pl/2010/03/17/dalsze-przygody-swinki-w-new-jersey/
On 10.5.2011, at 23.32, Kamil Jońca wrote:
Timo Sirainen tss@iki.fi writes:
Here are some fixes:
http://hg.dovecot.org/dovecot-2.0/rev/15a0687ec9d0 http://hg.dovecot.org/dovecot-2.0/rev/66ec075a49d3
Hard to say, but I can't see any significant improvement after applying them :|
Get strace -tt output of such slowness and send them to me.
Timo Sirainen tss@iki.fi writes:
On 10.5.2011, at 23.32, Kamil Jońca wrote:
Timo Sirainen tss@iki.fi writes:
Here are some fixes:
http://hg.dovecot.org/dovecot-2.0/rev/15a0687ec9d0 http://hg.dovecot.org/dovecot-2.0/rev/66ec075a49d3
Hard to say, but I can't see any significant improvement after applying them :|
Get strace -tt output of such slowness and send them to me.
Some time ago I complained about very slow mbox. In the meantime I have no time to check it. But recently I have to use them (compressed mboxes) again. And no progress :( Under http://strony.aster.pl/kjonca/mbox.log.gz (WARN: ~170MB) is file with output of strace -tt when dovecot try to open rather huge bzipped mbox. And it took about four hours.
Second open is a lot faster (about minute), but sometimes message fetching take about 30-60s KJ
-- http://modnebzdury.wordpress.com/2009/10/01/niewiarygodny-list-prof-majewski... Zanim wlaczysz komputer, zastanow sie: Czy jestes absolutnie pewien(na), ze nie jest podlaczany do wyrzutni rakiet?
kjonca@o2.pl (Kamil Jońca) writes:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Recently I have to read some compressed mboxes again, and no progress :( I took 2.0.17 sources and put some i_debug ("#kjonca["__FILE__",%d,%s] %d", __LINE__,__func__,...some parameters ...);
lines into istream-bzlib.c, istream-raw-mbox.c and istream-limit.c and found that:
in istream-limit.c in function around lines 40-45: --8<---------------cut here---------------start------------->8--- i_stream_seek(stream->parent, lstream->istream.parent_start_offset + stream->istream.v_offset); stream->pos -= stream->skip; stream->skip = 0; --8<---------------cut here---------------end--------------->8--- seeks stream, (calling i_stream_raw_mbox_seek in file istream-raw-mbox.c )
and then (line 50 ) --8<---------------cut here---------------start------------->8--- if ((ret = i_stream_read(stream->parent)) == -2) return -2; --8<---------------cut here---------------end--------------->8---
tries to read some data earlier in stream, and with compressed mboxes it cause reread file from the beginning.
Then I commented out (just for testing) lines 40-45 from istream-limit.c and bzipped mbox can be opened in reasonable time. (MOreover I can read some randomly picked mails without problems)
Unfortunately, meanig of fields in istream* structures is very unclear for me (especially skip,pos and offset) to write proper code by myself.
KJ
-- http://sporothrix.wordpress.com/2011/01/16/usa-sie-krztusza-kto-nastepny/ Jak ktoś ma pecha, to złamie ząb podczas seksu oralnego (S.Sokół)
On 12/01/2012 10:39, Kamil Jońca wrote:
kjonca@o2.pl (Kamil Jońca) writes:
I have some archive mails in gzipped mboxes. I could use them with dovecot 1.x without problems. But recently I have installed dovecot 2.0.12, and they are slow. very slow.
Recently I have to read some compressed mboxes again, and no progress :( I took 2.0.17 sources and put some i_debug ("#kjonca["__FILE__",%d,%s] %d", __LINE__,__func__,...some parameters ...);
lines into istream-bzlib.c, istream-raw-mbox.c and istream-limit.c and found that:
in istream-limit.c in function around lines 40-45: --8<---------------cut here---------------start------------->8--- i_stream_seek(stream->parent, lstream->istream.parent_start_offset + stream->istream.v_offset); stream->pos -= stream->skip; stream->skip = 0; --8<---------------cut here---------------end--------------->8--- seeks stream, (calling i_stream_raw_mbox_seek in file istream-raw-mbox.c )
and then (line 50 ) --8<---------------cut here---------------start------------->8--- if ((ret = i_stream_read(stream->parent)) == -2) return -2; --8<---------------cut here---------------end--------------->8---
tries to read some data earlier in stream, and with compressed mboxes it cause reread file from the beginning.
Just wanted to bump this since it seems interesting. Timo do you have a comment?
I definitely see your point that skipping backwards in a compressed stream is going to be very CPU intensive.
Ed W
I wonder if this patch helps here: http://hg.dovecot.org/dovecot-2.0/rev/9b2931607063
At least I can't now see any slowness with either v2.1 or the latest v2.0. But I don't know if I would have slowness with older versions either..
participants (4)
-
Ed W
-
kjonca@o2.pl
-
Stan Hoeppner
-
Timo Sirainen