[Dovecot] testing needed: log file concurrency
http://dovecot.org/tmp/concurrency.c
I'd want to know what results this program gives with different systems. Please test and reply (but don't bother if someone already replied with the same OS+result). I expect it to print:
- SMP kernels: "page size cut" once in a while
- UP (uniprocessor) kernels: Nothing
- The most important thing is that it never prints "broken data"
It might take a while for it to print anything. With my computer it takes anything from a few seconds to a minute or so. See the file itself for compiling/running instructions.
So far I've tested only with Linux 2.6.21 x86-64/SMP and a slow Solaris/Sparc/UP.
If you're interested in knowing what this is about:
Dovecot writes to dovecot.index.log files by first writing the transaction with its size set to 0. After that it writes the 4 size bytes again (using a slightly special format with all bytes ORed with 0x80).
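For readers following along, the two-step write described above could be sketched roughly like this. It is only an illustration, not code taken from Dovecot or from concurrency.c: the names encode_size() and append_record(), the 7-bits-per-byte layout of the size field, and the choice to store the total record length are all assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SIZE_FIELD_LEN 4

/* Hypothetical encoding: a 28-bit size split into four 7-bit groups,
   most significant group first, with 0x80 ORed into every byte. */
static void encode_size(uint32_t size, unsigned char out[SIZE_FIELD_LEN])
{
	assert(size < (1u << 28));
	out[0] = (unsigned char)(0x80 | ((size >> 21) & 0x7f));
	out[1] = (unsigned char)(0x80 | ((size >> 14) & 0x7f));
	out[2] = (unsigned char)(0x80 | ((size >> 7) & 0x7f));
	out[3] = (unsigned char)(0x80 | (size & 0x7f));
}

/* Write one record at `offset` as [4 size bytes][payload].
   Returns 0 on success, -1 on error. */
static int append_record(int fd, off_t offset,
			 const void *payload, uint32_t payload_len)
{
	size_t total = SIZE_FIELD_LEN + (size_t)payload_len;
	unsigned char size_bytes[SIZE_FIELD_LEN];
	unsigned char *rec;
	ssize_t ret;

	if (total >= (1u << 28))
		return -1; /* does not fit the assumed 28-bit size field */
	rec = malloc(total);
	if (rec == NULL)
		return -1;

	/* Step 1: write the whole record while its size field is still 0,
	   so a concurrent reader treats it as "not there yet". */
	memset(rec, 0, SIZE_FIELD_LEN);
	memcpy(rec + SIZE_FIELD_LEN, payload, payload_len);
	ret = pwrite(fd, rec, total, offset);
	free(rec);
	if (ret != (ssize_t)total)
		return -1;

	/* Step 2: overwrite only the 4 size bytes with the real size. */
	encode_size((uint32_t)total, size_bytes);
	ret = pwrite(fd, size_bytes, SIZE_FIELD_LEN, offset);
	return ret == SIZE_FIELD_LEN ? 0 : -1;
}
```

Under these assumptions, append_record(fd, 0, "hello", 5) would leave the bytes 0x80 0x80 0x80 0x89 followed by "hello" at the start of the file.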
I expected that when another process is read()ing the file and notices that the size is valid (all bytes having the 0x80 bit set), the whole transaction could always be read. But it looks like if the size happens to be just before a memory page boundary, it's possible that the updated size is read but the rest of the transaction isn't.
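The reader-side check could look something like the sketch below. Again this is a guess at the idea rather than the actual test program: check_record(), size_ends_at_page_boundary(), and the 7-bits-per-byte size layout are hypothetical, and the size is assumed to count the whole record including its 4 size bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum record_state {
	RECORD_NOT_READY,  /* size field not yet rewritten by the writer */
	RECORD_COMPLETE,   /* size valid and the whole record was read */
	RECORD_TRUNCATED   /* size valid but the read returned too little */
};

/* The size counts as valid only when all four bytes have 0x80 set. */
static bool size_is_valid(const unsigned char b[4])
{
	return (b[0] & b[1] & b[2] & b[3] & 0x80) != 0;
}

/* Assumed layout: four 7-bit groups, most significant group first. */
static uint32_t decode_size(const unsigned char b[4])
{
	return ((uint32_t)(b[0] & 0x7f) << 21) |
	       ((uint32_t)(b[1] & 0x7f) << 14) |
	       ((uint32_t)(b[2] & 0x7f) << 7) |
	       (uint32_t)(b[3] & 0x7f);
}

/* `buf` holds `len` bytes returned by one read() that started at the
   record's size field. */
static enum record_state check_record(const unsigned char *buf, size_t len)
{
	if (len < 4 || !size_is_valid(buf))
		return RECORD_NOT_READY;
	return decode_size(buf) <= len ? RECORD_COMPLETE : RECORD_TRUNCATED;
}

/* True when the 4 size bytes end exactly at a page boundary, i.e. the
   rest of the record starts on the next memory page. */
static bool size_ends_at_page_boundary(uint64_t record_offset,
				       uint64_t page_size)
{
	assert(page_size != 0);
	return (record_offset + 4) % page_size == 0;
}
```

With a 4096-byte page, a record starting at offset 4092 has its size bytes on one page and its data on the next, which is the case described above.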
If there is no output, what's the longest you want us to wait while it runs? How much do you care about unique OS/arch/cpu/fs combinations (what factors shouldn't matter)? I assume you want just one reader and one writer, started in the order listed in the source?
I currently have it running on SMP(quad) FreeBSD 6.2-STABLE amd64 Sat Jun 16 2007 on both ufs2 and nfs to our NetApp, and UP FreeBSD 6.2-STABLE i386 Tue Jan 16 2007 on ufs2, maybe some more systems after I send this mail, and a little later I can try it on UP FreeBSD 7 with zfs. The ones I already have running for a few minutes have printed nothing so far.
On Tue, 2007-06-19 at 20:16 -0400, Adam McDougall wrote:
If there is no output, what's the longest you want us to wait while it runs?
I think if it hasn't printed anything for 15 minutes it's pretty safe to assume it's not going to print anything.
How much do you care about unique OS/arch/cpu/fs combinations (what factors shouldn't matter)? I assume you want just one reader and one writer, started in the order listed in the source?
I don't think filesystem or CPU would matter, but I'm not a kernel coder so I'm not sure. I think the main difference is what kernel is being used.
The reader and writer processes probably need to be running in different CPUs to get the problem, so you could start 3 readers just to be sure.
The ones I already have running for a few minutes have printed nothing so far.
I guess it's possible that FreeBSD handles this the way I originally expected Linux to handle it.
On Wed, Jun 20, 2007 at 03:31:21AM +0300, Timo Sirainen wrote:
On Tue, 2007-06-19 at 20:16 -0400, Adam McDougall wrote:
If there is no output, what's the longest you want us to wait while it runs?
I think if it hasn't printed anything for 15 minutes it's pretty safe to assume it's not going to print anything.
Okay. Some I stopped after half an hour, some I had running until now (hours later). I tested on a couple of FreeBSD systems with 6.2-STABLE-something or 7-CURRENT: no messages at all. I tested on a 6-way 750 MHz UltraSPARC running Solaris 9, with one writer and 6 readers: no messages at all.
How much do you care about unique OS/arch/cpu/fs combinations (what factors shouldn't matter)? I assume you want just one reader and one writer, started in the order listed in the source?
I don't think filesystem or CPU would matter, but I'm not a kernel coder so I'm not sure. I think the main difference is what kernel is being used.
The reader and writer processes probably need to be running in different CPUs to get the problem, so you could start 3 readers just to be sure.
On one system each of FreeBSD 6.2, 7.0, and Solaris 9, I made sure to run one reader for each core.
NetBSD 3.1 (GENERIC)
No output, as expected, after running for about 15 minutes.
Daniel.
Hi,
I hope it isn't too unexpected, but it prints nothing for me in either the UP or the SMP case, just:
./concurrency
writing, page size = 4096
./concurrency 1
reading, page size = 4096
UP is a stable setup with 'FreeBSD 6.2-RELEASE-p1 i386' (32bit) and SMP a rather adventurous setup with 'FreeBSD 7.0-CURRENT amd64' (64bit). I ran the code for more than 10 minutes on each machine.
-- Sascha
Hi Timo,
It prints the following:
./concurrency
writing, page size = 8192
./concurrency 1
reading, page size = 8192
This is a:
SunOS 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V240
psrinfo -v
Status of virtual processor 0 as of: 06/20/2007 21:00:33
  on-line since 04/05/2007 16:11:51.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 06/20/2007 21:00:33
  on-line since 04/05/2007 16:11:35.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
Hope this helps.
Cheers.
Timo Sirainen said the following on 20/6/2007 1:41:
AMD Athlon(tm) 64 Processor 2800+, 2.6.21-1.3228.fc7 (Fedora 7): Nothing.
Intel(R) Pentium(R) 4 CPU 3.06GHz w/HyperThreading, 2.6.9-42.0.3.ELsmp (Red Hat): "page size cut" after a couple of minutes.
Ciao, luigi
-- / +--[Luigi Rosa]-- \
(1) Everything depends. (2) Nothing is always. (3) Everything is sometimes.
Hi Timo,
Linux 2.6.20-16-server (SMP), Core 2 Duo E6600:
- Only one reader: No output
- Three readers: After a few minutes "Page size cut" printed by one of them
Linux 2.6.18-028stab027 (UP), Athlon 64 3700+: Never outputs anything.
Regards, Philipp
Hi Timo,
[cut]
So far I've tested only with Linux 2.6.21 x86-64/SMP and a slow Solaris/Sparc/UP.
One writer and three readers ran for 30 minutes on Solaris 10 without printing anything. The box is an UltraSparc IIIi dual proc, and the FS on the partition is UFS.
Greg
Quoting Timo Sirainen:
http://dovecot.org/tmp/concurrency.c
I'd want to know what results this program gives with different systems. Please test and reply (but don't bother if someone already replied with the same OS+result). I expect it to print:
- Dual Pentium III 1133MHz: one "page size cut" every few minutes
- Dual Xeon 2.40GHz (with HT -> 4 virtual CPUs): "page size cut" immediately and constantly.
Both on Linux 2.6.13.
On Wed, 2007-06-20 at 02:41 +0300, Timo Sirainen wrote:
http://dovecot.org/tmp/concurrency.c
I'd want to know what results this program gives with different systems. Please test and reply (but don't bother if someone already replied with the same OS+result). I expect it to print:
- SMP kernels: "page size cut" once in a while
Hmm. This happens even if the file is locked. So I think this is a Linux bug..
Works on dual processor running latest Fedora 7:
net1#uname -a
Linux net1.coolsurf.com 2.6.21-1.3228.fc7 #1 SMP \
Tue Jun 12 14:56:37 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
OS: CentOS 4.5 (Final) (RHEL4 clone)
$ /usr/bin/time -f "total time: %E\ni/o waits: %w\n" ./concurrency
writing, page size = 4096
Command terminated by signal 2
total time: 10:41.53
i/o waits: 312177

$ /usr/bin/time -f "total time: %E\ni/o waits: %w\n" ./concurrency 1
reading, page size = 4096
page size cut
Command terminated by signal 2
total time: 10:39.67
i/o waits: 314930
Machine:
$ uname -srvmpoi
Linux 2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 10:11:19 EST 2007 i686 i686 i386 GNU/Linux
$ cat /proc/cpuinfo | egrep "(processor|model name)"
processor : 0
model name : Intel(R) Xeon(R) CPU 5120 @ 1.86GHz
processor : 1
model name : Intel(R) Xeon(R) CPU 5120 @ 1.86GHz
processor : 2
model name : Intel(R) Xeon(R) CPU 5120 @ 1.86GHz
processor : 3
model name : Intel(R) Xeon(R) CPU 5120 @ 1.86GHz
-- Troy Engel | Systems Engineer Fluid, Inc | http://www.fluid.com
On Wed, 2007-06-20 at 02:41 +0300, Timo Sirainen wrote:
You can forget about this for now. There was one bug in it and with a couple of changes I can't break it in my own system either anymore. Wonder why I'm seeing similar problems in Dovecot v1.1 code..
On Wed, 2007-06-20 at 20:11 +0300, Timo Sirainen wrote:
On Wed, 2007-06-20 at 02:41 +0300, Timo Sirainen wrote:
You can forget about this for now. There was one bug in it and with a couple of changes I can't break it in my own system either anymore. Wonder why I'm seeing similar problems in Dovecot v1.1 code..
Looks like my brain processed the initial debugging results wrong and this whole test was unneeded.
Fixed Dovecot: http://hg.dovecot.org/dovecot-1.0/rev/98cd45935799
Thanks anyway for testing. It showed that OSes don't return broken data, which is good.
If you still want to test, you can run the attached concurrency.c. It shouldn't print any errors with any OS.
On Wednesday, 20 June 2007, Timo Sirainen wrote:
Mh...
"19:51:35 ERROR 404: Not Found."
I just wanted to test it using Linux 2.6.21 on my Core2 Duo T7200 running Debian Unstable...
Does anyone still have the file?
Greetings,
Gunter
On Wednesday, 20 June 2007, Gunter Ohrner wrote:
On Wednesday, 20 June 2007, Timo Sirainen wrote:
http://dovecot.org/tmp/concurrency.c
Does anyone still have the file?
Ok, there was a race between Timo's mail and mine... ;)
Greetings,
Gunter
participants (13)
- Adam McDougall
- Daniel Cox
- David Favor
- greg@kamago.net
- Gunter Ohrner
- Jakob Hirsch
- Johannes Berg
- Luigi Rosa
- Philipp Wollermann
- Sascha Holzleiter
- Tan Shao Yi
- Timo Sirainen
- Troy Engel