[Dovecot] severe performance problem (mail cache related?)
Hi all, we're having a bad day with email :)
I have a user who was complaining of poor performance today when opening mailboxes. Total time to open the mailbox was about 1 minute. Upon truss'ing the imap process during an open, I saw the trace quoted at the end of this email.
As you can see, for each "FETCH" line of response, we were seeing dovecot do a ton of work-- mmap()ing anonymous memory, lots of munmap()s, etc. -- and mmap() and munmap() are not free.
The stack trace of at least one of the munmaps is captured below, as well-- deep inside of the mail caching code.
Then, I moved aside the user's .imap directory, assuming that forcing dovecot to rebuild its indices and caches could help-- and sure enough the user reported that mailbox open times had dropped to 18 seconds. I truss'd that open, briefly, and it was just a ton of preads at high speed.
So I don't know why this user was fine for a week, then suddenly hit this. This is a grave concern if I'm going to roll this out from 15 test users to 250 production users. Thanks for your help.
PS: Raw logs are also attached below.
-dp
write(1, " * 1 0 7 9 5 F E T C".., 2050) = 2050 mmap64(0x00000000, 7356416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFD800000 munmap(0xFE602000, 1048576) = 0 munmap(0xFE502000, 1048576) = 0 munmap(0xFE402000, 1048576) = 0 munmap(0xFE302000, 1048576) = 0 munmap(0xFE202000, 1048576) = 0 munmap(0xFE102000, 1048576) = 0 munmap(0xFE002000, 1048576) = 0 munmap(0xFE000000, 8192) = 0 pread64(9, " n c o m @ o s s 1 >\n\0".., 8192, 7340032) = 8192 mmap64(0x00000000, 7364608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFE800000 munmap(0xFDE04000, 1048576) = 0 munmap(0xFDD04000, 1048576) = 0 munmap(0xFDC04000, 1048576) = 0 munmap(0xFDB04000, 1048576) = 0 munmap(0xFDA04000, 1048576) = 0 munmap(0xFD904000, 1048576) = 0 munmap(0xFD804000, 1048576) = 0 munmap(0xFD802000, 8192) = 0 munmap(0xFD800000, 8192) = 0 pread64(9, " l @ p l a t i n u m >\n".., 8192, 7348224) = 8192 mmap64(0x00000000, 7372800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFE000000 munmap(0xFEE06000, 1048576) = 0 munmap(0xFED06000, 1048576) = 0 munmap(0xFEC06000, 1048576) = 0 munmap(0xFEB06000, 1048576) = 0 munmap(0xFEA06000, 1048576) = 0 munmap(0xFE906000, 1048576) = 0 munmap(0xFE806000, 1048576) = 0 munmap(0xFE802000, 16384) = 0 munmap(0xFE800000, 8192) = 0 pread64(9, "\0\0\0 0\0\0\0\0 C c : ".., 8192, 7356416) = 8192
ff1c3e14 munmap (100000, bdc000, fe6de000, 2dc000, 0, 1ef0) + 8
00080bb4 file_cache_set_size (c03a0, 0, bdc110, 408, 0, 110) + e0
00080cec file_cache_read (0, bdbd08, bdbd08, 408, 0, 684a8) + a4
000594a0 mail_cache_map (191268, bdbd08, 408, ffbff274, 3, fffffffc) + ec
0005ae20 mail_cache_get_record (0, bdbd08, ffbff274, ffbff274, 0, 0) + 20
0005b00c mail_cache_foreach_rec (1959c8, ffbff2ec, 5b3d8, 0, b5760, 7ed8) + 10
0005b2bc mail_cache_foreach (1, 47ae, 5b3d8, 0, 0, 0) + b4
0005b47c mail_cache_field_exists (0, 47ae, 0, c0000000, b4f10, 88184) + 84
0005b64c mail_cache_lookup_field (1959c8, b5898, 47ae, 0, 191d48, 195560) + c
000531ac index_mail_get_fixed_field (195dd8, 0, 195ea0, 4, ffffffff, ffffffff) + 4c
000545e8 index_mail_set_seq (195dd8, 47ae, 19a1c8, b4b4f, 0, 47ae) + bc
00070dac mail_set_seq (195dd8, 47ae, 1, c2ce8, 0, 39c) + c
0005770c index_storage_search_next (1, 195dd8, 1, bd778, 5769c, c8ae0) + 70
00072024 mailbox_search_next (19a1c8, 195dd8, 3, 2, 0, c2b50) + 14
0002b3bc imap_fetch (c4e20, c4de8, ffbff748, 5, c2eb0, 80000000) + 6c
00026210 cmd_fetch (1, c2e88, 0, 75696400, 26054, 80000000) + 1bc
00029000 cmd_uid (0, c22f8, 0, 8, 2a134, c22f8) + 8c
00029b58 client_handle_input (c2b94, c2e80, b5760, bd778, 7ef8, 2000000) + 138
00029ad0 client_handle_input (0, 2f05e860, b5760, bd778, 45d3834d, b5000) + b0
00029cb0 _client_input (c2b50, c2b50, 26ee, 0, 1, 0) + 68
00086338 io_loop_handler_run (c0348, 0, 0, ffbffa14, 4c, 80000000) + 140
00085c14 io_loop_run (c0348, ff212cb0, 1, b573c, bae80, ff215dbc) + 34
00032064 main (ffbffc18, b4400, b5000, b573c, c1314, ff3a0180) + 3f8
00024b18 _start (0, 0, 0, 0, 0, 0) + 5c
[in]
3 select "Mail/OpenSolaris/discuss" 4 UID fetch 1:* (FLAGS) 5 IDLE DONE 6 close 7 logout
[out, taking 1+ minutes]
* OK [RAWLOG TIMESTAMP] 2007-02-14 14:05:35
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $Forwarded Junk)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $Forwarded Junk \*)] Flags permitted.
* 20462 EXISTS
* 0 RECENT
* OK [UIDVALIDITY 1159980013] UIDs valid
* OK [UIDNEXT 20487] Predicted next UID
3 OK [READ-WRITE] Select completed.
* 1 FETCH (FLAGS (\Seen) UID 1)
* 2 FETCH (FLAGS (\Seen) UID 2)
* 3 FETCH (FLAGS (\Seen) UID 3)
* 4 FETCH (FLAGS (\Seen) UID 4)
* 5 FETCH (FLAGS (\Seen) UID 5)
* 6 FETCH (FLAGS (\Seen) UID 6)
* 7 FETCH (FLAGS (\Seen) UID 7)
* 8 FETCH (FLAGS (\Seen) UID 8)
* 9 FETCH (FLAGS (\Seen) UID 9)
* 10 FETCH (FLAGS (\Seen) UID 10)
* 11 FETCH (FLAGS (\Seen) UID 11)
* 12 FETCH (FLAGS (\Seen) UID 12)
* 13 FETCH (FLAGS (\Seen) UID 13)
* 14 FETCH (FLAGS (\Seen) UID 14)
* 15 FETCH (FLAGS (\Seen) UID 15)
* 16 FETCH (FLAGS (\Seen) UID 16)
* 17 FETCH (FLAGS (\Seen) UID 17)
* 18 FETCH (FLAGS (\Seen) UID 18)
* 19 FETCH (FLAGS (\Seen) UID 19)
* 20 FETCH (FLAGS (\Seen) UID 20)
* 21 FETCH (FLAGS (\Seen) UID 21)
* 22 FETCH (FLAGS (\Seen) UID 22)
* 23 FETCH (FLAGS (\Seen) UID 23)
[-----elided-----]
* 20459 FETCH (FLAGS (\Seen) UID 20483)
* 20460 FETCH (FLAGS (\Seen) UID 20484)
* 20461 FETCH (FLAGS (\Seen) UID 20485)
* 20462 FETCH (FLAGS (\Seen) UID 20486)
4 OK Fetch completed.
+ idling
5 OK Idle completed.
6 OK Close completed.
* BYE Logging out
7 OK Logout completed.
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Wed 14 Feb 2007 at 02:24PM, Dan Price wrote:
write(1, " * 1 0 7 9 5 F E T C".., 2050) = 2050 mmap64(0x00000000, 7356416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFD800000 munmap(0xFE602000, 1048576) = 0 munmap(0xFE502000, 1048576) = 0 munmap(0xFE402000, 1048576) = 0 munmap(0xFE302000, 1048576) = 0 munmap(0xFE202000, 1048576) = 0 munmap(0xFE102000, 1048576) = 0 munmap(0xFE002000, 1048576) = 0 munmap(0xFE000000, 8192) = 0
I was sad to see no response about this issue. I'm now seeing this on my *own* mailbox, which is disturbing. Here is an example:
pread64(9, " p o s a l s\n\0\0\0\013".., 8192, 7241728) = 8192 pread64(9, " < S t e v e n . L u c".., 8192, 7249920) = 8192 pread64(9, "\0\0\01F\0\0\0\0 T o : ".., 8192, 7258112) = 8192 mmap64(0x00000000, 13058048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFE400000 munmap(0xFDF72000, 1048576) = 0 munmap(0xFDE72000, 1048576) = 0 munmap(0xFDD72000, 1048576) = 0 munmap(0xFDC72000, 1048576) = 0 munmap(0xFDB72000, 1048576) = 0 munmap(0xFDA72000, 1048576) = 0 munmap(0xFD972000, 1048576) = 0 munmap(0xFD872000, 1048576) = 0 munmap(0xFD772000, 1048576) = 0 munmap(0xFD672000, 1048576) = 0 munmap(0xFD572000, 1048576) = 0 munmap(0xFD472000, 1048576) = 0 munmap(0xFD402000, 458752) = 0 munmap(0xFD400000, 8192) = 0
So here we see dovecot allocate a whopping 12MB of anonymous memory, then munmap it all away in 1MB chunks.
Timo, do you know why this is happening? It is ruining performance for me.
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Fri 16 Feb 2007 at 12:20AM, Dan Price wrote:
So here we see dovecot allocate a whopping 12MB of anonymous memory, then munmap it all away in 1MB chunks.
Timo, do you know why this is happening? It is ruining performance for me.
More data: watching another user suffer from this and watching just the mmap's, I see:
vvvvvvvv
mmap64(0x00000000, 16973824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
mmap64(0x00000000, 16982016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16990208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16998400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
....
mmap64(0x00000000, 20602880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20611072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20619264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
...
mmap64(0x00000000, 25067520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
...
So this sure looks serious-- it's almost like a memory leak.
Using DTrace I was able to pinpoint the stack trace of the mmap system call:
$ dtrace -n 'syscall::mmap64:entry/pid==164794/{ustack()}'
12 52690 mmap64:entry
  libc.so.1`mmap64+0xc
  imap`mremap_anon+0xb8
  imap`file_cache_set_size+0xe0
  imap`file_cache_read+0xa4
  imap`mail_cache_map+0xec
  imap`mail_cache_get_record+0x20
  imap`mail_cache_foreach_rec+0x10
  imap`mail_cache_foreach+0xb4
  imap`mail_cache_field_exists+0x84
  imap`mail_cache_lookup_field+0xc
  imap`index_mail_get_fixed_field+0x4c
  imap`index_mail_set_seq+0xbc
  imap`mail_set_seq+0xc
  imap`index_storage_search_next+0x70
  imap`mailbox_search_next+0x14
  imap`cmd_search+0xf4
  imap`client_handle_input+0x138
  imap`client_handle_input+0xb0
  imap`_client_input+0x68
  imap`io_loop_handler_run+0x140
I also have a complete trace of the flow of control between the return from one call to mmap and the start of the next, but it's 165K, so I have not attached it.
The user reported to me that pine was "doing a selection" or "doing a search" when this happened.
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Fri, 2007-02-16 at 01:06 -0800, Dan Price wrote:
More data: watching another user suffer from this and watching just the mmap's, I see:
vvvvvvvv
mmap64(0x00000000, 16973824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
mmap64(0x00000000, 16982016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16990208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16998400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
....
mmap64(0x00000000, 20602880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20611072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20619264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
...
mmap64(0x00000000, 25067520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
...
So this sure looks serious-- it's almost like a memory leak.
How large is the dovecot.index.cache file for that user? The anonymous mmaps are created so that Dovecot can "mmap" the dovecot.index.cache file into memory without actually using mmap().
The munmapping happens because Dovecot wants to grow the mmaped area, and since Solaris doesn't have an mremap() call, it fakes one by doing another, larger mmap() and moving the data in 1MB blocks while unmapping the old memory at the same time.
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
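A minimal sketch of that grow-by-copy pattern, written as a standalone illustration rather than Dovecot's actual mremap_anon() (the function name and the 1MB block size are assumptions based on the traces above):

/* Sketch of growing an anonymous mapping on a system without mremap():
 * map a larger region, copy the old contents across in 1MB blocks, and
 * unmap the old region behind the copy. Illustration only -- not
 * Dovecot's actual mremap_anon() implementation. */
#include <string.h>
#include <sys/mman.h>

#define COPY_BLOCK (1024 * 1024)

static void *grow_anon_map(void *old_base, size_t old_size, size_t new_size)
{
    unsigned char *src = old_base, *dst;
    size_t left = old_size, n;
    void *new_base;

    new_base = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
    if (new_base == MAP_FAILED)
        return MAP_FAILED;

    dst = new_base;
    while (left > 0) {
        n = left < COPY_BLOCK ? left : COPY_BLOCK;
        memcpy(dst, src, n);
        /* release the block of the old mapping we just copied */
        (void)munmap((void *)src, n);
        src += n;
        dst += n;
        left -= n;
    }
    return new_base;
}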
On Fri 16 Feb 2007 at 12:41PM, Timo Sirainen wrote:
On Fri, 2007-02-16 at 01:06 -0800, Dan Price wrote:
More data: watching another user suffer from this and watching just the mmap's, I see:
vvvvvvvv
mmap64(0x00000000, 16973824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
mmap64(0x00000000, 16982016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16990208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 16998400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
....
mmap64(0x00000000, 20602880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20611072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
mmap64(0x00000000, 20619264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON,...
...
mmap64(0x00000000, 25067520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON...
...
So this sure looks serious-- it's almost like a memory leak.
How large is the dovecot.index.cache file for that user? The anonymous mmaps are created so that Dovecot can "mmap" the dovecot.index.cache file into memory without actually using mmap().
The munmapping happens because Dovecot wants to grow the mmaped area, and since Solaris doesn't have an mremap() call, it fakes one by doing another, larger mmap() and moving the data in 1MB blocks while unmapping the old memory at the same time.
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
Cool-- of course, I missed that we were growing at pagesize (8K).
So that sounds plausible. I can file an RFE for mremap as well-- frankly, this is the first time I've heard of it. The user has a large mailbox: 600+MB. Here are the contents of their .imap/INBOX directory:
drwx------ 2 comay staff    6 Feb 16 01:00 ./
drwx------ 3 comay staff    3 Feb 15 12:03 ../
-rw------- 1 comay staff 2.0M Feb 16 01:00 dovecot.index
-rw------- 1 comay staff  48M Feb 16 00:16 dovecot.index.cache
-rw------- 1 comay staff  844 Feb 16 01:00 dovecot.index.log
-rw------- 1 comay staff 434K Feb 16 00:27 dovecot.index.log.2
I agree that mmaping the whole index.cache in one shot is the way to go-- asking the VM system to do work in little chunks is (at least on Solaris) a real performance killer.
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Fri, 2007-02-16 at 02:49 -0800, Dan Price wrote:
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
Cool-- of course, I missed that we were growing at pagesize (8K).
Yes, I didn't actually realize that until just now. That's horribly slow. I'll have to make it grow the memory in larger chunks in any case..
This should anyway help with the initial mapping:
http://dovecot.org/list/dovecot-cvs/2007-February/007695.html http://dovecot.org/list/dovecot-cvs/2007-February/007697.html
On Fri 16 Feb 2007 at 02:49AM, Dan Price wrote:
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
One thing I don't totally get is why you mmap anon here, if this is really about a cache file? It seems that you could mmap the cache file directly, perhaps with MAP_PRIVATE if you need to make in-memory-only changes.
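For comparison, here is a bare-bones illustration of that idea-- mapping the cache file itself copy-on-write-- with a made-up helper name and no claim that this matches Dovecot's locking or consistency requirements:

/* Sketch: map dovecot.index.cache with a private, copy-on-write mapping,
 * so reads come from the file but any in-memory changes never reach it.
 * Illustration only; error handling is minimal. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void *map_cache_private(const char *path, size_t *size_r)
{
    struct stat st;
    void *map;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    /* MAP_PRIVATE + PROT_WRITE is allowed on a read-only descriptor:
     * written pages become private copies of the file's pages. */
    map = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    close(fd);    /* the mapping remains valid after close() */
    if (map != MAP_FAILED)
        *size_r = (size_t)st.st_size;
    return map;
}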
I agree that mmaping the whole index.cache in one shot is the way to go-- asking the VM system to do work in little chunks is (at least on Solaris) a real performance killer.
Another idea I had was that dovecot could take advantage of larger pages-- especially for its anon mappings-- since this cuts TLB footprint and is usually "a good thing."
So for example, on Solaris (SPARC) you can get 8K, 64K, 512K and 4M pages. On Solaris AMD64 it's 4K and 2M.
This is available via getpagesizes(3c).
For now, I'm going to try to change file_cache_set_size() to just artificially inflate page_size up to 4M-- at least that should make these resizes less frequent.
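As a rough illustration of both ideas-- querying getpagesizes(3C) and rounding resizes up to a 4MB granule-- something like the following could work. The helper names are made up, and actually getting large pages for a mapping would additionally need memcntl(2) MC_HAT_ADVISE or the MPSS library, which is not shown:

/* Sketch: find the largest hardware page size via getpagesizes(3C) and
 * round cache-map growth up to a large granule so the grow-and-copy
 * happens far less often. Helper names are invented, not Dovecot's. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

static size_t largest_page_size(void)
{
    int i, n;
    size_t *sizes, largest;

    n = getpagesizes(NULL, 0);          /* how many page sizes exist? */
    if (n <= 0)
        return (size_t)sysconf(_SC_PAGESIZE);
    sizes = calloc((size_t)n, sizeof(*sizes));
    if (sizes == NULL || getpagesizes(sizes, n) <= 0) {
        free(sizes);
        return (size_t)sysconf(_SC_PAGESIZE);
    }
    largest = sizes[0];
    for (i = 1; i < n; i++) {
        if (sizes[i] > largest)
            largest = sizes[i];
    }
    free(sizes);
    return largest;
}

/* Round a requested size up to a power-of-two granule, e.g. 4MB. */
static size_t round_up_to(size_t size, size_t granule)
{
    return (size + granule - 1) & ~(granule - 1);
}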
Thanks again for your help-- the fact that you take the time to support the community is a major factor in my decision to deploy dovecot.
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Fri, 2007-02-16 at 03:05 -0800, Dan Price wrote:
On Fri 16 Feb 2007 at 02:49AM, Dan Price wrote:
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
One thing I don't totally get is why you mmap anon here, if this is really about a cache file? It seems that you could mmap the cache file directly, perhaps with MAP_PRIVATE if you need to make in-memory-only changes.
Yes, and it's done as long as you don't have mmap_disable=yes. Hmm. Weren't you using ZFS directly? Why are you using mmap_disable=yes? :) Its main purpose is to make indexes work in NFS.
I agree that mmaping the whole index.cache in one shot is the way to go-- asking the VM system to do work in little chunks is (at least on Solaris) a real performance killer.
Another idea I had was that dovecot could take advantage of larger pages--especially for its anon mappings-- this cuts TLB footprint and is usually "a good thing."
So for example, on Solaris (SPARC) you can get 8K, 64K, 512K and 4M pages. On Solaris AMD64 it's 4K and 2M.
This is available via getpagesizes(3c).
I haven't heard about this before. I'll see if I can use it.
On Fri 16 Feb 2007 at 01:12PM, Timo Sirainen wrote:
On Fri, 2007-02-16 at 03:05 -0800, Dan Price wrote:
On Fri 16 Feb 2007 at 02:49AM, Dan Price wrote:
So.. Hmm. Could the problem simply be that the mmap-copy-growing is too slow? If the user really has some 25MB cache file, that could be it. I think I could change the code so that it mmap()s immediately enough memory to fit the whole cache file. Probably a good idea to do anyway.
One thing I don't totally get is why you mmap anon here, if this is really about a cache file? It seems that you could mmap the cache file directly, perhaps with MAP_PRIVATE if you need to make in-memory-only changes.
Yes, and it's done as long as you don't have mmap_disable=yes. Hmm. Weren't you using ZFS directly? Why are you using mmap_disable=yes? :) Its main purpose is to make indexes work in NFS.
Well there you go, I'm a dope. I think I set mmap_disable=yes because I was seeing strange things happening where there would be thousands (millions?) of madvise(DONTNEED) calls-- seemingly forever in a loop. I have not yet tracked that problem down-- next time I see it, I will make sure to get to the bottom of it.
I must have decided that turning off mmap would also halt the advice... who knows, it has been a rough week. There's a lesson: know what the hell you are doing before changing settings :)
I will run tomorrow with mmap_disable=no and see how it goes.
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Fri, 2007-02-16 at 03:29 -0800, Dan Price wrote:
Yes, and it's done as long as you don't have mmap_disable=yes. Hmm. Weren't you using ZFS directly? Why are you using mmap_disable=yes? :) Its main purpose is to make indexes work in NFS.
Well there you go, I'm a dope. I think I set mmap_disable=yes because I was seeing strange things happening where there would be thousands (millions?) of madvise(DONTNEED) calls-- seemingly forever in a loop. I have not yet tracked that problem down-- next time I see it, I will make sure to get to the bottom of it.
Actually the only place where madvise(MADV_DONTNEED) is called is in the file-cache code which is used only with mmap_disable=yes.
There are madvise(MADV_SEQUENTIAL) calls with mmap_disable=no though.
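For readers following along, a tiny illustration of the two advice calls being discussed (not Dovecot source; the function and parameter names are invented):

/* Illustration of the two madvise() calls mentioned above; not Dovecot
 * source. 'map' is a page-aligned mapping of 'map_size' bytes. */
#include <sys/types.h>
#include <sys/mman.h>

static void cache_advice_examples(void *map, size_t map_size)
{
    /* file-cache path (mmap_disable=yes): tell the kernel a region of
     * the anonymous copy won't be needed again, so it may reclaim it. */
    (void)madvise(map, map_size, MADV_DONTNEED);

    /* mmap_disable=no path: hint that a file-backed index mapping will
     * be read sequentially, making aggressive read-ahead worthwhile. */
    (void)madvise(map, map_size, MADV_SEQUENTIAL);
}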
On Fri, 16 Feb 2007, Timo Sirainen wrote:
Date: Fri, 16 Feb 2007 13:39:33 +0200
From: Timo Sirainen <tss@iki.fi>
To: Dan Price <dp@eng.sun.com>
Cc: dovecot@dovecot.org
Subject: Re: [Dovecot] severe performance problem (mail cache related?)
On Fri, 2007-02-16 at 03:29 -0800, Dan Price wrote:
Yes, and it's done as long as you don't have mmap_disable=yes. Hmm. Weren't you using ZFS directly? Why are you using mmap_disable=yes? :) Its main purpose is to make indexes work in NFS.
Well there you go, I'm a dope. I think I set mmap_disable=yes because I was seeing strange things happening where there would be thousands (millions?) of madvise(DONTNEED) calls-- seemingly forever in a loop. I have not yet tracked that problem down-- next time I see it, I will make sure to get to the bottom of it.
Actually the only place where madvise(MADV_DONTNEED) is called is in the file-cache code which is used only with mmap_disable=yes.
There are madvise(MADV_SEQUENTIAL) calls with mmap_disable=no though.
Dan,
Could you give us other Sun users a few details of your setup, so we
can avoid this hole? I am running rc22 on a T2000, Solaris 10 11/06 plus (mostly) current patches, ZFS version 3 for my user homedirs (where cache files and folders go), INBOXes NFS mounted from another Sun S10 box (NFS version 4). I have never touched any of the mmap related settings in dovecot.conf. The whole thing works beautifully for 3000 users. The T2000 never breaks a sweat. The users are happy.
Jeff Earickson Colby College
On Fri 16 Feb 2007 at 08:24AM, Jeff A. Earickson wrote:
There are madvise(MADV_SEQUENTIAL) calls with mmap_disable=no though.
Dan,
Could you give us other Sun users a few details of your setup, so we can avoid this hole? I am running rc22 on a T2000, Solaris 10 11/06 plus (mostly) current patches, ZFS version 3 for my user homedirs (where cache files and folders go), INBOXes NFS mounted from another Sun S10 box (NFS version 4). I have never touched any of the mmap related settings in dovecot.conf. The whole thing works beautifully for 3000 users. The T2000 never breaks a sweat. The users are happy.
That's good news!
Well, the docs say that mmap_disable=yes is important if you are using NFS-- in which case you'll want the patches Timo just integrated into CVS if users have large mail cache files; otherwise you'll hit the performance bug I did.
Timo-- why is mmap_disable so important for NFS?
I have been thinking that dovecot should auto-sense the filesystem the various files live on and adapt its behavior (for example, in our environment I've added a "don't bother to fsync on ZFS" patch). Many environments are mixes of NFS and other FS's, as in Jeff's case, so being able to auto-sense and customize behavior would probably be a big win.
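As a sketch of that kind of auto-sensing, Solaris statvfs(2) exposes the filesystem name in f_basetype; the helper below is only an illustration (the name is made up, and portability to non-Solaris statvfs implementations is hand-waved):

/* Sketch: detect which filesystem a path lives on so behavior (fsync
 * policy, mmap_disable, locking) could be adapted per mount. Solaris
 * statvfs() reports the filesystem name in f_basetype. Illustration
 * only. */
#include <string.h>
#include <sys/statvfs.h>

/* Returns 1 if 'path' is on NFS, 0 if not, -1 on error. */
static int path_is_nfs(const char *path)
{
    struct statvfs buf;

    if (statvfs(path, &buf) < 0)
        return -1;
    /* f_basetype is e.g. "nfs", "zfs", "ufs", "tmpfs" on Solaris */
    return strncmp(buf.f_basetype, "nfs", 3) == 0 ? 1 : 0;
}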
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On 17.2.2007, at 1.01, Dan Price wrote:
Timo-- why is mmap_disable so important for NFS?
Because another computer may have modified the file on the server, and if I just access the mmap()ed memory area it doesn't notice the changes. Another reason is that the process may die with SIGBUS if the file was deleted from the server.
I have been thinking that dovecot should auto-sense the filesystem the various files live on, and adapt its behavior-- (for example, in our environment I've added a "don't bother to fsync on ZFS" patch). Many environments are mixes of NFS and other FS's, as in Jeff's case, so being able to auto-sense and customize behavior would probably be a big win.
There is actually an NFS check nowadays, and if it notices that you're using NFS but mmap_disable=no, it logs an error:
i_fatal("Mailbox indexes in %s are in NFS mount. "
"You must set mmap_disable=yes to avoid index corruptions. "
"If you're sure this check was wrong, set nfs_check=no.", path);
This is done only for the first user that logs in after startup. I'm
not sure if it's a good idea to automatically change any settings
based on the detection.
And I don't think the fsync behavior is all that different with ZFS
than with other FSes :)