[Dovecot] quick question
Timo (and anyone else who feels like chiming in),
I was just wondering if you could tell me whether the amount of corruption I see on a daily basis is what you'd consider "average" for our current setup and traffic. Now that the latest patches have stopped the core dumps we'd been seeing since our migration from Courier two months ago, I'd like to know what to expect as an operational norm. Prior to this we had never used Dovecot, so I have nothing to go on.
Our physical setup is 10 CentOS 5.4 x86_64 IMAP/POP servers, all sharing the same NFS backend where the users' index, control, and Maildir directories reside. This is accessed simultaneously by direct client connections, multiple SquirrelMail webservers, and Pine users, with layer-4 switch connection load balancing in front.
Each server has an average of about 400 connections, for a total of around 4,000 concurrent during a normal business day. This is out of a possible user population of about 15,000.
All our dovecot servers syslog to one machine, and on average I see about 50-75 instances of file corruption per day. I'm not counting each line, since some instances of corruption generate a log message for each uid that's wrong. This is just me counting "user A was corrupted once at 10:00, user B was corrupted at 10:25" for example.
Examples of the corruption are as follows:
###########
Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 28: Invalid transaction log size (32692 vs 32800): ...../dovecot/.INBOX/dovecot.index.log (sync_offset=32692)
Corrupted index cache file ...../dovecot/.Sent Messages/dovecot.index.cache: Corrupted physical size for uid=624: 0 != 53490263
Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 66: Unexpected garbage at EOF (sync_offset=21608)
Corrupted transaction log file ...../dovecot/.Trash.RFA/dovecot.index.log seq 2: indexid changed 1264098644 -> 1264098664 (sync_offset=0)
Corrupted index cache file ...../dovecot/.INBOX/dovecot.index.cache: invalid record size
Corrupted index cache file ...../dovecot/.INBOX/dovecot.index.cache: field index too large (33 >= 19)
Corrupted transaction log file ..../dovecot/.INBOX/dovecot.index.log seq 40: record size too small (type=0x0, offset=5788, size=0) (sync_offset=5812)
##########
These are most of the unique messages I could find, although the majority are the same as the first two I posted. So, my question: is this normal for a setup such as ours? I've been arguing with my boss over this since the switch. My opinion is that in an environment like ours, where a user can be logged in from Thunderbird, SquirrelMail, and their BlackBerry all at the same time, there will always be the occasional index/log corruption.
Unfortunately, he is of the opinion that there should rarely be any, and that there is a design flaw in how Dovecot works with multiple services on an NFS backend.
What has been your experience so far?
Thanks, -Dave
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On Fri, 2010-01-22 at 11:24 -0500, David Halik wrote:
Unfortunately, he is of the opinion that there should rarely be any, and that there is a design flaw in how Dovecot works with multiple services on an NFS backend.
Well, he is pretty much correct. I thought I could add enough NFS cache flushes to the code to make it work well, but that's highly dependent on what OS or even kernel version the NFS clients are running. Looking at the problems people are having with NFS, it's pretty clear that this solution just isn't going to work properly.
But then again, Dovecot is the only (free) IMAP server that even attempts to support this kind of behavior. Oh sure, Courier does too, but disabling index files on Dovecot should get the same stability.
I see only two proper solutions:
- Change your architecture so that all mail accesses to a specific user go through a single server. Install Dovecot proxy so all IMAP/POP3 connections go through it to the correct server.
Later, once v2.0 is stable, install LMTP and make all mail deliveries go through it too (possibly also LMTP proxy if your MTA can't figure out the correct destination server). In the meantime, use deliver with a configuration that doesn't update index files.
This guarantees that a user's mails are never accessed by more than one server at the same time. It is the only guaranteed way to make this work in the near future. With this setup you should see zero corruption.
- The long-term solution will be for Dovecot not to use the NFS server for inter-process communication, but instead to connect to other Dovecot servers directly over the network. Again, in this setup only a single server would be reading/writing a user's index files.
On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
- The long-term solution will be for Dovecot not to use the NFS server for inter-process communication, but instead to connect to other Dovecot servers directly over the network.
Actually not "NFS server", but "filesystem". So this would be done even when not using NFS.
On Jan 22, 2010, at 1:19 PM, Timo Sirainen wrote:
On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
- The long-term solution will be for Dovecot not to use the NFS server for inter-process communication, but instead to connect to other Dovecot servers directly over the network.
Actually not "NFS server", but "filesystem". So this would be done even when not using NFS.
Is this the situation we discussed once where a dovecot instance becomes a proxy if it detects that a user should be on a different server? The one thing I remember sorta missing from that idea at the time was a fallback to local spool if the other dovecot server isn't available.
Cor
On Fri, 2010-01-22 at 13:23 -0400, Cor Bosman wrote:
On Jan 22, 2010, at 1:19 PM, Timo Sirainen wrote:
On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
- The long-term solution will be for Dovecot not to use the NFS server for inter-process communication, but instead to connect to other Dovecot servers directly over the network.
Actually not "NFS server", but "filesystem". So this would be done even when not using NFS.
Is this the situation we discussed once where a dovecot instance becomes a proxy if it detects that a user should be on a different server?
No, that was my 1) plan :) And this is already possible with proxy_maybe: http://wiki.dovecot.org/PasswordDatabase/ExtraFields/Proxy
The one thing I remember sorta missing from that idea at the time was a fallback to local spool if the other dovecot server isn't available.
Right. This still isn't supported. It's also not really the safest solution, because it could place a user's connections on different servers due to some temporary problem. Or: the primary has failed, the user has connections on the secondary server, the primary comes back up, now new connections go to the primary while the old connections haven't been killed on the secondary, so you'll potentially get corruption.
Better would be to have some kind of a database that externally monitors what servers are up and where users currently have connections, and based on that decide where to redirect a new connection. Although that's also slightly racy unless done carefully.
On Fri, 2010-01-22 at 19:31 +0200, Timo Sirainen wrote:
Is this the situation we discussed once where a dovecot instance becomes a proxy if it detects that a user should be on a different server?
No, that was my 1) plan :) And this is already possible with proxy_maybe: http://wiki.dovecot.org/PasswordDatabase/ExtraFields/Proxy
So, clarification: Either using dedicated proxies or using proxy_maybe works for 1). I just didn't remember proxy_maybe. I suppose that's a better/easier solution since it doesn't require new hardware or network changes.
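For anyone who wants to go the proxy_maybe route, a minimal sketch of the passdb SQL query it could use might look like this (untested; the users table and its columns are made up for illustration, while host and proxy_maybe are the extra fields documented on the wiki page above):

-- Hypothetical password_query for dovecot-sql.conf. With proxy_maybe=Y,
-- Dovecot serves the user locally when "host" matches this server's own IP
-- and proxies the connection to "host" otherwise.
SELECT userid AS user, password,
       home_server_ip AS host,
       'Y' AS proxy_maybe
  FROM users
 WHERE userid = '%u';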
On Fri, 2010-01-22 at 19:31 +0200, Timo Sirainen wrote:
Better would be to have some kind of a database that externally monitors what servers are up and where users currently have connections, and based on that decide where to redirect a new connection. Although that's also slightly racy unless done carefully.
Wonder if something like this would work:
servers ( id integer, host varchar, ip varchar, last_time_healthy timestamp, connection_count integer, new_connections_ok boolean );
user_connections ( user_id integer primary key, server_id integer, last_lookup timestamp, imap_connections integer );
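Spelled out as actual MySQL DDL, that might look roughly like this; the types, keys, and defaults are guesses where the sketch above leaves them open:

-- Hypothetical DDL for the schema sketch above (MySQL syntax).
CREATE TABLE servers (
  id                 INTEGER PRIMARY KEY AUTO_INCREMENT,
  host               VARCHAR(255) NOT NULL,
  ip                 VARCHAR(45) NOT NULL,
  last_time_healthy  TIMESTAMP NULL,
  connection_count   INTEGER NOT NULL DEFAULT 0,
  new_connections_ok BOOLEAN NOT NULL DEFAULT TRUE
);

CREATE TABLE user_connections (
  user_id          INTEGER PRIMARY KEY,
  server_id        INTEGER NOT NULL REFERENCES servers (id),
  last_lookup      TIMESTAMP NOT NULL,
  imap_connections INTEGER NOT NULL DEFAULT 0
);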
Then some kind of logic that:
- if user already exists in user_connections table AND (imap_connections > 0 OR last_lookup>now() - 1 hour) use the old server_id
- otherwise figure out a new server for it based on servers' connection_count and new_connections_ok.
- when inserting, handle the duplicate key error
- when updating, use update user_connections .. where user_id = $userid and server_id = $old_server_id, and be prepared to handle when this returns 0 rows updated.
Once in a while maybe clean up stale rows from user_connections. And properly keeping track of imap_connections count might also be problematic, so maybe once in a while somehow check from all servers if the user actually still has any connections.
One more spam about this :)
On Fri, 2010-01-22 at 19:54 +0200, Timo Sirainen wrote:
Then some kind of logic that:
- if user already exists in user_connections table AND (imap_connections > 0 OR last_lookup>now() - 1 hour) use the old server_id
"AND new_connections_ok" also here. The idea being that something externally monitors servers' health and if it's down for n seconds (n=30 or so?), this field gets updated to FALSE, so new connections for users that were in the broken server go elsewhere.
On 01/22/2010 12:16 PM, Timo Sirainen wrote:
Looking at the problems people are having with NFS, it's pretty clear that this solution just isn't going to work properly.
Actually, considering the number of people and servers we're throwing at it, I think it's dealing with it pretty well. I'm sure there are always more tweaks and enhancements that can be done, but look at how much better 1.2 is over the 1.0 releases. It's definitely not "broken," just maybe not quite as production-ready as it could be. Honestly, at this point my users are very happy with the speed increase, and as long as their imap process isn't dying they don't seem to notice the behind-the-scenes corruption, thanks to the self-healing code.
But then again, Dovecot is the only (free) IMAP server that even attempts to support this kind of behavior. Oh sure, Courier does too, but disabling index files on Dovecot should get the same stability.
By the way, I didn't want to give the impression that we were unhappy with the product. Rather, I think what you've accomplished with Dovecot is great even by non-free enterprise standards, not to mention that the level of support you've given us has been excellent and I appreciate it greatly. It was a clear choice for us over Courier once NFS support became a reality. Loads on the exact same hardware dropped from an average of 5 to 0.5, which is quite amazing, not to mention the speed benefit of the indexes. Our users with extremely large Maildirs were very satisfied.
I see only two proper solutions:
- Change your architecture so that all mail accesses to a specific user go through a single server. Install Dovecot proxy so all IMAP/POP3 connections go through it to the correct server.
We've discussed this internally and are still considering layer-7 username balancing as a possibility, but I haven't worked too much on the specifics yet. We've only been running on Dovecot for two months, so we wanted to give it some burn-in time and see how things progressed. Now that the core dumps are fixed, I think we might be able to live with the corruption for a while. The only user-visible issue I was aware of was users' mailboxes disappearing when the processes died, but since that's not happening any more I'll have to see if anyone notices the corruption.
Thanks for all the feedback. I'm going over some of the ideas you suggested and we'll be thinking about long term solutions.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
David,
-----Original Message-----
From: dovecot-bounces+brandond=uoregon.edu@dovecot.org
Our physical setup is 10 CentOS 5.4 x86_64 IMAP/POP servers, all sharing the same NFS backend where the users' index, control, and Maildir directories reside. This is accessed simultaneously by direct client connections, multiple SquirrelMail webservers, and Pine users, with layer-4 switch connection load balancing in front.
Each server has an average of about 400 connections, for a total of around 4,000 concurrent during a normal business day. This is out of a possible user population of about 15,000.
All our dovecot servers syslog to one machine, and on average I see about 50-75 instances of file corruption per day. I'm not counting each line, since some instances of corruption generate a log message for each uid that's wrong. This is just me counting "user A was corrupted once at 10:00, user B was corrupted at 10:25" for example.
We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4, Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster (active/standby) in an L4 profile distributing connections round-robin, maildirs on two Netapp Filers (clustered 3070s with 54k RPM SATA disks), 10k peak concurrent connections for 45k total accounts. We used to run with the noac mount option, but performance was abysmal, and we were approaching 80% CPU utilization on the filers at peak load. After removing noac, our CPU is down around 30%, and our NFS ops/sec rate is maybe 1/10th of what it used to be.
The downside to this is that we've started seeing significantly more crashing and mailbox corruption. Timo's latest patch seems to have fixed the crashing, but the corruption just seems to be the cost of distributing users at random across our backend servers.
We've thought about enabling IP-based session affinity on the load balancer, but this would concentrate the load of our webmail clients, and it wouldn't really solve the problem for users that leave clients open on multiple systems. I've done a bit of looking at nginx's IMAP proxy support, but it's not really set up to do what we want, and it would require moving the IMAP virtual server off our load balancers and onto something significantly less supportable. Having the dovecot processes 'talk amongst themselves' to synchronize things, or go into proxy mode automatically, would be fantastic.
Anyway, that's where we're at with the issue. As a data point for your discussion with your boss:
- With 'noac', we would see maybe one or two 'corrupt' errors a day. Most of these were related to users going over quota.
- After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes a day. The crashes were highly visible to the users, as their mailbox would appear to be empty until the rebuild completed.
- Since applying the latest patch, we've seen no crashes, and 60-70 'Corrupt' errors a day. We have not had any new user complaints.
Hope that helps,
-Brad
On 01/22/2010 01:15 PM, Brandon Davidson wrote:
We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4, Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster (active/standby) in an L4 profile distributing connections round-robin, maildirs on two Netapp Filers (clustered 3070s with 54k RPM SATA disks), 10k peak concurrent connections for 45k total accounts. We used to run with the noac mount option, but performance was abysmal, and we were approaching 80% CPU utilization on the filers at peak load. After removing noac, our CPU is down around 30%, and our NFS ops/sec rate is maybe 1/10th of what it used to be.
Wow, that's almost the exact same setup we use, except we have 10 IMAP/POP servers and a clustered pair of FAS920s with 10K drives, which are getting replaced in a few weeks. We also have a pair of clustered 3050s, but they're not running dovecot (yet).
You're right about noac though, it absolutely destroyed our netapps. Of course the corruption was all but eliminated, but the filer performance was so bad our users immediately noticed. Definitely not an option.
The downside to this is that we've started seeing significantly more crashing and mailbox corruption. Timo's latest patch seems to have fixed the crashing, but the corruption just seems to be the cost of distributing users at random across our backend servers.
Yep, I agree. Like I said in the last email, we're going to deal with it for now and see if anyone really notices. I can live with it if the users don't care.
Timo, speaking of which, I'm guessing everyone is happy with the latest patches, any ETA on 1.2.10? ;)
We've thought about enabling IP-based session affinity on the load balancer, but this would concentrate the load of our webmail clients, and it wouldn't really solve the problem for users that leave clients open on multiple systems.
We currently have IP session 'sticky' on our L4s and it didn't help all that much. Yes, it reduces thrashing on the backend, but ultimately it won't help the corruption. Like you said, multiple logins will still go to different servers when the IPs are different.
How is your webmail architecture set up? We're using imapproxy to spread the connections out across the same load balancer, so essentially all traffic from outside and inside gets balanced. The trick is that we have an internal load-balanced virtual IP that spreads the webmail load out on private IP space. If they were to go outside they would get NAT'd as one outbound IP, so we just go inside and get the benefit of balancing.
Anyway, that's where we're at with the issue. As a data point for your discussion with your boss:
- With 'noac', we would see maybe one or two 'corrupt' errors a day. Most of these were related to users going over quota.
- After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes a day. The crashes were highly visible to the users, as their mailbox would appear to be empty until the rebuild completed.
- Since applying the latest patch, we've seen no crashes, and 60-70 'Corrupt' errors a day. We have not had any new user complaints.
That's where we are, and as long as the corruption stays invisible to users, I'm fine with it. Crashes seem to be the only user-visible issue so far, with "noac" being out of the question unless they buy a ridiculously expensive filer.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
Wow, that's almost the exact same setup we use, except we have 10 IMAP/POP servers and a clustered pair of FAS920s with 10K drives, which are getting replaced in a few weeks. We also have a pair of clustered 3050s, but they're not running dovecot (yet).
Pretty much the same as us as well: 35 IMAP servers, 10 POP servers, a clustered pair of 6080s with about 250 15K disks. We're seeing some corruption as well. I myself use IMAP extensively and regularly have problems with my inbox disappearing. I'm not running the patch yet though. Is 1.2.10 imminent, or should I just patch 1.2.9?
Cor
Cor,
On 1/22/10 1:05 PM, "Cor Bosman" <cor@xs4all.nl> wrote:
Pretty much the same as us as well: 35 IMAP servers, 10 POP servers, a clustered pair of 6080s with about 250 15K disks. We're seeing some corruption as well. I myself use IMAP extensively and regularly have problems with my inbox disappearing. I'm not running the patch yet though. Is 1.2.10 imminent, or should I just patch 1.2.9?
You guys must serve a pretty heavy load. What's your peak connection count across all those machines? How's the load? We recently went through a hardware replacement cycle, and were targeting < 25% utilization at peak load so we can lose one of our sites (half of our machines are in each site) without running into any capacity problems. We're actually at closer to 10% at peak, if that... Probably less now that we've disabled noac. Dovecot is fantastic :)
-Brad
You guys must serve a pretty heavy load. What's your peak connection count across all those machines? How's the load? We recently went through a hardware replacement cycle, and were targeting < 25% utilization at peak load so we can lose one of our sites (half of our machines are in each site) without running into any capacity problems. We're actually at closer to 10% at peak, if that... Probably less now that we've disabled noac. Dovecot is fantastic :)
I think the peak is around 10000 concurrent connections, out of about 500,000 mailboxes. The servers are way overspecced, so we can lose half of them. The netapps are also being used for webservices.
Cor
David,
On 1/22/10 12:34 PM, "David Halik" <dhalik@jla.rutgers.edu> wrote:
We currently have IP session 'sticky' on our L4s and it didn't help all that much. Yes, it reduces thrashing on the backend, but ultimately it won't help the corruption. Like you said, multiple logins will still go to different servers when the IPs are different.
How is your webmail architecture set up? We're using imapproxy to spread the connections out across the same load balancer, so essentially all traffic from outside and inside gets balanced. The trick is that we have an internal load-balanced virtual IP that spreads the webmail load out on private IP space. If they were to go outside they would get NAT'd as one outbound IP, so we just go inside and get the benefit of balancing.
We have two webmail interfaces - one is an old in-house open-source project called Alphamail, the new one is Roundcube. Both of them point at the same VIP that we point users at, with no special rules. We're running straight round-robin L4 connection distribution, with no least-connections or sticky-client rules.
We've been running this way for about 3 years I think.. I've only been here a year. We made a number of changes in sequence starting about three and a half years ago - Linux NFS to Netapp, Courier to Dovecot, mbox to Maildir+, LVS to F5 BigIP; not necessarily in that order. At no point have we ever had any sort of session affinity.
That's where we are, and as long as the corruption stays invisible to users, I'm fine with it. Crashes seem to be the only user-visible issue so far, with "noac" being out of the question unless they buy a ridiculously expensive filer.
Yeah, as long as the users don't see it, I'm happy to live with the messages in the log file.
-Brad
On 01/22/2010 05:14 PM, Brandon Davidson wrote:
Yeah, as long as the users don't see it, I'm happy to live with the messages in the log file.
-Brad
*sigh*, it looks like there still might be the occasional user-visible issue. I was hoping that once the assert stopped happening and the process stayed alive, users wouldn't see their inbox disappear and reappear... apparently, this is still happening occasionally.
I just had a user experience this with TB 2, and after looking at the logs I found the good ole' stale NFS message:
Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Fortunately, there were no other messages associated with it (assert or otherwise), but I was hoping I'd seen the last of users' mail momentarily reloading.
For now they'll just have to live with it until I either get proxy_maybe set up or find some other solution.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On 2010-01-25 12:57 PM, David Halik wrote:
I just had a user experience this with TB 2, and after looking at the logs I found the good ole' stale NFS message:
Maybe TB3 would be better behaved? It has many, many IMAP improvements over TB2... worth a try at least...
--
Best regards,
Charles
On 01/25/2010 01:00 PM, Charles Marcus wrote:
On 2010-01-25 12:57 PM, David Halik wrote:
I just had a user experience this with TB 2, and after looking at the logs I found the good ole' stale NFS message:
Maybe TB3 would be better behaved? It has many, many IMAP improvements over TB2... worth a try at least...
I agree, I definitely want the user to try it... especially since they're technically inclined and can tell me one way or the other. I'm going to wait though until 3.0.3 comes out because of the CONDSTORE issues.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On 01/25/2010 01:02 PM, David Halik wrote:
On 01/25/2010 01:00 PM, Charles Marcus wrote:
On 2010-01-25 12:57 PM, David Halik wrote:
I just had a user experience this with TB 2, and after looking at the logs I found the good ole' stale NFS message:
Maybe TB3 would be better behaved? It has many, many IMAP improvements over TB2... worth a try at least...
I agree, I definitely want the user to try it... especially since they're technically inclined and can tell me one way or the other. I'm going to wait though until 3.0.3 comes out because of the CONDSTORE issues.
Err, 3.0.2 rather. Speaking of which, I was just notified that the patch was approved for inclusion in 3.0.2. Now it just depends on how long it takes to be released.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
David,
-----Original Message----- From: David Halik [mailto:dhalik@jla.rutgers.edu]
*sigh*, it looks like there still might be the occasional user-visible issue. I was hoping that once the assert stopped happening and the process stayed alive, users wouldn't see their inbox disappear and reappear... apparently, this is still happening occasionally.
I just had a user experience this with TB 2, and after looking at the logs I found the good ole' stale NFS message:
Hmm, that's disappointing to hear. I haven't received any new reports from our helpdesk, so maybe it's at least less visible?
For now they'll just have to live with it until I either get proxy_maybe set up or find some other solution.
Let me know if you come up with anything. I'm not sure we want to add MySQL as a dependency for our mail service... but I'm at least curious to see how things perform with session affinity. I'll add it to my long list of things to play with when I have time for such things...
-Brad
On Mon, 2010-01-25 at 12:57 -0500, David Halik wrote:
Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Well, two possibilities:
a) The attached patch fixes this
b) Dotlocking isn't working for you..
On 01/25/2010 01:31 PM, Timo Sirainen wrote:
On Mon, 2010-01-25 at 12:57 -0500, David Halik wrote:
Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Well, two possibilities:
a) The attached patch fixes this
b) Dotlocking isn't working for you..
We're using this on NFSv3:
lock_method: fcntl
dotlock_use_excl: yes
fsync_disable: no
I counted and we see between 15 and 25 of these a day, so I would think that if dotlocking wasn't working it would be more? Don't know. Shouldn't it be using fcntl? I read on the wiki that it still uses dotlocking occasionally though, so I guess you use both.
fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Rarely I also see one or two of these:
fdatasync() failed with index cache file /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot.index.cache: Stale NFS file handle
...but I'm guessing the Stale is the same in each case, just a different symptom.
Just upgraded all the servers to 1.2.10, so I'll patch and report back. Might as well do it now. ;) I have one user that is good at letting me know when oddness happens and has had this a few times since they're logged in from multiple locations for mail.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On 01/25/2010 02:18 PM, David Halik wrote:
On 01/25/2010 01:31 PM, Timo Sirainen wrote:
On Mon, 2010-01-25 at 12:57 -0500, David Halik wrote:
Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Well, two possibilities:
a) The attached patch fixes this
I patched and immediately started seeing *many* of these:
Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: Bad file descriptor
Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: Bad file descriptor
...so I backed out right away.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On Mon, 2010-01-25 at 15:12 -0500, David Halik wrote:
I patched and immediately started seeing *many* of these:
Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: Bad file descriptor
Hmm. I put it through a few seconds of imaptest but didn't see these, so I guess there's something it didn't catch. The attached patch fixes the first obvious potential problem I can think of, try if you still dare. :)
On 01/25/2010 03:26 PM, Timo Sirainen wrote:
On Mon, 2010-01-25 at 15:12 -0500, David Halik wrote:
I patched and immediately started seeing *many* of these:
Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: Bad file descriptor
Hmm. I put it through a few seconds of imaptest but didn't see these, so I guess there's something it didn't catch. The attached patch fixes the first obvious potential problem I can think of, try if you still dare. :)
No guts no glory! So far, so good. The first patch started spewing messages within seconds. I've been running for about twenty minutes with this version and I haven't seen much of anything yet.
I'll report back tomorrow after it has a day to burn in.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
No guts no glory! So far, so good. The first patch started spewing messages within seconds. I've been running for about twenty minutes with this version and I haven't seen much of anything yet.
I'll report back tomorrow after it has a day to burn in.
It's still a bit buggy. I haven't seen any messages in the last few hours, but then a user just dumped a gigantic 200MB core. Looking at the dump, it's because of some recursive loop that goes on forever:
#0  0x00002b656f2cba71 in _int_malloc (av=0x2b656f5ab9e0, bytes=368) at malloc.c:4650
        iters = <value optimized out>
        nb = 384
        idx = 759448916
        bin = <value optimized out>
        victim = <value optimized out>
        size = <value optimized out>
        victim_index = <value optimized out>
        remainder = <value optimized out>
        remainder_size = <value optimized out>
        block = <value optimized out>
        bit = <value optimized out>
        map = <value optimized out>
        fwd = <value optimized out>
        bck = <value optimized out>
#1  0x00002b656f2cd86d in __libc_calloc (n=<value optimized out>, elem_size=<value optimized out>) at malloc.c:4006
        av = (struct malloc_state *) 0x2b656f5ab9e0
        oldtop = (struct malloc_chunk *) 0x1da94070
        p = <value optimized out>
        bytes = 368
        csz = <value optimized out>
        oldtopsize = 12176
        mem = (void *) 0x139cdc40
        clearsize = <value optimized out>
        nclears = <value optimized out>
        d = <value optimized out>
#2  0x00000000004a8ea6 in pool_system_malloc (pool=<value optimized out>, size=368) at mempool-system.c:78
        mem = <value optimized out>
#3  0x00000000004a4daa in i_stream_create_fd (fd=12, max_buffer_size=4096, autoclose_fd=96) at istream-file.c:156
        fstream = <value optimized out>
        st = {st_dev = 329008600, st_ino = 4452761, st_nlink = 27, st_mode = 799030, st_uid = 0, st_gid = 1, pad0 = 0, st_rdev = 109556025819520, st_size = 11013, st_blksize = 0, st_blocks = 95, st_atim = {tv_sec = 4096, tv_nsec = 8}, st_mtim = {tv_sec = 1264465811, tv_nsec = 499376000}, st_ctim = {tv_sec = 1264465811, tv_nsec = 499378000}, __unused = {1264465811, 499384000, 0}}
#4  0x000000000043fba6 in maildir_uidlist_refresh (uidlist=0x139d6ab0) at maildir-uidlist.c:733
        retry = 64
        ret = -1
#5  0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d360) at maildir-uidlist.c:382
        mhdr = (struct maildir_index_header *) 0x139cdc40
#6  0x000000000043ffff in maildir_uidlist_refresh (uidlist=0x139d6ab0) at maildir-uidlist.c:793
        retry = false
        ret = 1
#7  0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
        mhdr = (struct maildir_index_header *) 0x139cdc40
#8  0x000000000043ffff in maildir_uidlist_refresh (uidlist=0x139d6ab0) at maildir-uidlist.c:793
        retry = false
        ret = 1
#9  0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d600) at maildir-uidlist.c:382
        mhdr = (struct maildir_index_header *) 0x139cdc40
#10 0x000000000043ffff in maildir_uidlist_refresh (uidlist=0x139d6ab0) at maildir-uidlist.c:793
        retry = false
        ret = 1
#11 0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d750) at maildir-uidlist.c:382
        mhdr = (struct maildir_index_header *) 0x139cdc40
...and on and on for thousands of lines. I gave up after 20K. ;)
On Mon, 2010-01-25 at 20:28 -0500, David Halik wrote:
#5  0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d360) at maildir-uidlist.c:382
        mhdr = (struct maildir_index_header *) 0x139cdc40
#6  0x000000000043ffff in maildir_uidlist_refresh (uidlist=0x139d6ab0) at maildir-uidlist.c:793
        retry = false
        ret = 1
#7  0x0000000000440bb5 in maildir_uidlist_update_hdr (uidlist=0x2b656f5ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
Oh, interesting. An infinite loop. Looks like this could have happened ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed: http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0
On 2/6/2010 2:06 PM, Timo Sirainen wrote:
ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
Oh, interesting. An infinite loop. Looks like this could have happened ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed: http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0
Do you think I should try the previous patch with this addition? I never got a chance to test it for long because of the loop dump.
On Sat, 2010-02-06 at 14:28 -0500, David Halik wrote:
On 2/6/2010 2:06 PM, Timo Sirainen wrote:
ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
Oh, interesting. An infinite loop. Looks like this could have happened ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed: http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0
Do you think I should try the previous patch with this addition? I never got a chance to test it for long because of the loop dump.
I committed that patch already to hg, so please do test it :)
On 02/06/2010 02:32 PM, Timo Sirainen wrote:
On Sat, 2010-02-06 at 14:28 -0500, David Halik wrote:
On 2/6/2010 2:06 PM, Timo Sirainen wrote:
ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
Oh, interesting. An infinite loop. Looks like this could have happened ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed: http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0
Do you think I should try the previous patch with this addition? I never got a chance to test it for long because of the loop dump.
I committed that patch already to hg, so please do test it :)
I've been running both patches and so far they're stable with no new crashes, but I haven't really seen any "better" behavior, so I don't know if it's accomplishing anything. =)
Still seeing entire uidlist dupes after the list goes stale. I think that was what we were originally discussing.
Feb 8 12:55:06 gehenna11.rutgers.edu dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Feb 8 12:55:20 gehenna11.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (40605 -> 40604, hdr=40604)
Feb 8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1251801090.M721811P3983V04240006I01A1DAF9_0.gehenna7.rutgers.edu,S=3001:2,S (uid 35314 -> 40606)
Feb 8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 5: 1251810220.M816183P3757V04240006I01A1DB04_0.gehenna7.rutgers.edu,S=4899:2,S (uid 35315 -> 40607)
Feb 8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 6: 1251810579.M402527P753V045C0007I01A1DB05_0.gehenna8.rutgers.edu,S=36471:2,RS (uid 35316 -> 40608)
.... and so on until the end of the list.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
Hi David,
-----Original Message----- From: David Halik
I've been running both patches and so far they're stable with no new crashes, but I haven't really seen any "better" behavior, so I don't know if it's accomplishing anything. =)
Still seeing entire uidlist dupes after the list goes stale. I think that was what we were originally discussing.
I wasn't able to roll the patched packages into production until this morning, but so far I'm seeing the same thing as you - no real change in behavior.
I guess that brings us back to Timo's possibility number two?
-Brad
On 02/08/2010 01:46 PM, Brandon Davidson wrote:
Hi David,
-----Original Message----- From: David Halik
I've been running both patches and so far they're stable with no new crashes, but I haven't really seen any "better" behavior, so I don't know if it's accomplishing anything. =)
Still seeing entire uidlist dupes after the list goes stale. I think that was what we were originally discussing.
I wasn't able to roll the patched packages into production until this morning, but so far I'm seeing the same thing as you - no real change in behavior.
I guess that brings us back to Timo's possibility number two?
-Brad
It looks like we're still working towards a layer-7 solution anyway. Right now we have one of our student programmers hacking Perdition with a new plugin for dynamic username caching, storage, and automatic failover. If we get it working I can send you the basics if you're interested.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
Hi David,
-----Original Message----- From: David Halik
It looks like we're still working towards a layer-7 solution anyway. Right now we have one of our student programmers hacking Perdition with a new plugin for dynamic username caching, storage, and automatic failover. If we get it working I can send you the basics if you're interested.
I'd definitely be glad to take a look at what you come up with! I'm still leaning towards MySQL with quick local fallback, but I'm nowhere near committed to anything.
On a side note, we've been running with the two latest maildir patches in production for a few days now. The last few days we've been seeing a lot of lock failures:
Feb 10 04:06:02 cc-popmap6p dovecot: imap-login: Login: user=<pellerin>, method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=12881
Feb 10 04:08:03 oh-popmap3p dovecot: imap-login: Login: user=<pellerin>, method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=9569
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=12881: Timeout while waiting for lock for transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=12881: Our dotlock file /home6/pellerin/Maildir/dovecot-uidlist.lock was modified (1265803562 vs 1265803684), assuming it wa
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=12881: Connection closed bytes=31/772
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=9569: Timeout while waiting for lock for transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=9569: Our dotlock file /home6/pellerin/Maildir/dovecot-uidlist.lock was deleted (locked 180 secs ago, touched 180 secs ago)
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=<pellerin>, rip=67.223.67.45, pid=9569: Connection closed bytes=18/465
I'm not sure if this is just because it's trying more diligently to make sure it's got the latest info, and is therefore hitting locks where it didn't previously... but it's been hanging our clients and requiring manual intervention to clear. We've been removing the lock file and killing any active dovecot sessions, which seems to resolve things for a while.
Just thought I'd see if this was happening to anyone else.
-Brad
On 02/10/2010 06:15 PM, Brandon Davidson wrote:
Hi David,
-----Original Message----- From: David Halik
It looks like we're still working towards a layer-7 solution anyway. Right now we have one of our student programmers hacking Perdition with a new plugin for dynamic username caching, storage, and automatic failover. If we get it working I can send you the basics if you're interested.
I'd definitely be glad to take a look at what you come up with! I'm still leaning towards MySQL with quick local fallback, but I'm nowhere near committed to anything.
We're in the process of doing some beta work on it, but so far it works nicely. It's basically a plugin for Perdition that dynamically builds a username db, assigning users across a pool of servers by least connections and then always sending each user back to the same machine. There's a tool you can run to see who is on what machine and what the overall layout looks like. If the Perdition server goes down, we have our switch send people to a backup Perdition and it dynamically recreates the db again. We have to do some more testing (and actually make it live), but so far it's promising.
On a side note, we've been running with the two latest maildir patches in production for a few days now. The last few days we've been seeing a lot of lock failures:
Just thought I'd see if this was happening to anyone else.
I haven't been seeing this here. As far as I can tell, there has been no noticeable change in either direction with the last two patches. Every once in a blue moon I'll find a dead lock file somewhere, but it doesn't seem to be a recurring issue.
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
I think mail is the wrong application for NFS, because NFS is slow for metadata operations. I would rather use it for VM hosting than mail.
We used to have a small clustered NetApp with 10k HDDs and three frontend servers running Postfix and Courier IMAP/POP3. The setup was stable, but the performance was not good.
So we built an IMAP cluster out of a pair of Dell R710s (6 x 15K HDDs) with CentOS 5, DRBD, and Heartbeat. I will scale this setup by adding another pair of R710 servers and randomizing the mailboxes between the IMAP/POP3 cluster pairs. An IMAP proxy will direct the users to the right server, and the frontend MX servers will also send the mail to the right server, using SMTP as the transport and Postfix transport maps for routing.
In the future I would like to switch from Courier to Dovecot and use LMTP as the transport to our mailstore.
We currently have 10,000 mailboxes, only 300-400 IMAP connections, but a lot of POP access.
The load on the active R710 is only 0.10 :)
I think mail is a problem you can easily partition, so why have all your eggs in one basket :)
alex
We've thought about enabling IP-based session affinity on the load balancer,
Brandon, I just thought of something. Have you always been running without IP affinity across all your connections? We've always had it turned on because we were under the impression that certain clients like Outlook had major issues without it. Basically, as the client spawns new connections and they go to other servers rather than the same one, the client begins to fight itself. IP affinity always seemed like a more stable option, but if you've been running without it for a long time, maybe it's not such a problem after all. Anyway, what has your experience been?
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu
On 01/22/2010 10:15 AM, Brandon Davidson wrote:
We've thought about enabling IP-based session affinity on the load balancer, but this would concentrate the load of our webmail clients, and it wouldn't really solve the problem for users that leave clients open on multiple systems.
Webmail and IMAP servers are on the same network for us, so we don't have to go through the BigIP for this; we just use local round-robin DNS to avoid any sort of clumping. Imapproxy or dovecot proxy local to the webmail server would get around that too.
I've done a bit of looking at nginx's IMAP proxy support, but it's not really set up to do what we want, and it would require moving the IMAP virtual server off our load balancers and onto something significantly less supportable. Having the dovecot processes 'talk amongst themselves' to synchronize things, or go into proxy mode automatically, would be fantastic.
Though we aren't using NFS we do have a BigIP directing IMAP and POP3 traffic to multiple dovecot stores. We use mysql authentication and the "proxy_maybe" option to keep users on the correct box. My tests using an external proxy box didn't significantly reduce the load on the stores compared to proxy_maybe. And you don't have to manage another box/config. Since you only need to keep users on the _same_ box and not the _correct_ box, if you're using mysql authentication you could hash the username or domain to a particular IP address:
SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host, 'Y' AS proxy_maybe, ...
Just assign IP addresses 192.168.1.48-90 to your dovecot servers. Shift the range by adding to or subtracting from the ORD value. A mysql function would likely work just as well. If a server goes down, move its IP. You could probably make pairs with heartbeat or some monitoring software to do it automatically.
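To make the mapping concrete, here's what that expression evaluates to for a couple of sample logins (the addresses are purely illustrative):

-- Quick sanity check of the hash in a MySQL client:
SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('dhalik', 1, 1)))) AS host;
-- -> 192.168.1.68  ('D' is ASCII 68)
SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('brandond', 1, 1)))) AS host;
-- -> 192.168.1.66  ('B' is ASCII 66)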
-David
David,
Though we aren't using NFS we do have a BigIP directing IMAP and POP3 traffic to multiple dovecot stores. We use mysql authentication and the "proxy_maybe" option to keep users on the correct box. My tests using an external proxy box didn't significantly reduce the load on the stores compared to proxy_maybe. And you don't have to manage another box/config. Since you only need to keep users on the _same_ box and not the _correct_ box, if you're using mysql authentication you could hash the username or domain to a particular IP address:
SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host, 'Y' AS proxy_maybe, ...
Just assign IP addresses 192.168.1.48-90 to your dovecot servers. Shift the range by adding to or subtracting from the ORD value. A mysql function would likely work just as well. If a server goes down, move its IP. You could probably make pairs with heartbeat or some monitoring software to do it automatically.
Timo posted a similar suggestion recently, and I might try to find some time to proof this out over the next few weeks. I liked his idea of storing the user's current server in the database and proxying to that, with fallback to a local connection if they're new or their current server is unavailable. The table cleanup and pool monitoring would probably be what I'd worry most about testing.
Unfortunately we're currently using LDAP auth via PAM... so even if I could get the SQL and monitoring issues resolved, I think I'd have a hard time convincing my peers that adding a SQL server as a single point of failure was a good idea. If it could be set up to just fall back to using a local connection in the event of a SQL server outage, that might help things a bit. Anyone know how that might work?
-Brad
On 25.1.2010, at 21.30, Brandon Davidson wrote:
Unfortunately we're currently using LDAP auth via PAM... so even if I could get the SQL and monitoring issues resolved, I think I'd have a hard time convincing my peers that adding a SQL server as a single point of failure was a good idea. If it could be set up to just fall back to using a local connection in the event of a SQL server outage, that might help things a bit. Anyone know how that might work?
Well, you can always fall back to LDAP if SQL isn't working.. Just something like:
passdb sql {
  ..
}
passdb ldap {
  ..
}
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
On 25.1.2010, at 21.30, Brandon Davidson wrote:
If it could be set up to just fall back to using a local connection in the event of a SQL server outage, that might help things a bit. Anyone know how that might work?
Well, you can always fall back to LDAP if SQL isn't working.. Just something like:
passdb sql {
  ..
}
passdb ldap {
  ..
}
Or just 'passdb pam { ... }' for the second one in our case, since we're using system auth with pam_ldap/nss_ldap. Is the SQL connection/query timeout configurable? It would be nice to make a very cursory attempt at proxying, and immediately give up and use a local connection if anything isn't working.
-Brad
On 25.1.2010, at 21.53, Brandon Davidson wrote:
Or just 'passdb pam { ... }' for the second one in our case, since we're using system auth with pam_ldap/nss_ldap. Is the SQL connection/query timeout configurable? It would be nice to make a very cursory attempt at proxying, and immediately give up and use a local connection if anything isn't working.
I don't think it's immediate.. But it's probably something like:
- notice it's not working -> reconnect
- requests are queued
- reconnect fails, hopefully soon, but MySQL connect at least fails in max. 10 seconds
- reconnect timeout is added, which doubles after each failure
- requests are failed while it's not trying to connect
Timo,
On 1/25/10 12:31 PM, "Timo Sirainen" <tss@iki.fi> wrote:
I don't think it's immediate.. But it's probably something like:
- notice it's not working -> reconnect
- requests are queued
- reconnect fails, hopefully soon, but MySQL connect at least fails in max. 10 seconds
- reconnect timeout is added, which doubles after each failure
- requests are failed while it's not trying to connect
Hmm, that's not great. Is that tunable at all? Cursory examination shows that it's hardcoded in src/lib-sql/driver-mysql.c, so I guess not.
I suppose I could also get around to playing with multi-master replication so that I at least have a SQL server available at each of the sites where I have Dovecot servers...
-Brad
participants (7)
- alex handle
- Brandon Davidson
- Charles Marcus
- Cor Bosman
- David Halik
- David Jonas
- Timo Sirainen