[Dovecot] Re: best practises for mail systems
Hello,
If disk space and bandwidth are affordable (and from your setup it seems that they are, as you have everything locally), I would split the mail storage completely and use replication between n master servers (n=2 in your case).
The replication is not yet fully tested, but Timo is actively working on this feature.
The fear of losing the IMAP session does not make sense (at least to me), as the client will reconnect automatically in the background.
Like this you have no SPOF and no split-brain, and you get the flexibility (if needed) to geographically distribute your servers in the future.
Keep each server with its own IP and connect to them via DNS (round robin, etc.).
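For reference, the dsync-based replication should end up being configured roughly like the sketch below once it is finished (the host name is just an example and the exact setting names may still change):

    # on both masters: enable the replication plugin and point it at the peer
    mail_plugins = $mail_plugins notify replication

    service replicator {
      process_min_avail = 1
    }

    plugin {
      # address of the other master; tcp replication also needs the
      # doveadm service reachable on the peer
      mail_replica = tcp:mail2.example.com
    }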
We are currently experimenting with a setup similar to this one, but with geographically distributed servers (trans-continental) (bandwidth limited and high cost).
Best regards, Andrei
We once tried a solution similar to your first one:
3 servers for LVS HA.
The master server redirected users to 2 or 3 Dovecot backends.
The mail storage was maildir on top of OCFS2.
Our problem was that OCFS2 was too slow; we could not handle many users.
So we took a step back and now use only one server.
But we are still thinking about going back to the first setup, with LVS.
When using LVS, try to stick each user to the same backend; LVS can do this by source IP.
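Roughly something like this (the addresses are just examples):

    # make the IMAP virtual service persistent, so a given client IP
    # keeps hitting the same backend for 30 minutes
    ipvsadm -A -t 192.0.2.10:143 -s rr -p 1800
    ipvsadm -a -t 192.0.2.10:143 -r 10.0.0.11:143 -m
    ipvsadm -a -t 192.0.2.10:143 -r 10.0.0.12:143 -m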
Where I work we have problems testing storage. If you have any advice on testing disk performance, I will be thankful.
I will be glad to answer anything else.
[]'s f.rique
On Tue, Jun 5, 2012 at 9:59 AM, Kostyrev Alexander Alekseevich <a.kostyrev@serverc.ru> wrote:
I think LVS is just fine and it is not a SPOF, because it is actually 2 servers: an active master and a standby slave. LVS supports real-time replication of connections from master to slave, so if the master dies, the slave knows which client IP was connected to which Dovecot server.
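For reference, that connection synchronization is done by the LVS sync daemon, started roughly like this (the interface name is just an example):

    # on the active director
    ipvsadm --start-daemon master --mcast-interface eth0
    # on the standby director
    ipvsadm --start-daemon backup --mcast-interface eth0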
I'm more worried about the right design of the mail storage: should I use some cluster FS with all mail of all users, or should I split the mail storage across servers and somehow avoid long downtime if one of the servers goes down?
-----Original Message-----
From: dovecot-bounces@dovecot.org [mailto:dovecot-bounces@dovecot.org] On Behalf Of Matthias-Christian Ott
Sent: Tuesday, June 05, 2012 11:28 PM
To: dovecot@dovecot.org
Subject: Re: [Dovecot] best practises for mail systems
On 2012-06-05 05:14, Kostyrev Alexander Alekseevich wrote:
On each host system we created one VM and passed through 3x2TB disks into it.
In the guest VMs, on top of these disks, we made XFS filesystems and fired up GlusterFS with a distributed replicated volume for our mail storage.
so it looks like this:
vm1 replicate vm2
disk1 ------------> disk4
disk2 ------------> disk5
disk3 ------------> disk6
In each VM we mounted the GlusterFS volume and pointed Dovecot to that directory for mail delivery (via LMTP) and IMAP4 user access.
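Roughly, the volume was created along these lines, pairing the disks as in the diagram above (the volume name, brick paths and mount point here are just examples):

    gluster peer probe vm2
    gluster volume create mailvol replica 2 \
        vm1:/export/disk1 vm2:/export/disk4 \
        vm1:/export/disk2 vm2:/export/disk5 \
        vm1:/export/disk3 vm2:/export/disk6
    gluster volume start mailvol

    # on each VM
    mount -t glusterfs localhost:/mailvol /srv/mail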
We also use Exim for SMTP.
So, with GlusterFS as the mail storage, we can use LVS for load balancing of Exim and Dovecot.
So whenever one of the host systems (and hence one of the mail VMs) goes down, users don't notice,
because LVS points them to the working SMTP and IMAP4 servers,
and they still get their mail thanks to GlusterFS. [...] Cons:
Not quite sure if GlusterFS is a production-ready solution, because I've experienced split-brains while setting it up.
IO performance issues. We haven't run any IO tests yet, but GlusterFS uses FUSE to mount on the clients, and the guys on #gluster told me that writes to the GlusterFS mount will not be strictly local IO.
I'm not familiar with LVS, but from the project description it seems that you need a "front server" that does the load balancing, so you either have to run at least two of these servers in parallel or add to your cons that you have introduced a single point of failure. But you mentioned that you only have two servers, so you can't really do this.
I would rather ensure high availability by running the two servers as masters and using either IP address takeover or DNS failover (with dynamic DNS), and either use Dovecot's replication (I haven't tested it yet and I'm not sure what happens in case of IP address takeover) or a file system that can handle these kinds of failures (e.g. Coda). You could do load balancing via round-robin DNS. This only protects you against the failure of a single machine, and because IMAP sessions are not replicated between the servers, connections will get reset if one server fails, but it's cost-effective and uses software that already exists.
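For the dynamic DNS variant, the record change can be pushed with nsupdate, roughly like this (the names, TTL and key file are just examples):

    $ nsupdate -k /etc/bind/ddns.key
    > zone example.com.
    > update delete imap.example.com. A
    > update add imap.example.com. 60 A 192.0.2.11
    > send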
Regards, Matthias-Christian
On 2012-06-05 17:33, Michescu Andrei wrote:
The fear of losing the IMAP session does not make sense (at least to me), as the client will reconnect automatically in the background.
I agree, in practice this is not an issue compared to the unavailability of the service, but on longer IMAP sessions (e.g. transferring a big file) the connection loss is noticeable.
Like this you have no SPOF and no split-brain, and you get the flexibility (if needed) to geographically distribute your servers in the future.
Keep each server with its own IP and connect to them via DNS (round robin, etc.).
This depends on the resolvers, operating systems and clients you want to support, because I have read that not all networks generate proper ICMP/ICMPv6 Destination Unreachable messages and instead simply drop the packets, so that clients first try to connect to the failed server until a timeout and only then connect to the second server. Since IMAP is a stateful protocol, the latency of the initial connect to the failed server can be ignored, but if you want to eliminate it, you can use dynamic DNS to automatically remove the corresponding RRs (depending on your situation you need an external monitoring server for this to avoid problems in case of net splits).
We are currently experimenting with a setup similar to this one, but with geographically distributed servers (trans-continental) (bandwidth limited and high cost).
I also have some plans for a similar setup in the near future. Can you share your results on the mailing list? I'm especially interested in whether failover via DNS works in practice (I did some searches and I'm not fully convinced, but it seems quite simple compared to other solutions).
Regards, Matthias-Christian
Hello,
I agree, in practice this is not an issue compared to the unavailability of the service, but on longer IMAP sessions (e.g. transferring a big file) the connection loss is noticeable.
It is noticeable for somebody that really waits for a large email. For the standard user nothing is visible, because the synchronization starts, fails and starts again...
In a corporate environment the servers are "close" and the network is generally configured to generate proper Destination Unreachable messages.
For road-warriors, the main concern is the uplink/downlink and generally not the couple of seconds lost due to time-out.
For the DNS... use a "fast-flux"-like configuration (short TTLs, multiple rotating records) and any proper resolver will behave correctly (at least in my experience).
For the road-warrior setup: DNS with GeoIP, and all locations with split DNS (internally an HA setup, with failover to the external locations).
Unfortunately the classical HA setup (with heartbeat monitoring, DNS updates, etc.) is not designed to be "internet-proof" (internet as in WAN). The initial design of the internet was to be able to operate even when significant segments are unavailable.
Picture the following scenario: master servers on each continent. Catastrophic failure of the trans-continental network => 5 big disconnected chunks of network, each fully functional internally. Any HA setup that I have seen will fail miserably. The simplest design with fully replicated masters will continue to work.
Obviously, planning for the scenario above is overkill for most of the companies out there. Once you throw in the advantage of having your email close to you wherever you go, it starts making sense.
And you can top it off by segmenting your user base to replicate only the users who are on the go, or who are important enough.
As for the current status of the ideal implementation: waiting for Timo to finalize the refactoring of dsync.
As a temporary solution: rsync replication with a master-slave model (not master-master).
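Roughly something like this (the paths are just an example; note that rsyncing a live maildir can race with deliveries and expunges, which is why it is only a stop-gap):

    # on the slave, pull the mail store from the master;
    # -a preserves the maildir flags encoded in the file names,
    # --delete propagates expunges
    rsync -a --delete master.example.com:/var/vmail/ /var/vmail/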
This design makes sense to us, but I'm sure that it is suboptimal for most other uses.
Andrei
On 5.6.2012, at 23.33, Michescu Andrei wrote:
I agree, in practice this is not an issue compared to the unavailability of the service, but on longer IMAP sessions (e.g. transferring a big file) the connection loss is noticeable.
It is noticeable for somebody that really waits for a large email.
And is there actually some (any!) way this could be avoided?... One server dies, another continues sending the mail?
I have had some thoughts about transferring idling Dovecot connections between processes / servers so that clients wouldn't notice it, but I haven't even thought about moving active (long-running) connections.
Hello Timo,
And is there actually some (any!) way this could be avoided?... One server dies, another continues sending the mail?
I have had some thoughts about transferring idling Dovecot connections between processes / servers so that clients wouldn't notice it, but I haven't even thought about moving active (long-running) connections.
Here it would have to be researched whether this is specified in the IMAP standard (is there any RFC that mentions this?), or whether we would have to propose a new RFC with such an extension.
Until there is an RFC, even if you implement such a feature, there will be no clients out there that support it.
A good starting point, if there is no RFC, is the HTTP protocol, which has a resume option. Like this you could even support parallel downloads from a couple of synchronized IMAP servers, getting a small chunk from each (BitTorrent-like, with the seed list being set to only those servers).
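For reference, the HTTP resume option is just a Range request, e.g.:

    GET /archive/message.eml HTTP/1.1
    Host: mail.example.com
    Range: bytes=1048576-

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 1048576-2097151/2097152

(IMAP4rev1's FETCH already allows partial body fetches with a similar octet-offset syntax, e.g. BODY[]<1048576.65536>, but as far as I know there is nothing standardized about handing a session over to another server.)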
Best regards, Andrei
On 2012-06-05 23:43, Timo Sirainen wrote:
On 5.6.2012, at 23.33, Michescu Andrei wrote:
I agree, in practice this is not an issue compared to the unavailability of the service, but on longer IMAP sessions (e.g. transferring a big file) the connection loss is noticeable.
It is noticeable for somebody that really waits for a large email.
And is there actually some (any!) way this could be avoided?... One server dies, another continues sending the mail?
Yes, there is. You have to replicate the entire state of the IMAP session (protocol states, buffers, TLS state etc.) and the TCP state of the connection. The state of the IMAP session is (in theory) easily replicable (although you probably have to rely on internals of the TLS implementation; OpenSSL can serialise TLS sessions from/into ASN.1 via i2d_SSL_SESSION, though this is meant for resuming sessions via TLS), and for TCP there is RTCP [1]. RTCP intercepts the TCP session and is able to recover the TCP state. It works without any modification of the operating system (at the moment it is limited to Linux).
If this were implemented in Dovecot, it would really set it apart from other IMAP servers and software that I've seen so far. Being able to transparently handle failover of a TCP connection would be unique.
I have had some thoughts about transferring idling Dovecot connections between processes / servers so that clients wouldn't notice it, but I haven't even thought about moving active (long-running) connections.
Load rebalancing would probably be another feature that separates Dovecot from other IMAP servers.
Regards, Matthias-Christian
On 9.6.2012, at 4.55, Matthias-Christian Ott wrote:
Yes, there is. You have to replicate the entire state of the IMAP session (protocol states, buffers, TLS state etc.) and the TCP state of the connection. The state of the IMAP session is (in theory) easily replicable (although you probably have to rely on internals of the TLS implementation; OpenSSL can serialise TLS sessions from/into ASN.1 via i2d_SSL_SESSION, though this is meant for resuming sessions via TLS)
Interesting! I thought OpenSSL didn't have a way to [de]serialize the session state. The first time I wanted to do that was 13 years ago. I see there are some google hits for i2d_SSL_SESSION, but do you already know a good web page / example code I could look at?
and for TCP there is RTCP [1]. RTCP intercepts the TCP session and is able to recover the TCP state. It works without any modification of the operating system (at the moment it is limited to Linux).
Thanks for this too.
If this were implemented in Dovecot, it would really set it apart from other IMAP servers and software that I've seen so far. Being able to transparently handle failover of a TCP connection would be unique.
Yes.
On 2012-06-09 16:11, Timo Sirainen wrote:
On 9.6.2012, at 4.55, Matthias-Christian Ott wrote:
Yes, there is. You have to replicate the entire state of the IMAP session (protocol states, buffers, TLS state etc.) and the TCP state of the connection. The state of the IMAP session is (in theory) easily replicable (although you probably have to rely on internals of the TLS implementation; OpenSSL can serialise TLS sessions from/into ASN.1 via i2d_SSL_SESSION, though this is meant for resuming sessions via TLS)
Interesting! I thought OpenSSL didn't have a way to [de]serialize the session state. The first time I wanted to do that was 13 years ago. I see there are some google hits for i2d_SSL_SESSION, but do you already know a good web page / example code I could look at?
The Apache httpd module mod_ssl uses it.
GnuTLS has similar functions with gnutls_db_*, although they are also only intended to be used for resuming a session. Have a look at the Apache httpd module mod_gnutls.
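Roughly, the OpenSSL calls look like this (an untested sketch; it only captures the SSL_SESSION resumption state, not the state of a live connection):

    /* Serialise an established session into a DER buffer that could be
     * shipped to another server, and restore it there. */
    #include <openssl/ssl.h>
    #include <stdlib.h>

    static unsigned char *serialize_session(SSL *ssl, int *len_out)
    {
        SSL_SESSION *sess = SSL_get1_session(ssl);   /* increments refcount */
        int len = i2d_SSL_SESSION(sess, NULL);       /* ask for the needed size */
        unsigned char *buf = malloc(len);
        unsigned char *p = buf;
        if (buf == NULL) {
            SSL_SESSION_free(sess);
            return NULL;
        }
        i2d_SSL_SESSION(sess, &p);                   /* writes DER, advances p */
        SSL_SESSION_free(sess);
        *len_out = len;
        return buf;
    }

    static int restore_session(SSL *ssl, const unsigned char *buf, long len)
    {
        const unsigned char *p = buf;
        SSL_SESSION *sess = d2i_SSL_SESSION(NULL, &p, len);
        if (sess == NULL)
            return -1;
        int ret = SSL_set_session(ssl, sess);        /* normally used before resuming */
        SSL_SESSION_free(sess);
        return ret == 1 ? 0 : -1;
    }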
Regards, Matthias-Christian
On 05.06.2012 23:33, Michescu Andrei wrote:
Picture the following scenario: master servers on each continent. Catastrophic failure of the trans-continental network => 5 big disconnected chunks of network, each fully functional internally. Any HA setup that I have seen will fail miserably. The simplest design with fully replicated masters will continue to work.
Despite being off the original topic, I'd say this looks like a good service idea, as many companies may pay for such a service if it can be set up specifically for their needs (routing, logs, backups, redirections).
Gmail (and other big guys like them) won't be that fine-tunable (having to service many customers with the same type of control), and companies sometimes just won't let such a Big Brother store their corporate mail, due to internal regulations (read: 'corporate paranoia').
But the replication between the "points of presence" (5 big datacenters, one per continent, won't be a good topology) will be painful, and we'd easily face a split-brain situation with whichever replication scheme I can imagine.
Yours, Alexander
Hello Alexander,
But the replication between the "points of presence" (5 big datacenters, one per continent, won't be a good topology) will be painful, and we'd easily face a split-brain situation with whichever replication scheme I can imagine.
Split-brain is indeed the biggest problem of common replication schemes. But IMAP was designed to work in disconnected mode most of the time and to have only quick synchronizations.
So by design the IMAP standard works with master-master models.
Getting back to the picture above (catastrophic failure of all the transcontinental links): you synchronize your laptop in Europe (EU), cross the ocean to North America (NA) and synchronize your laptop again. At that moment all the changes on the EU hub, up to the point of the last synchronization, are merged into the NA hub. This is the beauty of IMAP.
The biggest challenge in the above scenario is the post-catastrophe synchronization, which would move huge amounts of data across the links.
Best wishes, Andrei
Yours, Alexander
participants (4)
- Alexander Chekalin
- Matthias-Christian Ott
- Michescu Andrei
- Timo Sirainen