Hello list,
I built an email system using a proxy / director pair (IMAP, POP3, LMTP) and a backend pair.
To have an HA system, I would like to understand whether it is better to use an NFS export or replication to store emails and index files.
NFS is provided by a NAS (in HA), while for replication I would use the local backend disks.
Which of the two systems is more reliable? Are there any drawbacks for one or the other?
Thanks, Andrea
--
Buy a Pentium 586/200 so you can reboot faster.
TIM San Marino S.p.A.
Andrea Gabellini
Engineering R&D
TIM San Marino S.p.A. - https://www.telecomitalia.sm
Via Ventotto Luglio, 212 - Piano -2
47893 - Borgo Maggiore - Republic of San Marino
Tel: (+378) 0549 886237
Fax: (+378) 0549 886188
-- Privacy Notice
This email is addressed to contacts held in the archives of TIM San Marino S.p.A. All information is processed and protected in accordance with current legislation on the protection of personal data (EU Reg. 2016/679). To request information about, changes to, and/or deletion of your data held in our archives, you can send an email to privacy@telecomitalia.sm.
Confidentiality Notice
The content of this e-mail and of any attachments is strictly confidential and intended solely for the person(s) to whom it is addressed. If you have received this e-mail in error, please notify us immediately and delete it from your computer. Copying or disclosing the content of this e-mail is prohibited. Any unauthorized use of the information contained herein by third parties, or by anyone not named in this e-mail, may be prosecuted under the law.
On Jul 15, 2020, at 12:33 PM, Andrea Gabellini <andrea.gabellini@telecomitalia.sm> wrote:
Hello list,
I built an email system using a proxy / director pair (IMAP, POP3, LMTP) and a backend pair.
To have an HA system, I would like to understand whether it is better to use an NFS export or replication to store emails and index files.
NFS is provided by a NAS (in HA), while for replication I would use the local backend disks.
Which of the two systems is more reliable? Are there any drawbacks for one or the other?
The biggest problem with using NFS is that you’re using NFS and bringing along all the baggage that comes with it. Writes over the network are inherently slower than writes to local storage, plus locking gets interesting, to say the least.
I posted a while back about using something similar to Joyent's Manatee to bootstrap replication. (If IMAP replication works anything like database replication, a new system could join the cluster, get a base state by streaming a ZFS snapshot of an existing peer, and from there catch up via the normal replication mechanisms.) I don’t know if that would be feasible, but it’s certainly something I might try to make work. I also don’t know whether it gets more dicey in a multiple-primary situation.
But the long and short of it: avoid NFS if you can. The last time I used NFS for mail was last century, and even with everybody using native *nix MUAs like pine and elm, we still ran into fun locking issues.
-- Coy Hile coy.hile@coyhile.com
On 2020-07-15 17:33, Andrea Gabellini wrote:
Hello list,
I built an email system using a proxy / director pair (IMAP, POP3, LMTP) and a backend pair.
To have an HA system, I would like to understand whether it is better to use an NFS export or replication to store emails and index files.
NFS is provided by a NAS (in HA), while for replication I would use the local backend disks.
Which of the two systems is more reliable? Are there any drawbacks for one or the other?
Another option to consider is DRBD replication of the disks at the block level.
Despite what you might expect, performance and latency are quite good. A number of years ago I ran such a setup hosting a high-traffic MySQL database, and it worked well. The disks were the limiting factor, not the network. In my case the two servers were directly connected by one cable, without a switch or suchlike.
One thing to be aware of with DRBD is that the slave disk is not accessible at all until you trigger a fail-over, so you can't use it for read traffic.
-- David Pottage
I built an email system using a proxy / director pair (IMAP, POP3, LMTP) and a backend pair.
To have an HA system, I would like to understand whether it is better to use an NFS export or replication to store emails and index files.
NFS is provided by a NAS (in HA), while for replication I would use the local backend disks.
Which of the two systems is more reliable? Are there any drawbacks for one or the other?
This decision is more about how many users you have in total and how you can partition them.
A) 200 domains with 10 IMAP accounts each
For high availability, two Dovecot servers with replication are sufficient; no director or NFS is needed. Return both server IPs via DNS for imap.domain.com and you get active/active load balancing for free.
There is no shared storage, which means no locking problems. Dovecot can use optimizations like mmap, which are not possible with NFS.
B) 200000 IMAP accounts, all within the same domain
You cannot partition by domain and a single server cannot handle the load.
Here imap.domain.com could return, e.g., 5 IPs via DNS that point to your directors. The director's job is to send all connections of one particular user to the same backend: e.g. Outlook at work, Thunderbird at home and K9 Mail on a mobile phone could be active at the same time, but all are directed to the same backend server. This way locking issues with NFS are avoided, because only one server accesses the mailbox at a time.
IIRC you need to monitor your backend servers and add/remove them on failure.
If the NFS mount is not available on the backend, Dovecot may create a new (empty) mailbox, which could break things. You need to set permissions so that this cannot happen.
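The director's job in B), sending every connection of one user to the same backend and dropping failed backends, can be illustrated with a consistent-hash ring: each director hashes the username the same way, so Outlook, Thunderbird and K9 Mail all land on one backend, and removing a backend only moves the users that were on it. This is a minimal Python sketch under that assumption, not Dovecot's director implementation; the class name, the MD5-based hash, the lowercasing and the number of ring points per backend are all illustrative choices.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 32-bit hash (first 4 bytes of MD5), identical on every director."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Consistent-hash ring mapping usernames to backends (illustrative only)."""

    def __init__(self, backends: list[str], points_per_backend: int = 100) -> None:
        self.points_per_backend = points_per_backend
        self._ring: list[tuple[int, str]] = []
        for backend in backends:
            self.add(backend)

    def add(self, backend: str) -> None:
        # Several points per backend spread users more evenly around the ring.
        for i in range(self.points_per_backend):
            bisect.insort(self._ring, (_hash(f"{backend}#{i}"), backend))

    def remove(self, backend: str) -> None:
        # E.g. when backend monitoring reports this backend as down.
        self._ring = [point for point in self._ring if point[1] != backend]

    def backend_for(self, username: str) -> str:
        if not self._ring:
            raise LookupError("no backends available")
        # This sketch lowercases usernames so "User@X" and "user@x" agree.
        h = _hash(username.lower())
        # First ring point at or after the user's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]
```

For example, HashRing(["backend1", "backend2"]).backend_for("alice@example.com") gives the same answer from every client and every director, as long as they all share the same backend list.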
C) Like B), but with a static proxy mapping where users are assigned to a specific backend server by configuration; each backend could then be replicated as in A), without NFS.
While A) in principle has higher performance due to local disks and optimizations, B) can have higher overall performance, as dedicated storage appliances usually have many more disks (SSD caching, ...) and 10G+ networking.
C) avoids NFS but may introduce more complexity when software like Pacemaker is used to provide failover.
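Option C) can be sketched as a static lookup: each user is assigned to a pair of backends that replicate to each other, and the proxy is told which host to use. A minimal Python sketch; the user and pair tables, the addresses and the is_up health check are hypothetical, and the returned "proxy"/"host" keys are only modelled on Dovecot's passdb proxy extra fields (check the Dovecot proxy documentation for the exact fields your version expects).

```python
from __future__ import annotations

from typing import Callable

# Hypothetical static assignment (option C): each user belongs to one backend
# pair, and the two members of a pair replicate to each other.
BACKEND_PAIRS: dict[str, tuple[str, str]] = {
    "pair-a": ("10.0.0.11", "10.0.0.12"),
    "pair-b": ("10.0.0.21", "10.0.0.22"),
}
USER_PAIR: dict[str, str] = {
    "alice@example.com": "pair-a",
    "bob@example.com": "pair-b",
}


def proxy_fields(user: str, is_up: Callable[[str], bool]) -> dict[str, str]:
    """Return passdb-style extra fields telling the proxy which backend to use.

    is_up is a hypothetical health check; in a real setup the failover decision
    may instead come from cluster tooling such as Pacemaker.
    """
    primary, replica = BACKEND_PAIRS[USER_PAIR[user.lower()]]
    host = primary if is_up(primary) else replica
    return {"proxy": "y", "host": host}
```

The price of this design is exactly the complexity mentioned above: something has to keep the mapping and the health decisions correct.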
See https://wiki2.dovecot.org/Director and https://wiki2.dovecot.org/NFS
Best regards Gerald
Thank you all for the replies!!!
Some missing info...
- As the load balancer I'm using a keepalived pair with a simple setup, not DNS
- Load balancer algorithm is "Weighted Least-Connection"
- About 20 domains and 3000 email accounts
- I'm monitoring my backend servers with poolmon
- The backend servers are virtual machines (VMware) with the datastore on "all flash" storage
Based on your notes, I think the better choice is replication. Correct?
Thanks, Andrea
On 16/07/20 01:43, Gerald Galster wrote:
[...]
--
One person's error is another person's data.
Some missing info...
- As the load balancer I'm using a keepalived pair with a simple setup, not DNS
- Load balancer algorithm is "Weighted Least-Connection"
- About 20 domains and 3000 email accounts
- I'm monitoring my backend servers with poolmon
- The backend servers are virtual machines (VMware) with the datastore on "all flash" storage
Based on your notes, I think the better choice is replication. Correct?
In my experience it's best to keep complexity low because the fewer components you have, the fewer can fail. With replication you basically have two independent servers that asynchronously sync emails.
While it would work with load balancers/keepalived/director, they are not necessary. If this is the way you want to go, you should configure the load balancer to always send the same source IP to the same backend (IP stickiness). Mail clients open several connections in parallel and they should all see the same data.
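The effect of IP stickiness can be shown with a small simulation: under a plain least-connection policy the parallel connections of one mail client alternate between the two backends, which with asynchronous replication can momentarily see different mailbox states; hashing the source IP keeps them together. A minimal Python sketch; the function names, the SHA-256 hash and the backend names are hypothetical, and real load balancers implement persistence in their own way.

```python
from __future__ import annotations

import hashlib
import ipaddress


def sticky_backend(source_ip: str, backends: list[str]) -> str:
    """Source-IP affinity: hash the client address so every connection agrees."""
    packed = ipaddress.ip_address(source_ip).packed  # works for IPv4 and IPv6
    h = int.from_bytes(hashlib.sha256(packed).digest()[:8], "big")
    return backends[h % len(backends)]


def least_connection(active: dict[str, int]) -> str:
    """Unweighted least-connection: the backend with the fewest open connections."""
    return min(active, key=lambda b: active[b])


def place_connections(source_ip: str, n: int, backends: list[str], sticky: bool) -> list[str]:
    """Simulate one mail client opening n parallel connections."""
    active = {b: 0 for b in backends}
    chosen = []
    for _ in range(n):
        b = sticky_backend(source_ip, backends) if sticky else least_connection(active)
        active[b] += 1
        chosen.append(b)
    return chosen
```

Plain modulo hashing moves many clients when the backend list changes, which is usually acceptable for session stickiness but worth keeping in mind.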
With DNS this happens automatically, because resolvers rotate the IPs and the mail client uses the same IP for all its connections. Failover is built in, as mail clients just connect to the second IP when the first is not reachable.
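The client-side failover described here amounts to trying each address that DNS returns until one accepts the connection. A minimal Python sketch, assuming a client that walks the resolver's answers in order; resolve_all and connect_with_failover are hypothetical helpers, and the connect parameter exists only so the logic can be exercised without a network.

```python
from __future__ import annotations

import socket
from typing import Any, Callable


def resolve_all(host: str, port: int) -> list[tuple[str, int]]:
    """Every (address, port) the resolver returns for host, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(info[4][0], info[4][1]) for info in infos]


def connect_with_failover(
    addresses: list[tuple[str, int]],
    connect: Callable[[tuple[str, int]], Any] | None = None,
    timeout: float = 5.0,
) -> tuple[tuple[str, int], Any]:
    """Try each address in turn and return the first one that accepts a connection."""

    def _default_connect(addr: tuple[str, int]) -> socket.socket:
        return socket.create_connection(addr, timeout=timeout)

    dial = connect or _default_connect
    errors: list[tuple[tuple[str, int], OSError]] = []
    for addr in addresses:
        try:
            return addr, dial(addr)
        except OSError as exc:  # refused, unreachable or timed out: try the next IP
            errors.append((addr, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

In use, connect_with_failover(resolve_all("imap.domain.com", 993)) tries each published IP in turn. Python's own socket.create_connection() already iterates over all getaddrinfo() results in a similar way; how real mail clients behave varies from client to client.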
Replication works reliably with mdbox/sdbox, but you should avoid Maildir.
Best regards Gerald
On 16/07/20 12:40, Gerald Galster wrote:
Some missing info...
- As the load balancer I'm using a keepalived pair with a simple setup, not DNS
- Load balancer algorithm is "Weighted Least-Connection"
- About 20 domains and 3000 email accounts
- I'm monitoring my backend servers with poolmon
- The backend servers are virtual machines (VMware) with the datastore on "all flash" storage
Based on your notes, I think the better choice is replication. Correct?
In my experience it's best to keep complexity low because the fewer components you have, the fewer can fail. With replication you basically have two independent servers that asynchronously sync emails.
I completely agree!!!
While it would work with load balancers/keepalived/director, they are not necessary. If this is the way you want to go, you should configure the load balancer to always send the same source IP to the same backend (IP stickiness). Mail clients open several connections in parallel and they should all see the same data.
In my setup the load balancers do exactly this, and the director maps the same username/email address (not the same source IP) to the same backend server. The director setup is not that complex, and I trust it.
With DNS this happens automatically, because resolvers rotate the IPs and the mail client uses the same IP for all its connections. Failover is built in, as mail clients just connect to the second IP when the first is not reachable.
I don't trust DNS load balancing. Too many times I have seen a client stuck with the wrong (down) IP... This is my experience ;-)
Replication works reliably with mdbox/sdbox, but you should avoid Maildir.
I'm using Maildir and I like it. Is there any documentation about not using it with replication? What are the drawbacks?
Thanks, Andrea
Best regards Gerald
--
A picture tells a thousand words. To make a picture costs more than a thousand words. A picture is slower than a thousand words.
With DNS this happens automatically, because resolvers rotate the IPs and the mail client uses the same IP for all its connections. Failover is built in, as mail clients just connect to the second IP when the first is not reachable.
I don't trust DNS load balancing. Too many times I have seen a client stuck with the wrong (down) IP... This is my experience ;-)
Interesting. I have deployed that DNS-based approach, with two Dovecot servers replicating between two distant datacenters. A few years ago one datacenter had a major outage, and new connections quickly failed over to the remaining server. Maybe this is client-specific and/or has improved over time.
If the loadbalancer/director approach works for you, that's ok.
Replication works reliably with mdbox/sdbox, but you should avoid Maildir.
I'm using Maildir and I like it. Is there any documentation about not using it with replication? What are the drawbacks?
Maildir is probably the most robust mail storage format, but it is very demanding on your disks because flags like "Seen" are encoded in the filename. Every flag change needs I/O (a file rename), as do copying/moving/deleting mails, quota, ... A Maildir with 100k+ mails can impact the server's overall performance, but as you use all-flash storage that may not be a problem.
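To see why flag changes cost I/O on Maildir: in the Maildir convention a message in cur/ carries its flags in an info suffix, ":2," followed by the flag letters in ASCII order (D, F, P, R, S, T), so marking a message Seen means renaming its file. A minimal Python sketch of that rename; the helper names are hypothetical, and the sketch treats everything before ":2," (Dovecot may put fields such as ",S=<size>" there) as an opaque base name.

```python
from __future__ import annotations

import os

# Standard Maildir flag letters; the info suffix lists them in ASCII order.
FLAG_LETTERS = {"draft": "D", "flagged": "F", "passed": "P",
                "replied": "R", "seen": "S", "trashed": "T"}


def split_info(filename: str) -> tuple[str, set[str]]:
    """Split a cur/ filename into its base name and its set of flag letters."""
    base, sep, flags = filename.rpartition(":2,")
    if not sep:  # no info suffix yet, e.g. a message just moved from new/
        return filename, set()
    return base, set(flags)


def with_flag(filename: str, letter: str, on: bool = True) -> str:
    """Return the name the file must be renamed to after setting/clearing a flag."""
    base, flags = split_info(filename)
    new_flags = (flags | {letter}) if on else (flags - {letter})
    return f"{base}:2,{''.join(sorted(new_flags))}"


def set_flag_on_disk(cur_dir: str, filename: str, letter: str, on: bool = True) -> str:
    """Apply a flag change the Maildir way: one rename() inside cur/."""
    new_name = with_flag(filename, letter, on)
    if new_name != filename:
        os.rename(os.path.join(cur_dir, filename), os.path.join(cur_dir, new_name))
    return new_name
```

Multiply that single rename by every flag change across 100k+ messages and the disk load described above follows.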
I remembered something about replication and Maildir; it took me some time to find it:
https://dovecot.org/pipermail/dovecot/2017-February/107125.html
Timo said (Mon Feb 20 10:09:48 UTC 2017):
"There seems to be something weird with using Maildir and replication. Haven't had time to debug it and it's likely not an easy bug to fix, so for now the solution would be to use only sdbox/mdbox with replication."
I don't know if that is still the case; I can only say that mdbox works for me.
Best regards, Gerald
Hi All,
I am in a similar environment and am also stuck here.
I have 2 test servers with the following configuration:
==========================
Linux OS - Red Hat Enterprise Linux Server release 7.7 (Maipo)
Dovecot version - 2.2.36 (1f10bfa63)
Postfix version - 2.10.1
I am trying to set up high availability.
I have put both of the above servers behind an F5 load balancer. The load balancer FQDN is "intl-dev-imaptest.testorg.com", and I have opened ports 25/110/143/993/995 on it.
When I send 10 emails to "intl-dev-imaptest.testorg.com", those 10 emails are distributed between the 2 backend servers (5 emails to each server). I see 5 emails on each server.
In Outlook I configured the email address using "POP and IMAP". When I set the IMAP server to "intl-dev-imaptest.testorg.com", Outlook shows only the 5 emails from server1, and after a few seconds/minutes it automatically refreshes and shows the other 5 emails from server2. But I never see all 10 emails at the same time. Why?
So I tried the sync command. When I run the sync command below from server1, the same emails appear on server2 as well, and then both servers show the same number of emails. Is running "sync" the only way to see both servers' emails at once? Do we need to run it for every mailbox on both servers? Won't we miss or lose emails if this sync runs repeatedly?
"doveadm sync -f -u kishore@test.testorg.com remote:vmail@bal3200dev002.testorg.com"
Are "replication" and "sync" the same?
Why are we not able to see all the emails at one time without the "sync" command?
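The behaviour described above can be modelled in a few lines: without replication the two backends are independent mailstores, so each IMAP session sees only the messages delivered to the backend it was routed to, until something copies them across. A toy Python sketch; the round-robin split is an assumption about the F5 policy, the Message-IDs are made up, and the set union only stands in for the end result of a two-way sync (real doveadm sync also reconciles UIDs, flags and expunges).

```python
from __future__ import annotations


def deliver_round_robin(messages: list[str], n_backends: int) -> list[set[str]]:
    """Each incoming message lands on exactly one backend (assumed round-robin)."""
    stores: list[set[str]] = [set() for _ in range(n_backends)]
    for i, msg in enumerate(messages):
        stores[i % n_backends].add(msg)
    return stores


def imap_view(stores: list[set[str]], backend: int) -> set[str]:
    """An IMAP session only sees the mailbox on the backend it was routed to."""
    return set(stores[backend])


def two_way_sync(a: set[str], b: set[str]) -> tuple[set[str], set[str]]:
    """Greatly simplified two-way merge: afterwards both sides hold the union."""
    merged = a | b
    return set(merged), set(merged)
```

With 10 messages and 2 backends, each IMAP view holds 5 messages until the sync runs, which matches the symptom reported above.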
What is the best and easiest way to achieve high availability with just 2 servers? Emails should be distributed equally between both servers, and if one server goes down, the other server should take over all email functionality. This is my requirement.
My current live environment: around 10 email domains, each with about 10 IMAP accounts, so around 100 mailboxes/addresses in total. We receive around 50K emails a day to those addresses. We use the "Maildir" format. We want to move to a high-availability setup with 2 servers.
Please help me fix this issue.
Thanks & Regards, Kishore Potnuru
On Thu, Jul 16, 2020 at 2:33 PM Gerald Galster <list+dovecot@gcore.biz> wrote:
[...]
participants (5)
- Andrea Gabellini
- Coy Hile
- David Pottage
- Gerald Galster
- Kishore Potnuru