think about this, you have connect= host=host1 host=host2 ...
when host1 fails, you must kick that host while it is down, each and every single time you want a new connection to the database. there is no record saying that box is down, no logic in the app saying i've already tried that box, so avoid it for "some period of time". you have to attempt a connection to it, since it is listed first in your connection parameters. you have to get an error back or wait for a timeout period to occur, and then move onto the next box.
have you ever faced a DNS resolution issue, where DNS was not working and everything slowed to a crawl because of it? similar kind of delays in processing here. you have to have an error or timeout occur before you move onto the next configured box. when you are dealing with high volume production environments, this is not a scaled solution.
with load balancing, you have an active connection from your app to the database VIP. then there is a secondary connection from the load balancer to the database host in the pool. when that connection dies and goes away, the app retries the connection to VIP and is automatically assigned to a different, alive and available host in the pool. no waiting for an error or timeout to occur, and then trying the next host in the config.
when you put the infrastructure together properly, no human intervention is required to mark a box down and not send traffic to it. as i said:
it requires a bit of supporting infrastructure to get it all working, but you wind up with a */highly-available, fault tolerant PostgreSQL footprint with automatic failover./*
by having the load balancer keep track of which hosts are alive and available, you dont have any guess work as to which host to connect to. the app just has to retry the connection. once a host is marked down by the load balancer, no traffic will be sent to it until it passes a health check. the health checking with PostgreSQL is the ETCd and Patroni pieces. those two processes are critical in the chain. with the health of the boxes and processes handled, knowing which PG host is the active R/W one is simple. when failure occurs, there is an election process to promote one of the standby hosts to the active R/W node, and that is reported up to the load balancer through the Patroni -> ETCd chain. all new connections wind up going to the newly elected active R/W node in the pool. *//*
On 2/22/23 5:29 AM, David White wrote:
If I understand open source Postgres correctly, though, this setup basically requires that the application be read-only, or at least be intelligent enough not to attempt to write to a host if it has failed over to it, right? Don't you have to have human intervention to actually fail the master / primary over for write purposes?
Sent with Proton Mail secure email.
------- Original Message ------- On Tuesday, February 21st, 2023 at 10:28 PM, Marcus Jodorf<bofh@killfile.de> wrote:
On 2/21/23 18:04, Lars Schotte wrote:
Yes, that looks nice, I am going to try that too.
Because every other option is based on some other software, like relayd or nginx, it is all possible, but adds complexity.
The best would be to have it inside connection string. /etc/dovecot/dovecot-sql.conf.ext:
... connect= host=host1 host=host2 dbname=<db> user=<user> password=<pwd>
I'm using that since pretty much 2006 - if I'm not mistaken - on my little servers. Simply works.
If one server is not reachable you just get an error log entry in mail.err that connect failed to the database that is down. But that is all - dovecot keeps working as normal.
BTW: Same with postfix. Simply list an additional fallback and it just works.
Best,
Marcus