Hi Timo,
thanks for your quick answer.
On 23.01.2024 00:18, Timo Sirainen wrote:
>> When the doveadm server receives a lot of connects on port 50000, the HTTP service on port 50001 does not respond until the load on port 50000 drops to zero. It seems like doveadm-server is preferring port 50000 over 50001.
>>
>> Looking into Recv-Q when HTTP hangs:
>>
>> Netid State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
>> tcp   LISTEN 129    128        0.0.0.0:50000          0.0.0.0:*
>> tcp   LISTEN 1      128        0.0.0.0:50001          0.0.0.0:*
> By "drops to zero" do you mean the connection queue has to drain until Recv-Q is below 129? Or that even then it needs to go down further?
The Recv-Q for port 50000 needs to be empty before port 50001 is served again.
> Also, doveadm process isn't supposed to be handling more than one client at a time. So if it has a lot of pressure, why does it matter which port it's answering to since it can't handle all connections anyway?
During load peaks doveadm only serves port 50000 until all requests on port 50000 have been handled; during that time no requests on port 50001 are handled at all. Our customer observes 40 client processes waiting for port 50001 for more than 40 minutes because of the heavy load on port 50000.
> With the fix wouldn't it still take several seconds to connect to either 50000 or 50001, since now both the queues are full? Or why is it different?
In the customer's situation it makes the difference between seconds and 40 minutes. Our customer has an uneven load pattern: heavy load on port 50000 and only light load on port 50001, which led to port 50001 not being served at all anymore. In this situation it is better to distribute the limited resources evenly over both ports.
> Although it's using quite a lot of randomness (= /dev/urandom reads), which isn't so good. I think it would be just as good to do round-robin:
>
> static int i_start = 0;
I also thought of round-robin. I think the problem is that a newly forked doveadm process always initializes i_start to 0, and the first thing it then does is accept a connection on port 50000. With service_count = 1 I think this would not solve the problem.
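Just to illustrate that with a standalone sketch (not Dovecot code; pick_listener() is a made-up stand-in for the round-robin start-index selection): with service_count = 1 every request is handled by a freshly forked worker, so the static counter the worker sees is always still 0 and listener 0 (port 50000) is always tried first.

/* Standalone sketch, not Dovecot code: pick_listener() is a made-up
 * stand-in for a round-robin start-index selection.  Each "request"
 * is handled by a freshly forked worker, as with service_count = 1. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int i_start = 0;

static int pick_listener(int n_listeners)
{
    int idx = i_start;

    /* advances only within this worker, which exits after one request */
    i_start = (i_start + 1) % n_listeners;
    return idx;
}

int main(void)
{
    for (int req = 0; req < 3; req++) {
        pid_t pid = fork();

        if (pid == 0) {
            /* worker: inherits i_start == 0, since the parent
             * never calls pick_listener() itself */
            printf("worker %d tries listener %d first\n",
                   req, pick_listener(2));
            _exit(0);
        }
        waitpid(pid, NULL, 0);
    }
    return 0;
}

Every worker prints "tries listener 0 first", i.e. port 50000 always wins the first pick.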
> But even so, I'd like to understand better what exactly this is helping with before merging. Looking at libevent and nginx, I don't see them doing anything like this either.
In this situation the epoll_wait() in the doveadm process returns two ready sockets (port 50000 and port 50001). The loop "for (i = 0; i < ret; i++) {" within io_loop_handler_run_internal() then tries to handle both events, but handling the first event (port 50000) disables the handler for port 50001, because doveadm only handles one request at a time. So a newly forked doveadm always chooses port 50000 if a request for this port is waiting, and as long as there are requests waiting for port 50000, requests for port 50001 are completely ignored.
Maybe it is a good idea to use round-robin if service_count > 1 and a random choice if service_count == 1.
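A rough sketch of that idea (again not actual Dovecot code; choose_start_index() is made up and plain rand() only stands in for the /dev/urandom read in the patch): pick the index of the first ready io round-robin in a long-lived worker and randomly in a single-request worker, where the static counter would always still be 0.

/* Rough sketch, not actual Dovecot code: choose_start_index() is made
 * up and rand() only stands in for the /dev/urandom read in the patch. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static unsigned int i_start = 0;

static unsigned int choose_start_index(unsigned int ready_count,
                                       bool single_request_worker)
{
    if (single_request_worker) {
        /* worker exits after one request, so only a random choice
         * spreads the first pick over both listeners */
        return (unsigned int)rand() % ready_count;
    }
    /* long-lived worker: plain round-robin is enough */
    return i_start++ % ready_count;
}

int main(void)
{
    unsigned int first_pick[2] = { 0, 0 };

    srand((unsigned int)time(NULL));
    /* both listeners are ready; count which one would be handled first */
    for (int i = 0; i < 1000; i++)
        first_pick[choose_start_index(2, true)]++;
    printf("port 50000 first: %u, port 50001 first: %u\n",
           first_pick[0], first_pick[1]);
    return 0;
}

The event loop would then start its "for (i = 0; i < ret; i++)" iteration at that index instead of always at 0, so neither listener is systematically preferred.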
The problem might be related to the customer's config of service_count = 1 for doveadm. Setting service_count = 100 also resolved the problem, but I think only because of the resulting performance boost of about a factor of 10.
Making the two ports independent services also resolves the situation:
service doveadm {
  inet_listener {
    port = 50000
  }
}

service doveadm_http {
  executable = doveadm-server
  inet_listener http {
    port = 50001
    ssl = no
  }
}
Kind regards
Achim