Metric label values truncated when using OpenMetrics endpoint
Hi,
Recently we upgraded to Dovecot 2.3.11.3 and configured an example metric like this:
metric imap_command {
  event_name = imap_command_finished
  group_by = cmd_name tagged_reply_state user remote_ip
}
And enabled the OpenMetrics listener like this:
service stats {
  inet_listener http {
    port = 5000
  }
}
While the result is great, I noticed that some metrics are being truncated (three dots at the end of the label value):
dovecot_imap_command_duration_usecs_sum{cmd_name="FETCH",tagged_reply_state="OK",user="xxxxxxxxxxx.inbox@xxxxxxxxxxxxx…",remote_ip="XXXX:XXXX:XXXX:XXXX:cc6b:cc7a:d…"} 2346 1598055662473
I couldn't find this information in the documentation so I'm wondering if this is a bug.
Thank you for your time.
Regards, Daan
On Sat, Aug 22, 2020 at 00:31:36 +0000, Daan van Gorkum wrote:
> Hi,
> Recently we upgraded to Dovecot 2.3.11.3 and configured an example metric like this:
> metric imap_command {
>   event_name = imap_command_finished
>   group_by = cmd_name tagged_reply_state user remote_ip
Grouping by remote_ip seems a bit dangerous unless the ips are somehow limited. Each unique value will result in (permanent) memory allocation, so this has the potential to slowly grow the stats process large enough that the vsize limit kicks in and the stats process gets killed and respawned.
> }
> ...
> While the result is great, I noticed that some metrics are being truncated (three dots at the end of the label value):
> dovecot_imap_command_duration_usecs_sum{cmd_name="FETCH",tagged_reply_state="OK",user="xxxxxxxxxxx.inbox@xxxxxxxxxxxxx…",remote_ip="XXXX:XXXX:XXXX:XXXX:cc6b:cc7a:d…"} 2346 1598055662473
> I couldn't find this information in the documentation so I'm wondering if this is a bug.
Hm. The submetric names are truncated to 32 bytes because nobody will ever want metrics bigger than that. ;) Obviously, we need to revisit that decision. There is no workaround that I can think of.
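Until the limit is revisited, it may help to know which series in a scrape are affected. The following is a minimal Python sketch, not part of Dovecot, that flags label values ending in a truncation marker. The function name and regular expression are made up for illustration. It assumes label values contain no escaped quotes, and it accepts both the single ellipsis character shown in the sample above and three ASCII dots, since it is not certain which of the two the server emits.

```python
import re

# Matches label="value" pairs in a sample line.
# Assumption: label values contain no escaped double quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Assumption: truncation is marked by an ellipsis character or three dots.
MARKERS = ("…", "...")


def truncated_labels(sample_line):
    """Return {label: value} for labels whose value ends in a marker."""
    return {
        name: value
        for name, value in LABEL_RE.findall(sample_line)
        if value.endswith(MARKERS)
    }


if __name__ == "__main__":
    line = (
        'dovecot_imap_command_duration_usecs_sum{cmd_name="FETCH",'
        'tagged_reply_state="OK",user="xxxxxxxxxxx.inbox@xxxxxxxxxxxxx…",'
        'remote_ip="XXXX:XXXX:XXXX:XXXX:cc6b:cc7a:d…"} 2346 1598055662473'
    )
    for name, value in truncated_labels(line).items():
        print(f"{name} is truncated: {value}")
```

Note that a truncated value cannot be restored from the scrape; the full user name or address has to come from another source such as the logs.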
Jeff.
-- I'm somewhere between geek and normal. - Linus Torvalds
Hi Jeff,
Thanks for your reply!
Regarding grouping by remote address, I understand and for now I'll keep a close eye. Maybe it's an option to group by /24 for IPv4 and /64 for IPv6? We currently do that based on the logs but the OpenMetrics endpoint seems a lot easier.
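One way to keep that close eye is to count how many distinct label combinations each metric exposes per scrape, since every unique combination costs memory in the stats process. The sketch below is a rough illustration in Python, not a Dovecot feature. The function name is hypothetical, the sample addresses are made-up documentation addresses, and the parser assumes label values never contain a closing brace.

```python
import re
from collections import Counter

# Matches "metric_name{labels} value ..." lines; samples without labels
# are ignored because they cannot grow with new label values.
# Assumption: label values never contain "}".
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s')


def series_per_metric(exposition_text):
    """Count distinct label sets per metric name in one scrape."""
    seen = set()
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match:
            seen.add((match.group(1), match.group(2)))
    return Counter(name for name, _labels in seen)


if __name__ == "__main__":
    # Made-up scrape with two distinct remote_ip values.
    scrape = "\n".join([
        "# TYPE dovecot_imap_command_duration_usecs_sum counter",
        'dovecot_imap_command_duration_usecs_sum{remote_ip="192.0.2.1"} 5 1',
        'dovecot_imap_command_duration_usecs_sum{remote_ip="192.0.2.2"} 7 1',
    ])
    for name, count in series_per_metric(scrape).most_common():
        print(name, count)
```

Tracking these counts over time would show whether grouping by remote_ip is growing without bound well before the vsize limit is reached.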
A slight hijack of the original question: I tried to log only the IP addresses (+ result) of failed login attempts, but I cannot find a metric that contains that information. Am I looking in the wrong direction? The auth_userdb_request_finished and auth_passdb_request_finished stats work as intended but they do not contain any information about the connecting client.
Thanks again!
Daan
On Tue, Aug 25, 2020 at 01:08:06 +0000, Daan van Gorkum wrote:
> Hi Jeff,
> Thanks for your reply!
> Regarding grouping by remote address, I understand and for now I'll keep a close eye. Maybe it's an option to group by /24 for IPv4 and /64 for IPv6?
Hrm, interesting idea. But the answer is: no, there isn't a way. The simplest way to implement something like this would be to add a new aggregating function. So one could do something like:
remote_ip:netmask4:24
remote_ip:netmask6:64
To get /24 and /64, respectively.
I'll throw this idea on the ever-growing pile of things that can be worked on :) Obviously, I can't make any promises about this ever getting done.
> We currently do that based on the logs but the OpenMetrics endpoint seems a lot easier.
Aggregating based on a subnet definitely makes sense.
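Since the netmask4/netmask6 syntax above is only a proposal, the grouping has to happen outside Dovecot for now, for example while post-processing logs. As a rough sketch of that idea in Python, using the standard ipaddress module: the function name and default prefix lengths are illustrative choices, and the sample addresses are made-up documentation addresses.

```python
import ipaddress
from collections import Counter


def subnet_label(remote_ip, v4_prefix=24, v6_prefix=64):
    """Map an address to its enclosing /24 (IPv4) or /64 (IPv6) network."""
    addr = ipaddress.ip_address(remote_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False masks off the host bits instead of raising ValueError.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


if __name__ == "__main__":
    # Hypothetical addresses as they might appear in log lines.
    ips = ["192.0.2.77", "192.0.2.200", "2001:db8:1:2:cc6b:cc7a:d00:1"]
    for subnet, hits in Counter(subnet_label(ip) for ip in ips).items():
        print(subnet, hits)
```

This needs complete addresses as input, so it works on log data but not on label values that the OpenMetrics endpoint has already truncated.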
> A slight hijack of the original question: I tried to log only the IP addresses (+ result) of failed login attempts, but I cannot find a metric that contains that information. Am I looking in the wrong direction? The auth_userdb_request_finished and auth_passdb_request_finished stats work as intended but they do not contain any information about the connecting client.
I haven't played with these events, but at least based on the docs [1], auth_client_userdb_lookup_finished and auth_client_passdb_lookup_finished events seem to have the remote_ip field as well as an error string on failure. Does that give you the info you need?
Jeff.
Hi Jeff,
Thanks again for your insights. I understand that a lot of features are pending and it's totally fine, we're just very eager to use all these features.
Regarding the logging of failed attempts, I did try to configure this with these two metrics:
metric auth_client_userdb_lookup_finished {
  event_name = auth_client_userdb_lookup_finished
  group_by = service local_ip remote_ip user
}
metric auth_client_passdb_lookup_finished {
  event_name = auth_client_passdb_lookup_finished
  group_by = service local_ip remote_ip user
}
In both our testing and production environments, these counters do not increase and stay at 0:
$ sudo curl 0:9166/metrics -s | grep -E 'dovecot_auth_client_(user|pass)db_lookup_finished'
# HELP dovecot_auth_client_userdb_lookup_finished_count Total number
# TYPE dovecot_auth_client_userdb_lookup_finished_count counter
dovecot_auth_client_userdb_lookup_finished_count 0 1598493762018
# HELP dovecot_auth_client_userdb_lookup_finished_duration_usecs_sum Duration
# TYPE dovecot_auth_client_userdb_lookup_finished_duration_usecs_sum counter
dovecot_auth_client_userdb_lookup_finished_duration_usecs_sum 0 1598493762018
# HELP dovecot_auth_client_passdb_lookup_finished_count Total number
# TYPE dovecot_auth_client_passdb_lookup_finished_count counter
dovecot_auth_client_passdb_lookup_finished_count 0 1598493762018
# HELP dovecot_auth_client_passdb_lookup_finished_duration_usecs_sum Duration
# TYPE dovecot_auth_client_passdb_lookup_finished_duration_usecs_sum counter
dovecot_auth_client_passdb_lookup_finished_duration_usecs_sum 0 1598493762018
I will try to gather some more information and post it later. Let me know if you think there might be something up with the way we configure them.
Daan
On Thu, Aug 27, 2020 at 02:04:20 +0000, Daan van Gorkum wrote:
> Hi Jeff,
> Thanks again for your insights. I understand that a lot of features are pending and it's totally fine, we're just very eager to use all these features.
I completely forgot that the work to add those fields was done and was pending review and merging. It all got merged to master 6 days ago. It looks like it'll be part of 2.3.12, which AFAIK should come out Soon(tm). (Obviously, no promises about when exactly it'll get released.)
The documentation already contains the field names, so if you want to experiment, give the master branch a try to see if it does what you need.
https://doc.dovecot.org/admin_manual/list_of_events/#common-fields
Jeff.