[Dovecot] dovecot stats: useful data to gather
Timo,
Following our discussion on dovecot stats at LinuxTag 2012, my team and I sat down and put together a list of stat items we think would be useful in daily dovecot usage.
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats. Our offer to create and contribute a standalone web interface for dovecot stats stands.
Here are the stats we believe to be useful:
Login/Logout
- total number login success/time
- total number login failure/time
- total number per authentication mechanism
- total number plain sessions
- total number STARTTLS sessions
- total number of currently connected users (pop3/pop3s/imap/imaps/managesieve)
- login names of connected users (not really stats, but great for actions regarding those users, e.g. force logout)
- total number logout commands/time
- total number BYE responses (autologout)
Mailbox state
- Inflow rate (number incoming messages/time)
- Deleted rate (number \Deleted flagged messages/time)
- Expunge rate (number Expunge operations/time)
- total number current messages mailboxes normal storage
- total number current messages mailboxes alt storage
- total number read messages mailboxes normal storage
- total number read messages mailboxes alt storage
- per user number current messages mailboxes normal storage
- per user number current messages mailboxes alt storage
- per user number read messages mailboxes normal storage
- per user number read messages mailboxes alt storage
Mailbox Quota
- total number persons under soft-quota per quota
- total number persons above or equal soft-quota per quota
- total number persons above or equal hard-quota per quota
Performance
- minimum time to write a message
- maximum time to write a message
- average time to write a message
- minimum time to modify a message
- maximum time to modify a message
- average time to modify a message
- minimum time to delete a message
- maximum time to delete a message
- average time to delete a message
- minimum time search operations
- maximum time search operations
- average time search operations
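A minimal sketch of how the min/max/avg timings above could be tracked with constant memory per operation type. The class name `TimingStats` and the sample durations are made up for illustration and are not part of Dovecot:

```python
from dataclasses import dataclass


@dataclass
class TimingStats:
    """Running minimum, maximum and average of observed durations in seconds."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = 0.0

    def record(self, seconds: float) -> None:
        # Update all aggregates in O(1); no individual samples are stored.
        self.count += 1
        self.total += seconds
        self.minimum = min(self.minimum, seconds)
        self.maximum = max(self.maximum, seconds)

    @property
    def average(self) -> float:
        # Return 0.0 before the first sample to avoid dividing by zero.
        return self.total / self.count if self.count else 0.0


# Hypothetical durations for three "write a message" operations.
write_times = TimingStats()
for duration in (0.25, 0.5, 0.75):
    write_times.record(duration)
```

One such object per operation type (write, modify, delete, search) would cover the whole Performance list.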
Regards,
p@rick
-- state of mind ()
Franziskanerstraße 15 Telefon +49 89 3090 4664 81669 München Telefax +49 89 3090 4666
Amtsgericht München Partnerschaftsregister PR 563
On 1.6.2012, at 23.58, Patrick Ben Koetter wrote:
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats.
I had thought about SNMP before also, but for the current kind of stats that are exported I couldn't think of any reasonable way to export them.
Here are the stats we believe to be useful:
Login/Logout
- total number login success/time
- total number login failure/time ..
I'll look at these later in more detail, but some important questions / design decisions:
Currently the stats process only remembers things since Dovecot was started. I don't think getting these kinds of numbers would really work like that. Perhaps all of the statistics should be permanently dumped to disk every ~minute or so + at shutdown and loaded at startup, so the numbers would at least normally always just increase since the first time Dovecot was started?
Mailbox state
- Inflow rate (number incoming messages/time)
- Deleted rate (number \Deleted flagged messages/time)
These operations/time type of things I had hoped to be able to externalize :) If stats process simply gives the raw stats, the reader could do this kind of summing up. Otherwise .. well, I guess it could maybe keep track of the current ops/
Performance
- minimum time to write a message
- maximum time to write a message
- average time to write a message
Within last .. day? hour? minute? ..
- Timo Sirainen dovecot@dovecot.org:
On 1.6.2012, at 23.58, Patrick Ben Koetter wrote:
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats.
I had thought about SNMP before also, but for the current kind of stats that are exported I couldn't think of any reasonable way to export them.
I am not an expert on SNMP, others in my office are, but as I understand it there's no need for Dovecot to export the data. AFAIK Dovecot would have to offer a subagent, which could be queried by an SNMP server.
If we need more knowledge on SNMP I can ask my folks on the team to give some guidance. For the moment I found this: http://net-snmp.sourceforge.net/wiki/index.php/TUT:Writing_a_Subagent
Here are the stats we believe to be useful:
Login/Logout
- total number login success/time
- total number login failure/time ..
I'll look at these later in more detail, but some important questions / design decisions:
Currently the stats process only remembers things since Dovecot was started. I don't think getting these kinds of numbers would really work like that. Perhaps all of the statistics should be permanently dumped to disk every ~minute or so + at shutdown and loaded at startup, so the numbers would at least normally always just increase since the first time Dovecot was started?
ACK. My understanding is: Statistical data are moments in time. The application provides these snapshots. It is up to other protocols (e.g. SNMP) and software (e.g. RRD) to gather and create time series and also to relate data to each other in order to come up with ratios, timelines etc.
This might be a good opportunity to check out Howard's MDB database (in order to get around potential future lawsuits concerning BDB usage ...). http://highlandsun.com/hyc/mdb/
Mailbox state
- Inflow rate (number incoming messages/time)
- Deleted rate (number \Deleted flagged messages/time)
These operations/time type of things I had hoped to be able to externalize :) If stats process simply gives the raw stats, the reader could do this kind of summing up. Otherwise .. well, I guess it could maybe keep track of the current ops/
and the reader would then have to read the value about once a minute or half or something. It wouldn't give exact results though.
ACK. I'd externalize them too. So dump the /time aspect and only give raw data at moment of query.
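To illustrate what externalizing the /time aspect means for the reader: a minimal sketch that derives a rate from two raw counter readings. The `Sample` type, the function name and the numbers are assumptions for illustration only; treating a decreasing counter as a restart is also an assumption, not something Dovecot defines:

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: float  # seconds, taken by the reader at query time
    value: int        # raw counter as reported at that moment


def rate_per_second(prev: Sample, curr: Sample) -> float:
    """Derive an operations/second rate from two raw counter snapshots."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr.value - prev.value
    if delta < 0:
        # Assumption: a counter that went backwards means the server
        # restarted and began counting again from zero.
        delta = curr.value
    return delta / elapsed


# Hypothetical readings of an "incoming messages" counter, 60 seconds apart.
first = Sample(timestamp=1000.0, value=1200)
second = Sample(timestamp=1060.0, value=1500)
inflow = rate_per_second(first, second)
```

This is essentially what tools like RRD do with counter-type data sources, so the server only has to hand out the raw value.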
Performance
- minimum time to write a message
- maximum time to write a message
- average time to write a message
Within last .. day? hour? minute? ..
Concerning "message write time": the time it took to write the last message.
In general the stats update interval should be configurable in order to adapt it to the overall system performance. Makes no sense to bring down the server by gathering stats every nanosecond unless one likes self-induced DoS. ;)
It would probably be a useful strategy to update internal data on every event and answer SNMP queries from memory but write the data to disk every once in a while to have them when the server restarts. Besides that I don't see a use case for sharing such data between processes such as exporting them to memcache or anything similar. Do you?
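A rough sketch of that strategy: counters are updated in memory on every event, written to disk at a configurable interval, and reloaded at startup. The class `PersistentCounters`, the JSON file format and the counter name are assumptions made for this example and say nothing about how Dovecot stores its stats:

```python
import json
import os
import tempfile
import time


class PersistentCounters:
    """In-memory counters that are periodically written to a JSON file."""

    def __init__(self, path: str, flush_interval: float = 60.0) -> None:
        self.path = path
        self.flush_interval = flush_interval  # seconds between disk writes
        self.counters = {}
        self._last_flush = time.monotonic()
        # Load the previous state so values keep increasing across restarts.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.counters = json.load(fh)

    def increment(self, name: str, amount: int = 1) -> None:
        # Every event updates memory; disk is touched only once per interval.
        self.counters[name] = self.counters.get(name, 0) + amount
        if time.monotonic() - self._last_flush >= self.flush_interval:
            self.flush()

    def get(self, name: str) -> int:
        # Queries are answered from memory.
        return self.counters.get(name, 0)

    def flush(self) -> None:
        # Write to a temporary file and rename it, so a crash in the middle
        # of a write cannot leave a truncated state file behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.counters, fh)
        os.replace(tmp_path, self.path)
        self._last_flush = time.monotonic()


# Simulated lifecycle: count two logins, flush at "shutdown", then "restart".
state_file = os.path.join(tempfile.mkdtemp(), "stats.json")
stats = PersistentCounters(state_file)
stats.increment("login_success")
stats.increment("login_success")
stats.flush()
restarted = PersistentCounters(state_file)
```

With this approach at most one flush interval of events is lost after a crash, which is the trade-off for not writing on every event.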
p@rick
On 01.06.2012 22:58, Patrick Ben Koetter wrote:
[...] I sat down and put together a list of stat items we think to be useful in daily dovecot usage.
Quite a list. But I believe most of those values are quite useful and I would also love to see such a rich set of measurements being available.
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats. Our offer to create and contribute a standalone web interface for dovecot stats stands.
Yes, I second that. Otherwise quite a few installations will just hook the dovecot commands to netsnmp handlers, which is not a pretty solution.
Maybe dovecot could also do the SNMP for statistics that plugins provide? I'm thinking managesieve access, sieve processing or expire here.
Regards
Christian
Patrick Ben Koetter wrote:
following our discussion on dovecot stats at the LinuxTag 2012 my team and I sat down and put together a list of stat items we think to be useful in daily dovecot usage.
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats. Our offer to create and contribute a standalone web interface for dovecot stats stands.
This should be done via an SNMP subagent, but how would you differentiate multiple dovecot instances on the same machine: different SNMP ports for the subagent, or different SNMP trees?
Here are the stats we believe to be useful: [...]
Here are the stats which I also consider to be useful:
Login/Logout:
- Hits/Misses for Logins via userdb cache
System resources:
- detailed memory usage of dovecot services (imap, worker, userdb cache)
- dovecot connections to mysql database
- dovecot connections to ldap
- director connections vs. backend connections
Regards, Daniel
On 06/02/12 17:10, Daniel Parthey wrote:
Patrick Ben Koetter wrote:
following our discussion on dovecot stats at the LinuxTag 2012 my team and I sat down and put together a list of stat items we think to be useful in daily dovecot usage.
Besides pulling together all the data we also think it would be useful to have an SNMP interface to access the stats. Our offer to create and contribute a standalone web interface for dovecot stats stands.
This should be done via an SNMP subagent, but how would you differentiate multiple dovecot instances on the same machine: different SNMP ports for the subagent, or different SNMP trees?
I'd suggest some additional performance metrics like min/max/avg time to authenticate and to establish a proxy session, and perhaps counters for auth failure causes as well.
I personally wouldn't want to see this implemented as an SNMP subagent but so long as the stats would be available off a local socket directly I think everyone would be happy.
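As a sketch of what reading stats off a local socket could look like: the snippet below serves a set of counters over a Unix domain socket and reads them back. The tab-separated line format, the socket file name and the counter names are hypothetical and are not Dovecot's actual stats protocol. Using one socket path per instance would also be one way to tell several instances on the same machine apart:

```python
import os
import socket
import tempfile
import threading


def serve_once(server: socket.socket, counters: dict) -> None:
    """Accept one client and send each counter as a 'name<TAB>value' line."""
    conn, _ = server.accept()
    with conn:
        lines = (f"{name}\t{value}\n" for name, value in sorted(counters.items()))
        conn.sendall("".join(lines).encode("utf-8"))


def read_stats(socket_path: str) -> dict:
    """Connect to the socket, read until EOF and parse the counter lines."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    stats = {}
    for line in b"".join(chunks).decode("utf-8").splitlines():
        name, value = line.split("\t")
        stats[name] = int(value)
    return stats


# Hypothetical per-instance socket path and made-up counter values.
socket_path = os.path.join(tempfile.mkdtemp(), "stats-instance-a.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(socket_path)
server.listen(1)
worker = threading.Thread(
    target=serve_once, args=(server, {"auth_failures": 3, "logins": 42})
)
worker.start()
result = read_stats(socket_path)
worker.join()
server.close()
```

An SNMP subagent, a web interface or a simple script could all sit on top of such a socket without the server itself having to speak SNMP.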
-K
participants (5)
- Christian Rohmann
- Daniel Parthey
- Kelsey Cummings
- Patrick Ben Koetter
- Timo Sirainen