[Dovecot] Dovecot - some more optimisations
I did some more optimisation.
Most performance hit in Dovecot do "for loop" in string operations.
In one case (in message-parser.c) "for loop" has another "for" inside with the same variable used as iterator. This case is very hard to optimise by compiler.
I do changes only in top functions listed in oprofile. Maybe I do more in future.
Code was analysed and tested but it's hard to generate all cases. It's working well in production server (I'm monitoring).
In istream-crlf.c I've changed one thing. When destination buffer is full after '\r' addition (to dest), It didn't skip '\n' in source buffer. I think this was buggy in earlier code, and '\n' was skipped (this piece of code is used very rare).
Please check this out. This can help in huge e-mail systems :P
And another problem. Why You use safe_memset instead of memset? Now this function have the largest impact in Dovecot performance. Another on list is t_push.
Regards, Len7hir
On Wed, 2010-09-01 at 13:23 +0200, Len7hir wrote:
And another problem. Why You use safe_memset instead of memset? Now this function have the largest impact in Dovecot performance.
Really? I'd think it would be called very rarely since it should be used only for password-related data. I anyway added that at one point when I saw some people talking about how some compilers optimize away memset() when the compiler realizes that the memory isn't used anymore.
It's of course not super-helpful because if attacker can read some memory it's pretty bad already. But it might help to avoid exposing users' passwords in some buggy situations.
Another on list is t_push.
I don't know how much there is to be done about that. Or you could see what happens if you inline the fast path of that code. Or maybe the whole thing could be redesigned in a way to get the fast code path even smaller.
/* count lines and missing cr (from begining + 1) */
last = (unsigned char *)data + 1;
while (TRUE) {
curr = memchr(last, '\n', block->size - (last - data));
if (curr == NULL)
break;
ctx->part->body_size.lines++;
if (*(curr - 1) != '\r')
missing_cr_count++;
last = curr + 1;
}
I did a couple coding style changes here and to other patches to make it more consistent with rest of Dovecot (and also to help me understand the changes with less mental power :)
/* exceptional condition (test begining) */
if (*data == '\n' && ctx->last_chr != '\r') {
ctx->part->body_size.lines++;
missing_cr_count++; }
There's a bug here: lines++ should be done even when last_chr != '\r'.
Also I moved this before the loop. I'd think that optimizes CPU cache behavior better (data is accessed linearly, not data[1] .. data[last], data[0]).
static int parse_next_body_skip_boundary_line(struct message_parser_ctx *ctx, struct message_block *block_r)
This original code was annoyingly complex. Also I think there was a bug in it ("no linefeeds in this block. we can just skip it" block probably shouldn't have been executed when \n was found at the end of the block). After looking at the code long enough I did some other changes besides what you did.
In istream-crlf.c I've changed one thing. When destination buffer is full after '\r' addition (to dest), It didn't skip '\n' in source buffer. I think this was buggy in earlier code, and '\n' was skipped (this piece of code is used very rare).
It looks correct to me .. After buffer is full after '\r', it doesn't skip '\n' because it's not in the buffer. So it comes back there the next time and adds '\n' (but not '\r' again).
if (*(s_ptr - 1) != '\r')
*(d_ptr++) = '\r';
If data[0] == '\n' and also cstream->last_cr=TRUE, this code accesses data[-1] which points to random data.
http://hg.dovecot.org/dovecot-2.0/rev/e275c4f02501 http://hg.dovecot.org/dovecot-2.0/rev/e9358064c45e http://hg.dovecot.org/dovecot-2.0/rev/5163d94d4272
On Wed, 2010-09-01 at 16:25 +0100, Timo Sirainen wrote:
Whops, messed up that one. http://hg.dovecot.org/dovecot-2.0/rev/0c73829cd1f8
W dniu 2010-09-01 17:25, Timo Sirainen pisze:
On Wed, 2010-09-01 at 13:23 +0200, Len7hir wrote:
And another problem. Why You use safe_memset instead of memset? Now this function have the largest impact in Dovecot performance.
Really? I'd think it would be called very rarely since it should be used only for password-related data. I anyway added that at one point when I saw some people talking about how some compilers optimize away memset() when the compiler realizes that the memory isn't used anymore.
It's of course not super-helpful because if attacker can read some memory it's pretty bad already. But it might help to avoid exposing users' passwords in some buggy situations.
OK, I check this more in future. Soon We will have better profiling results and maybe something change.
You are right this is authorisation system.
IMAP is pretty new in our company, but about 800 account use it everyday in different ways and We are expecting much more authorisations soon (after advert).
Safe_memset is also called from: pool_alloconly_clear, pool_alloconly_destroy, pool_system_clean_free and client_destroy (in login-common/client-common.c)
Another on list is t_push.
I don't know how much there is to be done about that. Or you could see what happens if you inline the fast path of that code. Or maybe the whole thing could be redesigned in a way to get the fast code path even smaller.
Maybe We will focus on t_push in future.
/* count lines and missing cr (from begining + 1) */
last = (unsigned char *)data + 1;
while (TRUE) {
curr = memchr(last, '\n', block->size - (last - data));
if (curr == NULL)
break;
ctx->part->body_size.lines++;
if (*(curr - 1) != '\r')
missing_cr_count++;
last = curr + 1;
}
I did a couple coding style changes here and to other patches to make it more consistent with rest of Dovecot (and also to help me understand the changes with less mental power :)
OK. This is no problem for me :)
/* exceptional condition (test begining) */
if (*data == '\n'&& ctx->last_chr != '\r') {
ctx->part->body_size.lines++;
missing_cr_count++; }
There's a bug here: lines++ should be done even when last_chr != '\r'. You are right. Thanks.
Also I moved this before the loop. I'd think that optimizes CPU cache behavior better (data is accessed linearly, not data[1] .. data[last], data[0]).
OK. Your points :)
In istream-crlf.c I've changed one thing. When destination buffer is full after '\r' addition (to dest), It didn't skip '\n' in source buffer. I think this was buggy in earlier code, and '\n' was skipped (this piece of code is used very rare).
It looks correct to me .. After buffer is full after '\r', it doesn't skip '\n' because it's not in the buffer. So it comes back there the next time and adds '\n' (but not '\r' again).
Original code has: if (data[i] == '\n') { (...) if (data[i-1] != '\r') stream->w_buffer[dest++] = '\r'; if (dest == stream->buffer_size) break; stream->w_buffer[dest++] = data[i] is not executed (no space in dest)
i_stream_skip(stream->parent, i); i is set on '\n'
I'm not sure I'm right, but this is possible mistake in original code. Never mind.
if (*(s_ptr - 1) != '\r')
*(d_ptr++) = '\r';
If data[0] == '\n' and also cstream->last_cr=TRUE, this code accesses data[-1] which points to random data.
You are right. (s_ptr < s_end && *s_ptr == '\n' && !cstream->last_cr && d_ptr < d_end)
It could be when !cstream->last_cr is FALSE. Always start from data[1] should fix this. (data[0] is checked at the begining)
Thanks for Your reply You are the best :) Len7hir
On Thu, 2010-09-02 at 11:42 +0200, Len7hir wrote:
Safe_memset is also called from: pool_alloconly_clear, pool_alloconly_destroy,
Only when pool was created by pool_alloconly_create_clean(), which isn't used anywhere.
pool_system_clean_free
Also this isn't used anywhere. I should probably remove these clean-pools since my original idea where to use them was a bad idea after all.
and client_destroy (in login-common/client-common.c)
For clearing the password, yes.
Original code has: if (data[i] == '\n') { (...) if (data[i-1] != '\r') stream->w_buffer[dest++] = '\r'; if (dest == stream->buffer_size) break; stream->w_buffer[dest++] = data[i] is not executed (no space in dest)
i_stream_skip(stream->parent, i); i is set on '\n'
Yes, but lets say i=1 and data[1] == '\n', then i_stream_skip() skips one byte (the data[0]) but not the '\n'.
participants (2)
-
Len7hir
-
Timo Sirainen