I did some more optimisation.
Most of the performance hit in Dovecot comes from "for" loops in string operations.
In one case (in message-parser.c) a "for" loop has another "for" loop inside it that uses the same variable as its iterator. This case is very hard for the compiler to optimise.
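To show the kind of pattern I mean, here is a minimal sketch. It is not the actual message-parser.c code; the function names and the line-counting task are made up for illustration. The first version has an inner loop that advances the outer loop's index, the second does the same scan with memchr(), which is one possible way to restructure such a loop:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example (not Dovecot code): count non-empty lines.
   The inner "for" advances the same index as the outer "for". */
static size_t count_lines_shared_index(const unsigned char *data, size_t size)
{
	size_t i, count = 0;

	for (i = 0; i < size; i++) {
		if (data[i] == '\n')
			continue; /* empty line */
		count++;
		/* inner loop reuses i to skip to the end of the line */
		for (; i < size; i++) {
			if (data[i] == '\n')
				break;
		}
	}
	return count;
}

/* Same result, but the scan for '\n' is delegated to memchr(). */
static size_t count_lines_memchr(const unsigned char *data, size_t size)
{
	const unsigned char *p = data, *end = data + size, *nl;
	size_t count = 0;

	while (p < end) {
		nl = memchr(p, '\n', (size_t)(end - p));
		if (nl == NULL) {
			count++; /* last line without a trailing newline */
			break;
		}
		if (nl != p)
			count++; /* line had at least one byte */
		p = nl + 1;
	}
	return count;
}
```

Both functions should return the same count for any input. Whether the second form is actually faster depends on the compiler and libc, so it needs to be measured.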
I made changes only in the top functions listed by oprofile. Maybe I'll do more in the future.
The code was analysed and tested, but it's hard to cover all cases. It's working well on a production server (I'm monitoring it).
In istream-crlf.c I've changed one thing. When the destination buffer becomes full right after '\r' is added to it, the '\n' in the source buffer is not skipped. I think the earlier code was buggy here and the '\n' was skipped (this piece of code is used very rarely).
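The case is easier to see in a small sketch. This is not the code from istream-crlf.c; the struct, the function name and its signature are hypothetical. It only shows the idea: if the destination fills up right after the '\r' is written, the '\n' must stay unconsumed in the source so that the next call can copy it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state kept between calls (not Dovecot's struct). */
struct crlf_state {
	bool last_cr; /* the last byte written to dest was '\r' */
};

/* Copy src to dest, turning a bare LF into CRLF.
   Returns the number of bytes written; *consumed_r is set to the
   number of source bytes actually used. */
static size_t lf_to_crlf(struct crlf_state *st,
			 const unsigned char *src, size_t src_size,
			 unsigned char *dest, size_t dest_size,
			 size_t *consumed_r)
{
	size_t s = 0, d = 0;

	while (s < src_size && d < dest_size) {
		if (src[s] == '\n' && !st->last_cr) {
			/* bare LF: write the CR, but do NOT consume the LF.
			   If dest is now full, the loop ends and the LF is
			   still waiting in the source for the next call. */
			dest[d++] = '\r';
			st->last_cr = true;
			continue;
		}
		st->last_cr = (src[s] == '\r');
		dest[d++] = src[s++];
	}
	*consumed_r = s;
	return d;
}
```

For example, with the source "a\nb" and room for only two bytes in the destination, the first call writes "a\r" and reports one source byte consumed. The second call, started at the '\n', writes "\nb".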
Please check this out. This can help in huge e-mail systems :P
And one more thing: why do you use safe_memset instead of memset? This function now has the largest impact on Dovecot performance. The next one on the list is t_push.
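To illustrate the kind of difference I mean, here is a sketch. I am assuming that a "safe" memset is written as a byte-by-byte loop through a volatile pointer, which is a common way to stop the compiler from removing the call. The function names below are hypothetical and this is not a copy of Dovecot's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "safe" memset: every store goes through
   a volatile pointer, so the compiler must emit each byte write and
   cannot drop or vectorise the loop. */
static void *volatile_memset(void *data, int c, size_t size)
{
	volatile unsigned char *p = data;

	while (size-- > 0)
		*p++ = (unsigned char)c;
	return data;
}

/* Plain memset: libc is free to use word-sized or SIMD stores, and
   the compiler may inline or even remove the call. */
static void *plain_memset(void *data, int c, size_t size)
{
	return memset(data, c, size);
}
```

Both fill the buffer with the same bytes. The volatile version is only needed where the clearing must not be optimised away (for example for passwords); whether it explains the oprofile numbers is my guess.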
Regards, Len7hir