Timo Sirainen wrote:
> On Thu, 2009-06-25 at 23:21 +0100, Ed W wrote:
>>> Yeah, not for next half a year at least. Anyway, it would basically need istream and ostream implementations for zlib. istream implementation kind of already exists in zlib plugin, except it's using gz*() functions instead of doing everything in memory. So:
>> I might have missed the subtleties since it's a while since I wrote anything against the gz interface, but there shouldn't be much difference between interfaces I think?
> I don't know. I've never written anything using the deflate/inflate*() interfaces. I just quickly looked up from zlib.h that those are probably what's needed.
I think what you see as a "stream" is just the API's name for a memory buffer. The input and output variables point to a struct which is something like:
char *buffer_ptr;             /* next byte to consume or fill */
long bytes_left_in_buffer;    /* how many bytes remain */
As you call the function it consumes bytes from the input buffer and may squirt some data into the output buffer. The structs you pass in are updated to reflect the new pointer and byte-count values. The compress/decompress functions return a value which shows whether they have finished doing their thing, need more output buffer space, and so on.
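Very roughly, and purely as an untested sketch from a quick look at zlib.h (the helper name and the single-shot Z_FINISH call are mine, not anything from the existing zlib plugin), compressing a buffer entirely in memory looks something like:

#include <string.h>
#include <zlib.h>

/* Untested sketch: compress "in" into "out" entirely in memory.
 * Returns the compressed size, or -1 on error. */
static long compress_buffer(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_size)
{
    z_stream zs;
    long ret = -1;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* These four fields are the "struct" described above:
     * a pointer plus a bytes-left counter for input and for output. */
    zs.next_in = (Bytef *)in;
    zs.avail_in = in_len;
    zs.next_out = out;
    zs.avail_out = out_size;

    /* Z_FINISH = no more input is coming; deflate() consumes input,
     * writes output and updates the pointers/counters as it goes. */
    if (deflate(&zs, Z_FINISH) == Z_STREAM_END)
        ret = (long)zs.total_out;

    deflateEnd(&zs);
    return ret;
}

For the streaming case you would obviously keep the z_stream alive between calls rather than finishing it each time, which is presumably what an istream/ostream implementation would do anyway.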
I suppose the only subtlety is that the compressor (and decompressor) may keep some bytes in its internal state (i.e. unflushed). So if you ask it to compress the string "dovecot" and decompress the output bytes, you might only get back "dove" (say). The key thing is to call the flush function where necessary. However, the unflushed bytes are the ones the compressor thinks it can batch with later input, so each flush costs a little compression efficiency, and the cost matters most when dealing with small input strings. In big-picture terms it's a very small decrease in efficiency, but it's clearly desirable to minimise flushes where possible (flushing only at the end of each command's output would be the obvious solution).
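In code terms that would mean something like this (again just a hypothetical sketch; it assumes the z_stream was already set up with deflateInit() and stays alive across calls):

#include <zlib.h>

/* Hypothetical sketch: feed one chunk of command output through an
 * already-initialised z_stream.  Z_NO_FLUSH lets zlib batch data across
 * calls; Z_SYNC_FLUSH at the end of a command pushes out everything it
 * was holding back so the client actually sees the complete reply.
 * (Assumes "out" is big enough; real code would loop while avail_out
 * keeps running out.) */
static int deflate_chunk(z_stream *zs,
                         const unsigned char *data, size_t len,
                         unsigned char *out, size_t out_size,
                         int end_of_command)
{
    zs->next_in = (Bytef *)data;
    zs->avail_in = len;
    zs->next_out = out;
    zs->avail_out = out_size;

    return deflate(zs, end_of_command ? Z_SYNC_FLUSH : Z_NO_FLUSH);
}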
I don't know the internals of Dovecot too well, but I would have thought you would add this to the network output abstraction. You presumably already buffer and spool command output to the network socket, so now you simply run the output through deflate before each write and incoming data through inflate after each read. Note there are some potential efficiency gains in compressing attachments slightly differently to other data, so the compressor might gain by sitting nearer the code which generates the output (the decompressor on input data can clearly sit right in the network input code), but in my opinion this is barely relevant for real users with sensibly sized emails (the zlib window is just too small to get massive compression ratios).
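The read side would be the mirror image; again a rough, untested sketch with a made-up function name rather than anything resembling Dovecot's actual istream code:

#include <unistd.h>
#include <zlib.h>

/* Rough sketch: read compressed bytes off the socket and inflate them
 * before the command parser sees them.  Assumes "zs" was set up with
 * inflateInit(); handling leftover input (when "plain" fills up before
 * all of "wire" is consumed) is omitted for brevity. */
static ssize_t read_inflated(int fd, z_stream *zs,
                             unsigned char *plain, size_t plain_size)
{
    unsigned char wire[4096];
    ssize_t n = read(fd, wire, sizeof(wire));

    if (n <= 0)
        return n;

    zs->next_in = wire;
    zs->avail_in = n;
    zs->next_out = plain;
    zs->avail_out = plain_size;

    if (inflate(zs, Z_NO_FLUSH) < 0)   /* zlib error codes are negative */
        return -1;

    return (ssize_t)(plain_size - zs->avail_out);  /* decompressed bytes */
}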
Hopefully this is a fairly easy thing to insert into the current code path?
Cheers
Ed W