Timo Sirainen wrote:
> On Thu, 2009-06-25 at 23:21 +0100, Ed W wrote:
>>> Yeah, not for next half a year at least. Anyway, it would basically need istream and ostream implementations for zlib. istream implementation kind of already exists in zlib plugin, except it's using gz*() functions instead of doing everything in memory. So:
>> I might have missed the subtleties since it's a while since I wrote anything against the gz interface, but there shouldn't be much difference between interfaces I think?
> I don't know. I've never written anything using the deflate/inflate*() interfaces. I just quickly looked up from zlib.h that those are probably what's needed.
I think what you see as a "stream" is just the API's name for a memory buffer. The input and output variables point to a struct which is something like:
char *buffer_ptr;             /* next byte to consume or fill */
long bytes_left_in_buffer;    /* how many bytes remain */
As you call the function it consumes bytes from the input buffer and may squirt some data into the output buffer. The structs you pass in are updated to reflect the new pointer and byte-count values. The compress/decompress functions return a value which shows whether they have finished doing their thing, need more output buffer space, and so on.
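Very roughly, and purely as an untested sketch from a quick look at zlib.h (the helper name and the single-shot Z_FINISH call are mine, not anything from the existing zlib plugin), compressing a buffer entirely in memory looks something like:

#include <string.h>
#include <zlib.h>

/* Untested sketch: compress "in" into "out" entirely in memory.
 * Returns the compressed size, or -1 on error. */
static long compress_buffer(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_size)
{
    z_stream zs;
    long ret = -1;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* These four fields are the "struct" described above:
     * a pointer plus a bytes-left counter for input and for output. */
    zs.next_in = (Bytef *)in;
    zs.avail_in = in_len;
    zs.next_out = out;
    zs.avail_out = out_size;

    /* Z_FINISH = no more input is coming; deflate() consumes input,
     * writes output and updates the pointers/counters as it goes. */
    if (deflate(&zs, Z_FINISH) == Z_STREAM_END)
        ret = (long)zs.total_out;

    deflateEnd(&zs);
    return ret;
}

For the streaming case you would obviously keep the z_stream alive between calls rather than finishing it each time, which is presumably what an istream/ostream implementation would do anyway.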
I suppose the only subtlety is that the compressor (and decompressor) may keep some bytes in its internal state (i.e. unflushed). So if you ask it to compress the string "dovecot" and decompress the output bytes, you might only get back "dove" (say). The key thing is to call the flush function where necessary. However, the unflushed bytes are the ones the compressor thinks it can batch with later input, so each flush costs a little compression efficiency, and the cost matters most when dealing with small input strings. In big-picture terms it's a very small decrease in efficiency, but it's clearly desirable to minimise flushes where possible (flushing only at the end of each command's output would be the obvious solution).
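In code terms that would mean something like this (again just a hypothetical sketch; it assumes the z_stream was already set up with deflateInit() and stays alive across calls):

#include <zlib.h>

/* Hypothetical sketch: feed one chunk of command output through an
 * already-initialised z_stream.  Z_NO_FLUSH lets zlib batch data across
 * calls; Z_SYNC_FLUSH at the end of a command pushes out everything it
 * was holding back so the client actually sees the complete reply.
 * (Assumes "out" is big enough; real code would loop while avail_out
 * keeps running out.) */
static int deflate_chunk(z_stream *zs,
                         const unsigned char *data, size_t len,
                         unsigned char *out, size_t out_size,
                         int end_of_command)
{
    zs->next_in = (Bytef *)data;
    zs->avail_in = len;
    zs->next_out = out;
    zs->avail_out = out_size;

    return deflate(zs, end_of_command ? Z_SYNC_FLUSH : Z_NO_FLUSH);
}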
I don't know the internals of Dovecot too well, but I would have thought you would add this to the network output abstraction. You presumably already buffer and spool command output to the network socket, so now you simply run the output through deflate before each write and incoming data through inflate after each read. Note there are some potential efficiency gains in compressing attachments slightly differently to other data, so the compressor might gain by sitting nearer the code which generates the output (the decompressor on input data can clearly sit right in the network input code), but in my opinion this is barely relevant for real users with sensibly sized emails (the zlib window is just too small to get massive compression ratios).
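The read side would be the mirror image; again a rough, untested sketch with a made-up function name rather than anything resembling Dovecot's actual istream code:

#include <unistd.h>
#include <zlib.h>

/* Rough sketch: read compressed bytes off the socket and inflate them
 * before the command parser sees them.  Assumes "zs" was set up with
 * inflateInit(); handling leftover input (when "plain" fills up before
 * all of "wire" is consumed) is omitted for brevity. */
static ssize_t read_inflated(int fd, z_stream *zs,
                             unsigned char *plain, size_t plain_size)
{
    unsigned char wire[4096];
    ssize_t n = read(fd, wire, sizeof(wire));

    if (n <= 0)
        return n;

    zs->next_in = wire;
    zs->avail_in = n;
    zs->next_out = plain;
    zs->avail_out = plain_size;

    if (inflate(zs, Z_NO_FLUSH) < 0)   /* zlib error codes are negative */
        return -1;

    return (ssize_t)(plain_size - zs->avail_out);  /* decompressed bytes */
}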
Hopefully this is a fairly easy thing to insert into the current code path?
Cheers
Ed W