[Dovecot] Capability COMPRESS implemented?

Sun Jun 28 13:43:25 EEST 2009

Timo Sirainen wrote:
> On Thu, 2009-06-25 at 23:21 +0100, Ed W wrote:
>   
>>> Yeah, not for next half a year at least. Anyway, it would basically need
>>> istream and ostream implementations for zlib. istream implementation
>>> kind of already exists in zlib plugin, except it's using gz*() functions
>>> instead of doing everything in memory. So:
>>>   
>>>       
>> I might have missed the subtleties since it's a while since I wrote 
>> anything against the gz interface, but there shouldn't be much 
>> difference between interfaces I think?
>>     
>
> I don't know. I've never written anything using the deflate/inflate*()
> interfaces. I just quickly looked up from zlib.h that those are probably
> what's needed.
>
>   

I think what you see as a "stream" is just the API name for a memory 
buffer.  The input and output arguments point to a struct which is 
something like:

char *buffer_ptr;
long bytes_left_in_buffer;

As you call the function it consumes bytes from the input buffer and may 
optionally squirt some data into the output buffer.  The structs you 
pass are updated to show the new values.  The compress/decompress 
functions return a value which shows whether it has finished doing its 
thing or requires more output buffer space, etc.
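
To make the buffer-in/buffer-out behaviour concrete, here is a rough
sketch using Python's zlib module (a wrapper around the same library)
rather than the C interface itself.  The function name and the 16-byte
output limit are made up for illustration; the point is only that each
call consumes some input, emits a bounded amount of output, and tells
you what is left over.

```python
import zlib


def inflate_in_chunks(compressed: bytes, out_buf_size: int = 16):
    """Decompress with a small, bounded amount of output per call.

    Hypothetical helper for illustration only. Returns the decompressed
    bytes and a flag saying whether the end of the stream was reached.
    """
    d = zlib.decompressobj()
    pending = compressed      # input that has not been consumed yet
    result = bytearray()
    while pending:
        # Emit at most out_buf_size bytes. Input that could not be
        # processed because the output limit was hit is handed back
        # in d.unconsumed_tail, to be passed in again on the next call.
        result += d.decompress(pending, out_buf_size)
        pending = d.unconsumed_tail
    # Collect any output still held inside the decompressor.
    result += d.flush()
    return bytes(result), d.eof


data = b"a001 OK LOGIN completed\r\n" * 20
out, finished = inflate_in_chunks(zlib.compress(data))
```

In C the same loop would be driven by the return code and the updated
pointer/length fields instead of unconsumed_tail and eof.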

I suppose the only subtlety is that the compressor (and decompressor) 
may keep some bytes in its internal state (ie unflushed).  So if you 
ask it to compress the string "dovecot" and uncompress the output bytes 
you might only get "dove" (say).  The key thing is to call the flush 
function where it's necessary.  However, the unflushed characters are 
those the compressor thinks it can batch with later input, so clearly 
you want to minimise the amount of flushing when dealing with small 
input strings.  In terms of big picture compression it's only a very 
small decrease in efficiency, but it's still desirable to minimise 
flushes where possible (ie only at the end of each command output would 
be the obvious solution).
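
A quick way to see both effects, again sketched with Python's zlib
module rather than the C API.  The sample FETCH lines and the helper
name compressed_size are invented for the demonstration; the first part
shows that nothing useful reaches the decompressor until a flush, the
second compares flushing after every small string against flushing once
at the end.

```python
import zlib

# Part 1: without a flush the peer cannot see the whole string yet.
c = zlib.compressobj()
d = zlib.decompressobj()

wire = c.compress(b"dovecot")        # may be little more than a header
before_flush = d.decompress(wire)    # less than the full string

wire = c.flush(zlib.Z_SYNC_FLUSH)    # push out everything buffered so far
after_flush = before_flush + d.decompress(wire)

# Part 2: the cost of flushing after every small string.
# Invented sample data: 200 short, similar response lines.
lines = [b"* %d FETCH (FLAGS (\\Seen))\r\n" % i for i in range(1, 201)]


def compressed_size(flush_every_line: bool) -> int:
    """Total bytes put on the wire for all lines (hypothetical helper)."""
    comp = zlib.compressobj()
    total = 0
    for line in lines:
        total += len(comp.compress(line))
        if flush_every_line:
            total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    if not flush_every_line:
        # One flush at the end, as at the end of a command's output.
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total


print("flush per line:", compressed_size(True))
print("flush at end:  ", compressed_size(False))
```

Z_SYNC_FLUSH is used here because it makes all pending output available
to the other side without ending the compressed stream.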

I don't know the internals of dovecot too well, but I would have thought 
that you would add this to the network output abstraction.  So you 
presumably already buffer and spool command output to the network 
socket, now you simply compress the output before each write and 
decompress the input after each read.  Note there are some potential 
efficiency gains in compressing attachments slightly differently to 
other data, hence the compressor might potentially gain by being nearer 
the code which is generating network output (the decompressor on input 
data can clearly be right in the network input code) but my opinion is 
that this is barely relevant for real users with sensible size emails 
(the zlib dictionary sizes are just too small to get massive 
compression ratios).
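
Roughly what I have in mind, sketched in Python rather than against
dovecot's real ostream/istream code (which I haven't looked at).  The
class names DeflateWriter and InflateReader, the end_of_command method
and the use of raw deflate (window bits of -15, i.e. no zlib header) are
all assumptions for illustration.

```python
import io
import zlib


class DeflateWriter:
    """Hypothetical output wrapper: compresses whatever is written."""

    def __init__(self, raw):
        self.raw = raw  # anything with a write(bytes) method
        # Negative window bits select raw deflate (assumed here).
        self._c = zlib.compressobj(
            zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15
        )

    def write(self, data: bytes) -> None:
        # The compressor may hold bytes back; only what it emits is sent.
        self.raw.write(self._c.compress(data))

    def end_of_command(self) -> None:
        # Flush once per command so the peer can read the full reply.
        self.raw.write(self._c.flush(zlib.Z_SYNC_FLUSH))


class InflateReader:
    """Hypothetical input wrapper: decompresses whatever was read."""

    def __init__(self):
        self._d = zlib.decompressobj(-15)

    def feed(self, data: bytes) -> bytes:
        return self._d.decompress(data)


# Stand-in for the network socket.
wire = io.BytesIO()
writer = DeflateWriter(wire)
writer.write(b"* 1 EXISTS\r\n")
writer.write(b"a002 OK SELECT completed\r\n")
writer.end_of_command()

reader = InflateReader()
received = reader.feed(wire.getvalue())
```

The rest of the code keeps calling write() as before and only needs to
say where a command's output ends.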

Hopefully this is a fairly easy thing to insert into the current code path?

Cheers

Ed W