[Dovecot] message-decoder bug for attachments with charset=binary attribute in content-type?

Robert Wolf r.wolf.conf at gmail.com
Mon May 12 07:30:36 UTC 2014


Hello,

I have configured Dovecot with Solr and wanted Solr to index the content of
attachments. For testing I used the biabam command-line tool to generate
emails with attachments.

I found that Dovecot with fts_decoder decodes these attachments from biabam
incorrectly, so pdftotext reports the PDF as corrupted.

The problem is that biabam generates a Content-Type header with
charset=binary, and Dovecot's message decoder then tries to process the part
as UTF-8 or non-UTF-8 data instead of leaving it as raw binary.

============================================================
--biabam.ZxWVLybiabam.ZxWVLy
Content-Type: application/pdf; charset=binary
Content-Disposition: attachment; filename="bacula-jobs.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PC9MZW5ndGggNiAwIFIvRmlsdGVy
IC9GbGF0ZURlY29kZT4+CnN0cmVhbQp4nF2PT0+EMBDF7/0U7yYYWdqFXdbe
1vgnMXpQezMeClSoQNltweh+egvLyczhN3kz703mCLpioFMtLDoSv2aoHKGo
.....
============================================================

The original PDF begins with:

============================================================
0000000: 2550 4446 2d31 2e34 0a25 c7ec 8fa2 0a35  %PDF-1.4.%.....5
0000010: 2030 206f 626a 0a3c 3c2f 4c65 6e67 7468   0 obj.<</Length
0000020: 2036 2030 2052 2f46 696c 7465 7220 2f46   6 0 R/Filter /F
0000030: 6c61 7465 4465 636f 6465 3e3e 0a73 7472  lateDecode>>.str
============================================================

But Dovecot passes the following data to the fts_decoder script:

============================================================
0000000: 2550 4446 2d31 2e34 0a25 c3a4 c3bc c3b6  %PDF-1.4.%......
0000010: c39f 0a32 2030 206f 626a 0a3c 3c2f 4c65  ...2 0 obj.<</Le
0000020: 6e67 7468 2033 2030 2052 2f46 696c 7465  ngth 3 0 R/Filte
0000030: 722f 466c 6174 6544 6563 6f64 653e 3e0a  r/FlateDecode>>.
============================================================

As you can see, the binary data is mangled.
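
To illustrate how this kind of mangling can arise, here is a minimal sketch,
not Dovecot code. It assumes (I have not confirmed that this is the exact
conversion Dovecot applies) that the bytes are treated as single-byte
ISO-8859-1 text and re-encoded as UTF-8. The function name latin1_to_utf8 is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not Dovecot code: re-encode a buffer as if every byte
 * were an ISO-8859-1 character being converted to UTF-8. Returns the number
 * of bytes written; stops early if the output buffer is too small. */
static size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            /* ASCII bytes pass through unchanged. */
            if (n + 1 > out_size)
                break;
            out[n++] = in[i];
        } else {
            /* Every byte >= 0x80 expands to a two-byte UTF-8 sequence. */
            if (n + 2 > out_size)
                break;
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

With such a conversion the ASCII parts of the PDF header survive, but each
high byte grows into two bytes (for example 0xe4 becomes c3 a4), which is the
same shape as the c3 xx pairs in the dump above. Any change of this kind
corrupts the compressed streams in the PDF.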


Alpine and Thunderbird do not add charset=binary to the Content-Type header,
and searching works perfectly for attachments sent with them.


I searched the source code and found one relevant place. If I replace the
following code at line 241 of dovecot-2.1.7/src/lib-mail/message-decoder.c
with the new version, Dovecot's message decoder decodes the message correctly
and pdftotext can convert the attached PDF.

Original code:
============================================================
241:   ctx->binary_input = ctx->content_charset == NULL && 
242:     (ctx->flags & MESSAGE_DECODER_FLAG_RETURN_BINARY) != 0 &&
243:     (part->flags & (MESSAGE_PART_FLAG_TEXT |
244:         MESSAGE_PART_FLAG_MESSAGE_RFC822)) == 0;
============================================================

My update:
============================================================
  ctx->binary_input =
    (ctx->content_charset != NULL &&
     strcasecmp(ctx->content_charset, "binary") == 0) ||
    (ctx->content_charset == NULL &&
     (ctx->flags & MESSAGE_DECODER_FLAG_RETURN_BINARY) != 0 &&
     (part->flags & (MESSAGE_PART_FLAG_TEXT |
         MESSAGE_PART_FLAG_MESSAGE_RFC822)) == 0);
============================================================

This sets ctx->binary_input for any attachment whose charset is "binary"
(compared case-insensitively).

I don't know whether this is the correct fix, but with it searching also works
for biabam binary attachments.
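
To make the changed condition easier to check in isolation, here is a
standalone sketch of the same predicate. The SKETCH_* constants and the
function name sketch_binary_input are hypothetical stand-ins with made-up
values; the real flags are defined in the Dovecot headers and this is not
meant as a patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical stand-ins for the Dovecot flags; the values are made up. */
enum {
    SKETCH_DECODER_FLAG_RETURN_BINARY = 0x01
};
enum {
    SKETCH_PART_FLAG_TEXT = 0x01,
    SKETCH_PART_FLAG_MESSAGE_RFC822 = 0x02
};

/* Same logic as the updated expression:
 * - an explicit charset counts as binary only if it is "binary";
 * - with no charset, the original rule applies unchanged. */
static bool sketch_binary_input(const char *content_charset,
                                unsigned int decoder_flags,
                                unsigned int part_flags)
{
    if (content_charset != NULL)
        return strcasecmp(content_charset, "binary") == 0;

    return (decoder_flags & SKETCH_DECODER_FLAG_RETURN_BINARY) != 0 &&
           (part_flags & (SKETCH_PART_FLAG_TEXT |
                          SKETCH_PART_FLAG_MESSAGE_RFC822)) == 0;
}
```

Note that in the charset=binary case neither MESSAGE_DECODER_FLAG_RETURN_BINARY
nor the part flags are checked. I don't know whether that matters for other
callers of the decoder.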

Could you please verify this problem and maybe update the code?


Thank you very much. 



# dovecot --version
2.1.7

Config:

plugin {
        fts = solr
        fts_solr = url=http://localhost:8080/solr/
        fts_decoder = decode2text
}

service decode2text {
  executable = script /etc/dovecot/scripts/decode2text.sh
  user = dovecot
  unix_listener decode2text {
    mode = 0666
  }
}



Regards,

Robert Wolf.

