[Dovecot] Attachment extraction, de-duplication (was: Re: Scalability plans: Abstract out filesystem and make it someone else's problem)

Patrick Nagel patrick.nagel at star-group.net
Fri Aug 14 08:04:19 EEST 2009



Hi,

On 2009-08-13 20:46, Charles Marcus wrote:
[...]
> Again, for shops that must deal with large binary attachments, this
> would be a godsend.
> 
> Our max allowed message size is 50MB, and we typically get anywhere from
> 2-10 messages a day containing 20, 30, or even 40MB attachments sent to
> our distribution lists, so these would go to 50+ people, who then
> forward them to others, etc., etc., ad nauseam.
> 
> Currently, I have mailman set to hold these, then I go in and strip off
> the attachment, put it in a shared location, then let the message (minus
> the attachment) through. But we still have a *lot* of messages like this
> that don't go through our lists, but are sent to 2, 3, or 10 of our reps
> individually.
[...]

I implemented a solution that has been working well for us for a couple of
months now. It has one serious limitation, though, which makes it unsuitable
for many environments: every recipient who is part of the process can see
the attachments of every other recipient. So it only works in a cooperative
environment.

In short, a script (implemented as a filter that Postfix calls) extracts all
attachments on arrival using ripmime [1]. The attachments are then moved to a
Samba share that all recipients can access. Next, altermime [2] alters the
original mail: it removes the attachment(s) and inserts a file:/// link to
them at the bottom of the message. Finally, every weekend a file deduplication
script (hardlink.py [3]) on that Samba server checksums all files in the
attachments directory and hardlinks identical ones. This way we avoid the
base64 overhead, and attachments sent to multiple people end up stored only
once. File handling on a Samba share is also much easier than having to
extract attachments via the MUA first.
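To put a rough number on the base64 overhead and the duplication: base64 turns every 3 bytes into 4 characters, and RFC 2045 limits encoded lines to 76 characters, each ending in CRLF. The sketch below estimates the encoded size of one attachment. The function name `mime_base64_size` is made up for illustration, and the usage assumes that each recipient's mailbox stores its own full copy of the message (no single-instance storage), ignoring headers and body text.

```python
def mime_base64_size(n_bytes: int, line_len: int = 76) -> int:
    """Size in bytes of n_bytes of data once base64-encoded for MIME.

    Assumes encoded lines of at most line_len characters, each ended by CRLF.
    """
    encoded_chars = 4 * ((n_bytes + 2) // 3)  # every 3 input bytes -> 4 characters
    line_count = (encoded_chars + line_len - 1) // line_len
    return encoded_chars + 2 * line_count  # 2 bytes of CRLF per line


if __name__ == "__main__":
    attachment = 40 * 2**20  # a 40 MiB attachment, like those in the quoted mail
    recipients = 50          # hypothetical distribution-list size
    per_copy = mime_base64_size(attachment)
    print(f"per copy: {per_copy / 2**20:.1f} MiB")
    print(f"{recipients} copies: {recipients * per_copy / 2**30:.2f} GiB")
```

Under these assumptions a 40 MiB attachment grows to about 54.7 MiB per stored copy, so 50 mailbox copies take roughly 2.67 GiB, versus one 40 MiB file on the share.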
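For anyone without ripmime and altermime at hand, the filter step (detach attachments, drop them on the share, add links to the mail) can be approximated with Python's standard `email` package. This is a minimal sketch, not the script described in this mail: the function name `strip_attachments`, the share path `/srv/samba/attachments` and the link prefix `file://///fileserver/attachments/` are hypothetical, and links are only appended to a plain-text body (HTML-only mails get no note). It reads a message on stdin and writes the rewritten message to stdout; hooking it into Postfix as a filter is not shown.

```python
import email
import email.policy
import os
import sys
import uuid
from urllib.parse import quote


def strip_attachments(raw: bytes, share_dir: str, link_prefix: str):
    """Move leaf attachments of a MIME message into share_dir.

    Returns (rewritten_message_bytes, list_of_saved_paths).
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    # One folder per message, so equal file names from different mails don't clash.
    msg_dir = os.path.join(share_dir, uuid.uuid4().hex)
    saved = []

    # Collect the multipart containers first, then rebuild their child lists.
    containers = [part for part in msg.walk() if part.is_multipart()]
    for container in containers:
        kept = []
        for sub in container.get_payload():
            if sub.is_attachment() and not sub.is_multipart():
                os.makedirs(msg_dir, exist_ok=True)
                name = os.path.basename(sub.get_filename() or "")
                if name in ("", ".", ".."):
                    name = "attachment.bin"
                path = os.path.join(msg_dir, name)
                if os.path.exists(path):  # same file name twice in one mail
                    path = os.path.join(msg_dir, f"{len(saved)}_{name}")
                with open(path, "wb") as fh:
                    fh.write(sub.get_payload(decode=True) or b"")  # decoded, no base64
                saved.append(path)
            else:
                kept.append(sub)
        container.set_payload(kept)  # drop the detached parts from the mail

    # Append file:/// style links to the plain-text body, if there is one.
    body = msg.get_body(preferencelist=("plain",))
    if saved and body is not None:
        links = "\n".join(
            link_prefix + quote(os.path.relpath(p, share_dir).replace(os.sep, "/"))
            for p in saved
        )
        body.set_content(
            body.get_content() + "\n\nAttachments moved to the share:\n" + links + "\n"
        )
    return msg.as_bytes(), saved


if __name__ == "__main__":
    # Hypothetical locations: adjust to your Samba share and how clients reach it.
    rewritten, _ = strip_attachments(
        sys.stdin.buffer.read(),
        "/srv/samba/attachments",
        "file://///fileserver/attachments/",
    )
    sys.stdout.buffer.write(rewritten)
```

Giving each mail its own folder keeps the filter simple and collision-free; merging identical files across folders is left to a separate deduplication pass.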
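The weekend deduplication step boils down to grouping files by size, checksumming the candidates, and replacing each duplicate with a hard link to a single copy. The sketch below does that with `hashlib` and `os.link`. It is not hardlink.py's code: the name `hardlink_duplicates` is hypothetical, and it treats equal SHA-256 digests as identical content rather than comparing files byte by byte. Hard-linked files share one inode, so editing one copy in place changes every linked copy.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large attachments don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def hardlink_duplicates(root: str) -> int:
    """Hard-link files with identical content under root; return how many were re-linked."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    relinked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for group in by_digest.values():
            master = os.stat(group[0])
            for dup in group[1:]:
                st = os.stat(dup)
                if st.st_dev != master.st_dev:
                    continue  # hard links cannot span filesystems
                if st.st_ino == master.st_ino:
                    continue  # already linked on an earlier run
                tmp = dup + ".dedup-tmp"
                os.link(group[0], tmp)
                os.replace(tmp, dup)  # atomically swap the duplicate for the link
                relinked += 1
    return relinked


if __name__ == "__main__":
    print(hardlink_duplicates(sys.argv[1]), "file(s) re-linked")
```

Grouping by size first means only files that could possibly match get hashed, which keeps a weekly run over a large attachments directory cheap; a second run finds nothing to do because linked files already share an inode.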

Patrick.

[1]: http://www.pldaniels.com/ripmime/
[2]: http://www.pldaniels.com/altermime/
[3]: http://code.google.com/p/hardlinkpy/

-- 
STAR Software (Shanghai) Co., Ltd.              http://www.star-group.net/
Phone:    +86 (21) 3462 7688 x 826               Fax:   +86 (21) 3462 7779

PGP key:  E883A005 https://stshacom1.star-china.net/keys/patrick_nagel.asc
Fingerprint:             E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005

