On 19/03/2021 07.31, Joseph Tam wrote:
>> I've found the strip-attachments.pl script here <https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl> which works fine on mbox (as tested on my local Thunderbird mboxes), but not on maildir which is on the dovecot server. My Perl isn't strong enough to re-purpose it.
>
> If you have anything that works on mbox, it will probably work on Maildir
> as each file can be considered a single-message mbox.  You can combine
> the script with
>
>      find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;

I thought that too, but my initial test on a single message file didn't work that way: I think I got a zero-length file. I'll dig into the code to see if I can figure it out, although I haven't used Perl for 20 years or so ...

> The ... can be replaced with more file tests (like minimum size or age or only within */cur/) to cut down on processing.

Sure. I'm quite handy with find, sed, awk and all that bash malarkey. I was actually wondering if it could be done with those alone, but it would make more sense to use a library which already understands MIME and does the heavy lifting. The shell-only approach might be good as a last resort.
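
As a rough sketch of the library route, assuming Python's standard email package rather than the Perl Mail::Box script (the name strip_attachments and the min_size parameter are illustrative only, not taken from that script), each Maildir file could be parsed, have its attachment parts swapped for a short text note, and be serialized again:

```python
import email
from email import policy


def strip_attachments(raw: bytes, min_size: int = 0) -> bytes:
    """Replace attachment parts of at least min_size decoded bytes with a note."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Snapshot the parts first so we are not editing while walking.
    for part in list(msg.walk()):
        if part.is_multipart() or not part.is_attachment():
            continue
        payload = part.get_payload(decode=True) or b""
        if len(payload) < min_size:
            continue
        name = part.get_filename() or "unnamed"
        # set_content() clears the old Content-* headers and payload first.
        part.set_content(f"[attachment {name!r} removed, {len(payload)} bytes]\n")
        if part is not msg:
            # set_content() adds MIME-Version, which a sub-part does not need.
            del part["MIME-Version"]
    return msg.as_bytes()
```

This only touches parts marked "Content-Disposition: attachment", so inline images are left alone. Re-serializing may also refold headers, and Dovecot caches message sizes, so it would be worth trying on a copy of the mail tree first.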

> MIMEDefang may help.
Nice. Thanks for the pointer.

P.

On Fri, Mar 19, 2021 at 7:31 AM Joseph Tam <jtam.home@gmail.com> wrote:
On Thu, 18 Mar 2021, Plutocrat wrote:

> I've been looking around for a solution to this problem. I want to prune down
> the attachments on a server before a migration. Some of the emails are 7
> years old and have 40MB attachments, so this seems like a good opportunity to
> rationalize things. So perhaps I'd like to "Remove all attachments from
> emails older than 2 years, in the .Sent directory", or "Attachments over 10MB
> anywhere in the mail tree"
>
> I've found the strip-attachments.pl script here
> <https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl> which
> works fine on mbox (as tested on my local Thunderbird mboxes), but not on
> maildir which is on the dovecot server. My Perl isn't strong enough to
> re-purpose it.

If you have anything that works on mbox, it will probably work on Maildir
as each file can be considered a single-message mbox.  You can combine
the script with

        find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;

The ... can be replaced with more file tests (like minimum size or age
or only within */cur/) to cut down on processing.
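
For illustration, the same kind of file tests (minimum size, age, only within */cur/) can be sketched in Python. The function name candidate_messages and its defaults are made up for this example, and file mtime is only an approximation of message age:

```python
import os
import time


def candidate_messages(maildir, min_bytes=10 * 1024 * 1024, min_age_days=0):
    """Yield paths of files in any cur/ directory that are large and old enough."""
    cutoff = time.time() - min_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(maildir):
        # Only look at delivered-and-seen mail, i.e. .../cur/ directories.
        if os.path.basename(dirpath) != "cur":
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_size >= min_bytes and st.st_mtime <= cutoff:
                yield path
```

Calling it with min_age_days=730 approximates "older than 2 years"; pointing it at a single folder such as the .Sent directory limits the scan to that folder.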

I wrote a gawk script to slim down a multi-GB Outlook mbox
for a user, and it wasn't really complicated: it matched the
/^Content-Transfer-Encoding:.*base64/i header (virtually all bulky data
will be encoded this way), buffered the base64 data part, then output
it if it was small, or deleted/replaced/extracted it otherwise.

It was a one-off discarded tool but I can hunt for it if you're hard up.
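
The gawk approach described above can be sketched roughly as follows. This is a Python reconstruction from the description, not the original script; the name slim_base64, the max_bytes threshold, and the choice to replace oversized data with a base64-encoded note are all assumptions:

```python
import base64
import re

B64_HEADER = re.compile(rb"^Content-Transfer-Encoding:.*base64", re.IGNORECASE)


def slim_base64(lines, max_bytes=100_000):
    """Yield message lines, replacing oversized base64 bodies with a short note."""
    state = "pass"   # pass -> headers (base64 header seen) -> body (buffering)
    buffered = []

    def flush():
        size = sum(len(chunk) for chunk in buffered)
        if size <= max_bytes:
            yield from buffered          # small enough: output unchanged
        else:
            note = b"[%d bytes of base64 data removed]" % size
            yield base64.b64encode(note) + b"\n"
        buffered.clear()

    for line in lines:
        if state == "body":
            # Base64 data ends at a blank line or a MIME boundary ("--...").
            if line.strip() and not line.startswith(b"--"):
                buffered.append(line)
                continue
            yield from flush()
            state = "pass"
        if state == "headers" and not line.strip():
            state = "body"               # blank line ends the part headers
        elif state == "pass" and B64_HEADER.match(line):
            state = "headers"
        yield line
    yield from flush()                   # data that ran to end of file
```

It expects an iterable of byte lines, for example a file opened in binary mode. Like any line-oriented filter, it knows nothing about MIME structure, so a text body that happens to contain a matching header line would confuse it.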

> I've looked at ripmime and mpack/munpack, and although they seem like useful
> tools to do the job of deconstructing the mail into its constituent parts, they
> don't seem to help in re-building the email. I think they could be used
> with a bit of study into mail MIME structure, and used with a helper script.
>
> So before I take a deep dive into scripting my own solution, I just wanted to
> check if anyone else on the list has been through this and has some resources
> or pointers they can share, or maybe even someone to tell me "Duh, you can do
> it with doveadm of course".

MIMEDefang may help.

Joseph Tam <jtam.home@gmail.com>