Mass Stripping Attachments by Directory, Age, Size
Hi,
I've been looking around for a solution to this problem. I want to prune down the attachments on a server before a migration. Some of the emails are 7 years old and have 40Mb attachments, so this seems like a good opportunity to rationalize things. So perhaps I'd like to "Remove all attachments from emails older than 2 years, in the .Sent directory", or "Attachments over 10Mb anywhere in the mail tree"
I've found the strip_attachments.pl script here https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl which works fine on mbox (as tested on my local Thunderbird mboxes), but not on maildir which is on the dovecot server. My Perl isn't strong enough to re-purpose it.
I've looked at ripmime and mpack/munpack, and although they seem like useful tools for deconstructing the mail into its constituent parts, they don't seem to help in re-building the email. I think they could be used, with a bit of study of MIME structure and a helper script.
So before I take a deep dive into scripting my own solution, I just wanted to check if anyone else on the list has been through this and has some resources or pointers they can share, or maybe even someone to tell me "Duh, you can do it with doveadm of course".
P.
I would like such a feature too, but instead of deleting the attachment files, I would like to "detach" the files and save them into a separate directory, which could be on a different storage like a share in the user's home directory or even S3, and then replace the attachment in the mail with a link to that file. Thunderbird does this quite well with its "Detach Attachment" feature; the MIME part looks like this after that:
----------------------------------------
Content-Type: image/png; name="funny-picture.png"
Content-Disposition: attachment; filename="funny-picture.png"
X-Mozilla-External-Attachment-URL: file://///fileserver/home/svarco/mail/attachments/funny-picture.png
X-Mozilla-Altered: AttachmentDetached; date="Thu Mar 18 09:44:37 2021"

You deleted an attachment from this message. The original MIME headers for the attachment were:
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename=funny-picture.png
Content-Type: image/png; name="funny-picture.png"
----------------------------------------
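The detach-and-link idea can be sketched in Python with the standard email package. This is only an illustration, not something dovecot or Thunderbird provides: detach_attachments, target_dir and url_prefix are made-up names, and whether a mail client honours an X-Mozilla-External-Attachment-URL header written by a script is an assumption.

```python
import email
from email import policy
from pathlib import Path


def detach_attachments(raw_bytes, target_dir, url_prefix):
    """Save each attachment under target_dir and leave a text note with a link.

    Returns the rewritten message as bytes and the list of saved file names.
    Hypothetical helper: a later file with the same name overwrites an earlier one.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    saved = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_disposition() != "attachment":
            continue
        # Keep only the base name so a crafted filename cannot leave target_dir.
        name = Path(part.get_filename() or "unnamed").name
        if name in ("", ".."):
            name = "unnamed"
        data = part.get_payload(decode=True) or b""
        (target / name).write_bytes(data)
        url = f"{url_prefix}/{name}"
        # set_content() clears the old Content-* headers and body of this part.
        part.set_content(f"Attachment detached: {name}\nStored at: {url}\n")
        del part["MIME-Version"]  # added by set_content(), not needed on a sub-part
        # Assumption: header name copied from the Thunderbird example above.
        part["X-Mozilla-External-Attachment-URL"] = url
        saved.append(name)
    return msg.as_bytes(), saved
```

The note keeps the attachment name in the message body, so the reader can still see what was removed and where it went.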
I know that some external archiving solutions exist as add-on components for MS Exchange / Outlook, and I'm looking for something similar to offload attachments with dovecot. :)
Steven
On 18/03/2021 16.52, Steven Varco wrote:
> I would like such a feature too, but instead of deleting the attachment files, I would like to "detach" the files and save them into a separate directory, which could be on a different storage like a share in the user's home directory or even S3, and then replace the attachment in the mail with a link to that file. Thunderbird does this quite well with its "Detach Attachment" feature; the MIME part looks like this after that:
I'm familiar with the Thunderbird implementation. I'd like it if the attachment name was preserved in there too. Saving it to a directory would be nice, but is not required for my needs.
> I know that some external archiving solutions exist as add-on components for MS Exchange / Outlook, and I'm looking for something similar to offload attachments with dovecot. :)
I forgot to mention the ImapSize utility before, which will help for single accounts where the login and password are known. https://broobles.com/imapsize/
But what I'm really looking for is something that I can script on a server. I'll let you know what I come up with.
P.
On Thu, 18 Mar 2021, Plutocrat wrote:
> I've found the strip_attachments.pl script here https://fossies.org/linux/Mail-Box/examples/strip-attachments.pl which works fine on mbox (as tested on my local Thunderbird mboxes), but not on maildir which is on the dovecot server. My Perl isn't strong enough to re-purpose it.
If you have anything that works on mbox, it will probably work on Maildir, as each file can be considered a single-message mbox. You can combine the script with
find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;
The ... can be replaced with more file tests (like minimum size or age or only within */cur/) to cut down on processing.
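For anyone who prefers to do the selection in Python rather than with find, here is a minimal sketch of the same file tests (age, size, only within cur/). The function name and its parameters are made up for illustration; it assumes the usual Maildir layout where each folder has a cur/ subdirectory.

```python
import os
import time
from pathlib import Path


def find_candidates(maildir_root, older_than_days=0, larger_than_bytes=0):
    """Yield message files under any cur/ directory that pass the age and size tests."""
    cutoff = time.time() - older_than_days * 86400
    # os.walk also descends into dot-directories such as .Sent
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        if os.path.basename(dirpath) != "cur":
            continue
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            if st.st_mtime <= cutoff and st.st_size >= larger_than_bytes:
                yield path
```

Something like find_candidates("/path/to/Maildir/.Sent", older_than_days=730, larger_than_bytes=10 * 1024 * 1024) would then feed whatever stripping tool is used.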
I wrote a gawk script to slim down a multi-Gb Outlook mbox for a user, but it wasn't really complicated, just matching for /^Content-Transfer-Encoding:.*base64/i header (virtually all bulky data will be encoded this way), buffering the base64 data part, then outputting it if it was small, or deleting/replacing/extracting it otherwise.
It was a one-off discarded tool but I can hunt for it if you're hard up.
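The header-matching heuristic described above could look roughly like this in Python. It is not the gawk script, only a sketch of the same idea: it drops oversized base64 bodies and leaves the part headers in place, and it assumes a base64 body ends at the next line starting with "--" (a MIME boundary) or at end of file. The name drop_large_base64 is made up, and max_bytes is measured on the encoded text.

```python
import re

CTE_BASE64 = re.compile(rb"^Content-Transfer-Encoding:.*base64", re.IGNORECASE)


def drop_large_base64(lines, max_bytes):
    """Return the message lines with oversized base64 bodies removed."""
    out = []
    body = []            # buffered base64 lines of the current part
    saw_header = False   # base64 header seen in the current header block
    buffering = False

    def flush():
        # Emit the buffered body only if it is small enough to keep.
        if sum(len(chunk) for chunk in body) <= max_bytes:
            out.extend(body)
        body.clear()

    for line in lines:
        if buffering:
            if line.startswith(b"--"):   # base64 text never contains "-"
                flush()
                buffering = False
                out.append(line)
            else:
                body.append(line)
            continue
        out.append(line)
        if CTE_BASE64.match(line):
            saw_header = True
        elif not line.strip():
            # A blank line ends a header block; buffer only after a base64 header.
            buffering = saw_header
            saw_header = False
    flush()
    return out
```

It would be called on open(path, "rb").readlines(). Being a line heuristic, it can be fooled by a text body that happens to contain the header line, which is one reason a MIME-aware library is the safer route.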
> I've looked at ripmime and mpack/munpack, and although they seem like useful tools for deconstructing the mail into its constituent parts, they don't seem to help in re-building the email. I think they could be used, with a bit of study of MIME structure and a helper script.
> So before I take a deep dive into scripting my own solution, I just wanted to check if anyone else on the list has been through this and has some resources or pointers they can share, or maybe even someone to tell me "Duh, you can do it with doveadm of course".
MIMEDefang may help.
Joseph Tam jtam.home@gmail.com
On 19/03/2021 07.31, Joseph Tam wrote:
> If you have anything that works on mbox, it will probably work on Maildir, as each file can be considered a single-message mbox. You can combine the script with
> find ~user/MailDir -type f ... -exec /path/to/mbox-strip {} \;
I thought that too, but my initial test on a single message file didn't work like that. I think I got a zero length file. I'll dig into the code to see if I can figure it out, although my Perl hasn't been used for 20 years or so ...
> The ... can be replaced with more file tests (like minimum size or age or only within */cur/) to cut down on processing.
Sure. I'm quite handy with find, sed, awk and all that bash malarkey. I was actually wondering if it could be done with those alone, but it would make more sense to use a library which already understands MIME and does the heavy lifting. This approach might be good as a last resort.
> MIMEDefang may help.
Nice. Thanks for the pointer.
P.
Still can't find the magic solution to this.
- My Perl isn't good enough to re-purpose strip-attachments.pl so it works on individual emails.
- ripmime works to extract attachments only.
- altermime looked good and would delete all attachments from a directory of emails. However, it messed up the structure somehow so they wouldn't display in an email client (Thunderbird, Roundcube).
- MIMEDefang looked possible, but I couldn't figure out how to use it as a standalone script.
- PHP solutions including the promising https://github.com/php-mime-mail-parser/php-mime-mail-parser seem only to be able to save attachments from the email, not delete them.
I'll keep going I guess. I can't believe I'm the only person in the world to want to do this though ...
P.
Well ain't that rich? To use an allegory of sorts, we're going to have to start using staples rather than paperclips 📎🖇️ with our email attachments, and one unified digital signature on the whole message as sent rather than a separate signature for each enclosure as commonly "done" with PGP, GnuPG, etc.
On 05/04/2021 07:37 Plutocrat plutocrat@gmail.com wrote:
OK, an update on the progress with this.
I finally settled on a python script which does the stripping based on code here: http://code.activestate.com/recipes/302086-strip-attachments-from-an-email-m...
And then a bash script using find that allows me to select candidate files with 'find' and pass them to the python script, eg.
find $DIR -type f -mtime +$OLDERTHANDAYS -size +$LARGERTHAN ! -name 'dovecot*'
After a bit of debugging to do with UTF characters etc, I seem to have got the script working and it will process a directory or entire account without complaining. My coding is not good, but if anyone wants a copy, contact me off list, to spare my blushes.
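For readers without a copy of that script, the stripping step could be sketched like this with Python's standard email package. This is not the script described above nor the ActiveState recipe, just a minimal illustration: strip_attachments and min_size are made-up names, and policy.default may re-fold long headers or fail on badly malformed mail, so test on copies first.

```python
import email
from email import policy


def strip_attachments(raw_bytes, min_size=0):
    """Replace attachments of at least min_size decoded bytes with a short text note.

    Returns the rewritten message as bytes and the names of the removed attachments.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    removed = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_disposition() != "attachment":
            continue
        payload = part.get_payload(decode=True) or b""
        if len(payload) < min_size:
            continue
        name = part.get_filename() or "unnamed"
        # set_content() drops the old Content-* headers and the encoded body.
        part.set_content(f"Attachment removed: {name} ({len(payload)} bytes)\n")
        del part["MIME-Version"]  # added by set_content(), not needed on a sub-part
        removed.append(name)
    return msg.as_bytes(), removed
```

Each file selected by find would be read, passed through strip_attachments, and written back only if the returned list is non-empty.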
I'm now experiencing an issue when I go to check the emails, using Thunderbird IMAP. The mails were cached in Thunderbird, and indexed by dovecot on the server. I've been trying to figure out the minimum I need to do to get Thunderbird to pick up the changes.
'doveadm force-resync -u user@domain.com INBOX' seemed like an option, but didn't actually seem to do much.
Deleting all the dovecot.* files in the user directory on the server seemed like a harsher option, but again didn't really fix things.
On the Thunderbird end, deleting the INBOX.msf file didn't do anything, and deleting both the INBOX and INBOX.msf files still meant the wrong versions of the mails were coming down with attachments, and then it disconnected when an error occurred.
Errors in the logs were:
Apr 05 12:15:33 imap(user@domain.com) Error: Corrupted record in index cache file /mail/path/dovecot.index.cache: UID 1298: Broken physical size in mailbox INBOX: read(/mail/path/cur/1615880838.M742750P25731.mail.domain.com,S=12893560,W=13061037:2,Se) failed: Cached message size larger than expected (12893560 > 2937, box=INBOX, UID=1298)
Apr 05 12:15:33 imap(user@domain.com): Info: FETCH read() failed in=10718 out=7471947 deleted=0 expunged=0 trashed=0 hdr_count=1647 hdr_bytes=645910 body_count=448 body_bytes=6371591
Apr 05 12:15:36 imap(user@domain.com): Error: Corrupted record in index cache file /mail/path/dovecot.index.cache: UID 1298: Broken physical size in mailbox INBOX: read(/mail/path/cur/1615880838.M742750P25731.mail.domain.com,S=12893560,W=13061037:2,Se) failed: Cached message size larger than expected (12893560 > 2937, box=INBOX, UID=1298)
It seems the only way to do this is to disconnect, delete all dovecot.* files on the server, delete all Thunderbird cache files on the PC, and then reconnect and wait for them to figure it out. Does that seem correct?
Finally, and relatedly, the maildir files on the server are tagged with a size field, eg S=12893560. Is it possible to regenerate them with the new correct file sizes? If I leave them alone, will it affect anything?
P.
Hi!
The problems you are facing are due to the fact that IMAP considers mails immutable once they've been stored. They are not supposed to change.
For maildir, the mail filename itself contains things you need to fix if you alter the mails, such as the S(ize) parameter. See this script: https://dovecot.org/tools/maildir-size-fix.pl
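As an illustration of what fixing the filename involves (this is not a port of maildir-size-fix.pl), here is a small Python sketch that recomputes the two size fields. It assumes S= is the file size in bytes and W= is the size with every line ending counted as CRLF; corrected_maildir_name is a made-up helper name.

```python
import os
import re


def corrected_maildir_name(path):
    """Return the path with its ,S= and ,W= fields recomputed from the file content."""
    with open(path, "rb") as fh:
        data = fh.read()
    size = len(data)
    # Virtual size: each bare LF would gain one CR when sent with CRLF line endings.
    vsize = size + data.count(b"\n") - data.count(b"\r\n")
    directory, name = os.path.split(path)
    name = re.sub(r",S=\d+", f",S={size}", name)
    name = re.sub(r",W=\d+", f",W={vsize}", name)
    return os.path.join(directory, name)
```

The actual rename would be os.rename(path, corrected_maildir_name(path)), best done while nobody is accessing the mailbox. A filename without these fields is returned unchanged.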
If you would want to do this cleanly, you'd reinsert the mails to new/ or cur/ after manipulation as new mails and then they'd get new UIDs. This requires altering the filename slightly, mainly the bit after the timestamp.
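A rough Python sketch of that reinsert step, using the standard mailbox module to generate a fresh unique filename in cur/. The helper name reinsert_as_new is made up, and it is an assumption here that a name generated by Python's mailbox module (without S=/W= fields) is acceptable to dovecot; the old file's mtime is copied over on the assumption that it is used as the received date.

```python
import mailbox
import os


def reinsert_as_new(folder_path, old_file, new_bytes, flags="S"):
    """Store new_bytes as a new message in folder_path/cur and remove old_file.

    folder_path is one maildir folder (the directory holding cur/, new/ and tmp/).
    Returns the key (unique base name) of the new message.
    """
    box = mailbox.Maildir(folder_path, create=False)
    msg = mailbox.MaildirMessage(new_bytes)
    msg.set_subdir("cur")                      # deliver straight into cur/
    msg.set_flags(flags)                       # "S" marks the message as seen
    msg.set_date(os.path.getmtime(old_file))   # keep the old delivery time
    key = box.add(msg)                         # written to tmp/ first, then moved
    os.remove(old_file)
    return key
```

Because the message gets a new filename, the server sees the old mail as expunged and the stripped copy as a new mail with a new UID, so clients drop their cached copy instead of tripping over a size mismatch.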
See also https://wiki2.dovecot.org/MailboxFormat/Maildir for details on how the maildir format works.
Aki
participants (5)
- Aki Tuomi
- Joseph Tam
- justina colmena ~biz
- Plutocrat
- Steven Varco