Mass Stripping Attachments by Directory, Age, Size

Aki Tuomi aki.tuomi at open-xchange.com
Mon Apr 5 10:46:42 EEST 2021


> On 05/04/2021 07:37 Plutocrat <plutocrat at gmail.com> wrote:
> 
> 
> OK, an update on the progress with this. 
> 
> I finally settled on a python script which does the stripping based on code here: 
>  http://code.activestate.com/recipes/302086-strip-attachments-from-an-email-message/
> 
> And then a bash script using find that allows me to select candidate files with 'find' and pass them to the python script, eg. 
>  
>  find $DIR -type f -mtime +$OLDERTHANDAYS -size +$LARGERTHAN ! -name 'dovecot*'
> After a bit of debugging to do with UTF characters etc, I seem to have got the script working and it will process a directory or entire account without complaining. My coding is not good, but if anyone wants a copy, contact me off list, to spare my blushes. 
> 
> I'm now experiencing an issue when I go to check the emails, using Thunderbird IMAP. The mails were cached in Thunderbird, and indexed by dovecot on the server. I've been trying to figure out the minimum I need to do to get Thunderbird to pick up the changes. 
> 
>   * 'doveadm force-resync -u user at domain.com INBOX' seemed like an option, but didn't actually seem to do much. 
>     
>   * deleting all the dovecot.* files in the user directory on the server, seemed like a harsher option, but again didn't really fix things. 
>     
>   * On the Thunderbird end, deleting the INBOX.msf file, didn't do anything, and deleting the INBOX and INBOX.msf files, still meant the wrong versions of the mails were coming down with attachments, and then disconnecting when it created an error. 
>     
> Errors in the logs were
> Apr 05 12:15:33 imap(user at domain.com): Error: Corrupted record in index cache file /mail/path/dovecot.index.cache: UID 1298: Broken physical size in mailbox INBOX: read(/mail/path/cur/1615880838.M742750P25731.mail.domain.com,S=12893560,W=13061037:2,Se) failed: Cached message size larger than expected (12893560 > 2937, box=INBOX, UID=1298)
> Apr 05 12:15:33 imap(user at domain.com): Info: FETCH read() failed in=10718 out=7471947 deleted=0 expunged=0 trashed=0 hdr_count=1647 hdr_bytes=645910 body_count=448 body_bytes=6371591
> Apr 05 12:15:36 imap(user at domain.com): Error: Corrupted record in index cache file /mail/path/dovecot.index.cache: UID 1298: Broken physical size in mailbox INBOX: read(/mail/path/cur/1615880838.M742750P25731.mail.domain.com,S=12893560,W=13061037:2,Se) failed: Cached message size larger than expected (12893560 > 2937, box=INBOX, UID=1298)
> It seems the only way to do this is to disconnect, delete all dovecot.* files on the server, delete all Thunderbird cache files on the PC, and then reconnect and wait for them to figure it out. Does that seem correct? 
> 
> 
> 
> Finally, and relatedly, the maildir files on the server are tagged with a size field eg S=12893560. Is it possible to regenerate them with the new correct file sizes? 
> If I leave them alone, will it affect anything?
> P.
>


Hi!

The problems you are facing are due to the fact that IMAP considers mails immutable once they've been stored. They are not supposed to change.

For maildir, the mail filename itself contains fields you need to update if you alter the mails, such as the S= (size) parameter. See this script: https://dovecot.org/tools/maildir-size-fix.pl
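
To make the filename fix concrete, here is a rough Python sketch of the idea (not a port of the Perl script above, which remains the tool to use). It assumes S= is the file's size in bytes on disk and W= is the size with every line ending counted as CRLF, which is my reading of the maildir documentation. The function names corrected_name and fix_size are made up for this example, and the sketch does not touch dovecot.index.cache.

```python
import os
import re


def corrected_name(path):
    """Return the basename of `path` with S= and W= recomputed from the file."""
    with open(path, "rb") as f:
        data = f.read()
    size = len(data)
    # Assumption: W= is the "virtual" size, i.e. each bare LF counted as CRLF.
    bare_lf = data.count(b"\n") - data.count(b"\r\n")
    vsize = size + bare_lf
    name = os.path.basename(path)
    # Only rewrite fields that are already present in the name.
    name = re.sub(r",S=\d+", f",S={size}", name, count=1)
    name = re.sub(r",W=\d+", f",W={vsize}", name, count=1)
    return name


def fix_size(path, dry_run=True):
    """Rename `path` so its size fields match its content; return the new path."""
    new_path = os.path.join(os.path.dirname(path), corrected_name(path))
    if new_path != path and not dry_run:
        os.rename(path, new_path)
    return new_path
```

With dry_run left at True, fix_size only reports the name it would use, so the output can be checked against a few mails by hand before anything is renamed.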

If you want to do this cleanly, reinsert the mails into new/ or cur/ after manipulation as new mails, so that they get new UIDs. This requires altering the filename slightly, mainly the bit after the timestamp.
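
A minimal sketch of that renaming, assuming filenames shaped like the one in the quoted log (timestamp.unique.host,S=...,W=...:2,flags). It keeps the timestamp, host and flags, swaps the middle part for a fresh unique token, sets S= from the file on disk, and leaves out W= on the assumption that Dovecot treats that field as optional. The names reinsert_name and reinsert and the "R" + UUID token are invented for this example.

```python
import os
import uuid


def reinsert_name(old_name, size, unique):
    """Build a new maildir filename from `old_name` with a new unique part."""
    # Split off the ":2,<flags>" suffix (absent for files in new/).
    base, sep, flags = old_name.partition(":2,")
    # Drop the stale ",S=...,W=..." fields.
    base = base.split(",", 1)[0]
    # base is "<timestamp>.<unique>.<host>"; keep the timestamp and host.
    timestamp, _, rest = base.partition(".")
    _, _, host = rest.partition(".")
    return f"{timestamp}.{unique}.{host},S={size}" + sep + flags


def reinsert(path, unique=None):
    """Rename a modified message within its directory so it looks like a new mail."""
    unique = unique or "R" + uuid.uuid4().hex  # hypothetical unique token
    directory, old_name = os.path.split(path)
    new_name = reinsert_name(old_name, os.path.getsize(path), unique)
    new_path = os.path.join(directory, new_name)
    os.rename(path, new_path)
    return new_path
```

Because the new name is unknown to the index, the old message should show up as expunged and the renamed file as a new mail with a new UID, which is the behaviour described above. Try it on a copy of the mailbox first.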

See also https://wiki2.dovecot.org/MailboxFormat/Maildir for details on how the maildir format works.

Aki

