aside from cat?
On Thu, Nov 29, 2018 at 03:07:58PM -0800, Joseph Tam wrote:
On Thu, 29 Nov 2018, Marc Roos wrote:
When concatenating mbox files as described at https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/ you will end up with an 'unsorted' mbox file. Is this going to be a problem, especially when the files are large (>2 GB) and new emails will be written to the result?
I don't think it will be a problem, but you might have to remove some headers (like the UUID header?). However, I think dovecot ought to be able to cope with it anyways and regenerate the indices.
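If headers do need to be removed, the likely candidates are Dovecot's own mbox bookkeeping headers. Here is a rough sketch, assuming the header meant above is X-UID (with X-IMAPbase and X-IMAP alongside it). The function name strip_dovecot_uids is made up for illustration, and only the header block of each message is touched:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not something posted in this thread):
# drop X-UID / X-IMAPbase / X-IMAP headers from an mbox file, leaving message
# bodies untouched. Result goes to stdout.
strip_dovecot_uids() {
    awk '
    # A "From " envelope line starts a new header block
    /^From .*[0-9][0-9][0-9][0-9]$/ { inhdr = 1; skip = 0; print; next }
    # The first blank line ends the header block
    inhdr && /^$/                   { inhdr = 0; skip = 0; print; next }
    # Folded continuation lines share the fate of the header they belong to
    inhdr && /^[ \t]/               { if (!skip) print; next }
    # Decide per header whether to drop it
    inhdr { skip = ($0 ~ /^(X-UID|X-IMAPbase|X-IMAP):/) }
    !skip { print }
    ' "$1"
}
```

Something like `strip_dovecot_uids old.mbox > clean.mbox` (placeholder file names) would then give a copy from which Dovecot has to assign fresh UIDs.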
The email client nicely sorts the message from folder A "foldera 5 last" as last, but of course the mbox is not like this. Is there a better solution for merging files?
As noted, the time order gets scrambled -- using your mail reader to get it back in time order requires sorting, an intensive operation.
It just so happens I've done this recently with a (GNU) awk script that merges multiple mailboxes into one mailbox, preserving time order. It assumes that each message starts with a "From " envelope header and that, within each mailbox, these headers are sorted by timestamp, e.g.
From mickey@disney.com Thu Nov 25 18:45:37 2018
From mickey@disney.com Thu Nov 25 18:45:37 2018 -0400
You're welcome to use it. There's probably a more elegant way with doveadm/dsync. Using a mail reader to sort the merged mailbox, then drag/drop/copy everything into a final mailbox could also work.
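Since the merge relies on each input mailbox already being in time order, it may be worth checking that first. Here is a minimal sketch, assuming asctime-style dates ("Nov 25 18:45:37 2018") on every From line. The function name mbox_is_sorted is made up for illustration. Plain POSIX awk is enough here, because it compares a YYYYMMDDHHMMSS text key instead of calling mktime:

```shell
#!/bin/sh
# Hypothetical helper (not part of the script in this thread): exit 0 if the
# "From " envelope lines of one mbox file are in non-decreasing time order.
# Any timezone suffix such as "-0400" is ignored.
mbox_is_sorted() {
    awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        for (k = 1; k <= 12; k++) ym[m[k]] = sprintf("%02d", k)
        prev = ""; bad = 0
    }
    /^From .*[0-9][0-9][0-9][0-9]$/ {
        # Locate "Mon DD HH:MM:SS YYYY" and build a sortable text key
        if (!match($0, /[A-Z][a-z][a-z] [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]/)) next
        s = substr($0, RSTART, RLENGTH)
        day = substr(s, 5, 2); sub(/^ /, "0", day)
        key = substr(s, 17, 4) ym[substr(s, 1, 3)] day substr(s, 8, 2) substr(s, 11, 2) substr(s, 14, 2)
        if (key < prev) { printf("out of order: %s\n", $0) > "/dev/stderr"; bad = 1 }
        prev = key
    }
    END { exit bad }
    ' "$1"
}
```

For example, `mbox_is_sorted inbox.mbox && echo sorted` (placeholder file name) prints "sorted" only when no From line goes back in time.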
Joseph Tam jtam.home@gmail.com
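#!/bin/sh
#
# Merge multiple mbox's into one assuming that each message
# starts with /^From .* {year}$/ and they are sorted by time.
#
# -- Joseph Tam jtam.home@gmail.com
#

[ x"$*" = x ] && {
    echo "Usage: $0 mbox-file ..."
    exit 1
}

gawk -v boxes="$*" '
# Convert a "From " envelope line into epoch seconds.
# NOTE: the opening of this function was lost from the archived post. The
# lines up to and including the year/month/day fields are a reconstruction
# inferred from the surviving substr() offsets, not the original text.
function Tstamp(x,    spec) {
    # Isolate "Mon DD HH:MM:SS YYYY", e.g. "Nov 25 18:45:37 2018"
    match(x, /[A-Z][a-z][a-z] [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]/)
    spec = substr(x, RSTART, RLENGTH)
    # Rearrange into the "YYYY MM DD HH MM SS" form that mktime() expects
    spec = substr(spec,17,4) " " ym[substr(spec,1,3)] " " substr(spec,5,2) \
        " " substr(spec,8,2) " " substr(spec,11,2) " " substr(spec,14,2)
    return int(mktime(spec))
}

# Print the pending header and the body of the current message in mbox[i],
# then remember the header and timestamp of the message that follows it.
function DumpMessage(i,    x) {
    if (header[i]!="") {
        printf("%s\n",header[i])
    }
    while ((getline x < mbox[i]) > 0) {
        if (x~/^From .*[0-9][0-9][0-9][0-9]$/) {
            stamp[i] = Tstamp(x)
            header[i] = x
            printf("%s => [%d] %d\n",header[i],i,stamp[i]) >"/dev/stderr"
            return
        }
        print x
    }
    # End of file: park this mailbox beyond any real timestamp
    printf("EOF[%d]\n",i) >"/dev/stderr"
    stamp[i] = 2147483647
    header[i] = ""
}

BEGIN {
    ym["Jan"] = "01"; ym["Feb"] = "02"; ym["Mar"] = "03"; ym["Apr"] = "04"
    ym["May"] = "05"; ym["Jun"] = "06"; ym["Jul"] = "07"; ym["Aug"] = "08"
    ym["Sep"] = "09"; ym["Oct"] = "10"; ym["Nov"] = "11"; ym["Dec"] = "12"

    n = split(boxes,mbox," ")

    # Read first header line from all boxes
    for (i=1; i<=n; i++) { DumpMessage(i) }

    # Loop until all mailboxes read
    while (1) {
        t = 2147483646
        # Find next message
        for (i=1; i<=n; i++) {
            if (stamp[i]<=t) {t=stamp[i]; j=i;}
        }
        # If no more messages, quit
        if (t==2147483646) exit
        # Dump next message from mbox[j]
        DumpMessage(j)
    }
}'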
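One quick sanity check on the result, sketched here under the same From-line assumption: the merged file should contain exactly as many messages as the inputs combined. The helper name mbox_count and the file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count messages across one or more mbox files using the
# same "From ... YYYY" pattern the merge script uses to detect message starts.
mbox_count() {
    awk '/^From .*[0-9][0-9][0-9][0-9]$/ { n++ } END { print n + 0 }' "$@"
}

# Example (file names are placeholders):
#   [ "$(mbox_count a.mbox b.mbox)" -eq "$(mbox_count merged.mbox)" ] && echo "counts match"
```

A mismatch would suggest that some From line did not match the expected pattern, in which case that message was appended to the body of the one before it rather than lost.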
-- So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998 http://www.mrbrklyn.com
DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002 http://www.nylxs.com - Leadership Development in Free Software http://www2.mrbrklyn.com/resources - Unpublished Archive http://www.coinhangout.com - coins! http://www.brooklyn-living.com
Being so tracked is for FARM ANIMALS and extermination camps, but incompatible with living as a free human being. -RI Safir 2013