On Thu, 29 Nov 2018, Marc Roos wrote:
When concatenating mbox files as described at https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/ you will end up with an 'unsorted' mbox file. Is this going to be a problem, especially when the files are large (>2GB) and new emails will be written to them?
I don't think it will be a problem, but you might have to remove some headers (like the UUID header?). However, I think Dovecot ought to be able to cope with it anyway and regenerate the indices.
The email client nicely sorts the message from folder A "foldera 5 last" as last, but of course the mbox is not like this. Is there a better solution for merging files?
As noted, the time order gets scrambled -- using your mail reader to get it back in time order requires sorting, an intensive operation.
It just so happens I've done this recently with a (GNU) awk script that merges multiple mailboxes into one mailbox, preserving time order. It assumes that each input mailbox is already sorted by timestamp and that each message starts with a From envelope line, e.g.
From mickey@disney.com Thu Nov 25 18:45:37 2018
From mickey@disney.com Thu Nov 25 18:45:37 2018 -0400
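Since the merge relies on each input already being in time order, it may be worth checking that before merging. A minimal sketch, assuming the envelope layout shown above (From, sender, weekday, month, day, time, year, optional zone) and ignoring the zone offset; `check_sorted` is a hypothetical helper name, not part of the script:

```shell
#!/bin/sh
# check_sorted FILE
# Exit status 0 if the "From " envelope lines in FILE are in non-decreasing
# time order, 1 otherwise. Out-of-order lines are reported on stdout.
check_sorted() {
    awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (m = 1; m <= 12; m++) month[names[m]] = m
    }
    # Envelope line: From sender Www Mmm dd hh:mm:ss yyyy [zone]
    /^From / && $7 ~ /^[0-9][0-9][0-9][0-9]$/ {
        hms = $6
        gsub(/:/, "", hms)
        # Build a sortable YYYYMMDDhhmmss key; the zone offset is ignored
        key = sprintf("%04d%02d%02d%s", $7, month[$4], $5, hms)
        if (key < prev) {
            printf("out of order at line %d: %s\n", NR, $0)
            bad = 1
        }
        prev = key
    }
    END { exit bad }
    ' "$1"
}
```

Because the key is a fixed-width string, a plain string comparison is enough and no gawk-specific time function is needed for this check.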
You're welcome to use it. There's probably a more elegant way with doveadm/dsync. Using a mail reader to sort the merged mailbox, then drag/drop/copy everything into a final mailbox could also work.
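Whichever route is taken, one cheap sanity check afterwards is to compare the number of envelope lines in the inputs with the number in the merged result. A rough sketch, where `count_msgs` is a made-up helper and the file names in the comment are placeholders; it uses the same pattern the script uses to spot envelope lines, so an unescaped body line that happens to match would be counted as well:

```shell
#!/bin/sh
# count_msgs FILE...
# Print the total number of lines that look like mbox envelope lines
# (start with "From " and end in four digits) across all FILEs.
count_msgs() {
    awk '/^From .*[0-9][0-9][0-9][0-9]$/ { n++ } END { print n + 0 }' "$@"
}

# Example (placeholder file names):
#   [ "$(count_msgs a.mbox b.mbox)" = "$(count_msgs merged.mbox)" ] && echo "counts match"
```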
Joseph Tam <jtam.home@gmail.com>
#!/bin/sh
#
# Merge multiple mbox's into one assuming that each message
# starts with /^From .* {year}$/ and they are sorted by time.
#
# -- Joseph Tam <jtam.home@gmail.com>
#
[ x"$*" = x ] && {
    echo "Usage: $0 mbox-file ..."
    exit 1
}
gawk -v boxes="$*" </dev/null '
function Tstamp(header) {
    # Format: Jan 22 21:00:48 2018 -0700
    #         12345678901234567890123456
    l = length(header)
    # Skip a trailing zone offset ("-0700" or "+0100") if there is one
    spec = (substr(header,l-4,1)~/^[-+]$/)? substr(header,l-25,20) : substr(header,l-19,20)
    # Rearrange into "YYYY MM DD HH MM SS" for mktime()
    spec = substr(spec,17,4) " " ym[substr(spec,1,3)] substr(spec,4,3) \
        " " substr(spec,8,2) " " substr(spec,11,2) " " substr(spec,14,2)
    return int(mktime(spec))
}
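function DumpMessage(i) {
    # Print the pending envelope line of mbox[i], then the rest of that
    # message, stopping at (and remembering) the next envelope line
    if (header[i]!="") {
        printf("%s\n",header[i])
    }
    while ((getline x <mbox[i])>0) {
        if (x~/^From .*[0-9][0-9][0-9][0-9]$/) {
            stamp[i] = Tstamp(x)
            header[i] = x
            printf("%s => [%d] %d\n",header[i],i,stamp[i]) >"/dev/stderr"
            return
        }
        print x
    }
    # End of file: park this mailbox behind every real timestamp
    printf("EOF[%d]\n",i) >"/dev/stderr"
    stamp[i] = 2147483647
    header[i] = ""
}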
BEGIN {
    ym["Jan"] = "01"; ym["Feb"] = "02"; ym["Mar"] = "03"; ym["Apr"] = "04"
    ym["May"] = "05"; ym["Jun"] = "06"; ym["Jul"] = "07"; ym["Aug"] = "08"
    ym["Sep"] = "09"; ym["Oct"] = "10"; ym["Nov"] = "11"; ym["Dec"] = "12"
    n = split(boxes,mbox," ")
    # Read first header line from all boxes
    for (i=1; i<=n; i++) {
        DumpMessage(i)
    }
    # Loop until all mailboxes read
    while (1) {
        t = 2147483646
        # Find next message (the earliest pending timestamp)
        for (i=1; i<=n; i++) {
            if (stamp[i]<=t) {t=stamp[i]; j=i;}
        }
        # If no more messages, quit
        if (t==2147483646) exit
        # Dump next message from mbox[j]
        DumpMessage(j)
    }
}'
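The substr() arithmetic in Tstamp() is the least obvious part of the script, so here is a small stand-alone sketch of just that step, applied to the two sample envelope lines from earlier. It stops short of calling mktime() (which is gawk-only) and simply prints the "YYYY MM DD HH MM SS" string that would be handed to it. `from_to_spec` is a hypothetical name used only for this illustration, and it accepts either sign of zone offset:

```shell
#!/bin/sh
# from_to_spec "From sender Www Mmm dd hh:mm:ss yyyy [zone]"
# Print the date part of an envelope line as "YYYY MM DD HH MM SS".
from_to_spec() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (m = 1; m <= 12; m++) ym[names[m]] = sprintf("%02d", m)
    }
    {
        l = length($0)
        # A trailing zone offset such as "-0400" shifts the date 6 characters left
        if (substr($0, l-4, 1) ~ /^[-+]$/)
            spec = substr($0, l-25, 20)
        else
            spec = substr($0, l-19, 20)
        # spec is now e.g. "Nov 25 18:45:37 2018"; reorder it as "YYYY MM DD HH MM SS"
        out = substr(spec, 17, 4) " " ym[substr(spec, 1, 3)] substr(spec, 4, 3)
        out = out " " substr(spec, 8, 2) " " substr(spec, 11, 2) " " substr(spec, 14, 2)
        print out
    }'
}

from_to_spec 'From mickey@disney.com Thu Nov 25 18:45:37 2018'
from_to_spec 'From mickey@disney.com Thu Nov 25 18:45:37 2018 -0400'
# Both calls print: 2018 11 25 18 45 37
```

Both forms give the same string because the zone offset is dropped rather than applied, so the merge orders messages by the wall-clock time written in the envelope line.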