mdbox vs. maildir format
hey,
i am considering changing my mailbox format from maildir to mdbox. the main reason is that i have (a) multiple large mailboxes with tens of thousands of mail files, and (b) most of these mail files have a size significantly smaller than the sector size of the disk.
so, since the emails themselves are only about nGB large, the disk space used is at least twice as much, if not even three times.
i know, hard disk space is 'cheap', but still...
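One way to check how much space the small files really waste is to compare the sum of the file sizes with the space actually allocated for them. The following is only a sketch: `slack_report` is a hypothetical helper name, and it assumes GNU find (for `-printf`), where `%s` is the file size in bytes and `%b` the number of allocated 512-byte blocks.

```shell
# Hypothetical helper: compare apparent size and allocated size of all
# regular files below a directory. Assumes GNU find (-printf).
slack_report() {
    find "$1" -type f -printf '%s %b\n' |
        awk '{ apparent += $1; allocated += $2 * 512 }
             END { printf "apparent=%d allocated=%d\n", apparent, allocated }'
}
```

Called as `slack_report /path/to/Maildir`, it prints both totals in bytes; a large gap between them suggests that many files are much smaller than the allocation unit of the filesystem.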
but then i read at
https://doc.dovecot.org/admin_manual/mailbox_formats/dbox/
the following:
[...] you must not lose the dbox index files, as they can’t be
regenerated without data loss.
so, raid is mandatory, which is already the case, but what about backup? how can i achieve a backup/snapshot of both, the mdbox (nfs share) and the index files (local raid) and assure they are consistent?
greetings...
> i am considering changing my mailbox format from maildir to mdbox. the main reason is that i have (a) multiple large mailboxes with tens of thousands of mail files, and (b) most of these mail files have a size significantly smaller than the sector size of the disk.
> so, since the emails themselves are only about nGB large, the disk space used is at least twice as much, if not even three times.
> i know, hard disk space is 'cheap', but still...
Also think about IOPS: rsyncing 10 GB of small files takes a lot longer than rsyncing ten 1 GB files.
> but then i read at
> https://doc.dovecot.org/admin_manual/mailbox_formats/dbox/
> the following:
> [...] you must not lose the dbox index files, as they can’t be regenerated without data loss.
I have read this also, and was also worried about this, but when I look at the flat m.988 file, I still have quite a lot of useful data there.
Received: from xxxxx (localhost [127.0.0.1]) by xxxxxxx (8.15.2/8.15.2) with ESMTPS id 29IAeeZw2293734 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT) for <xxxxxxxxxx>; Tue, 18 Oct 2022 12:40:40 +0200
X-Virus-Status: Clean
X-Virus-Scanned: clamav-milter 0.103.5 at xxxxxxx
Received: (from xxxxxx@localhost) by xxxxxxxxxx (8.15.2/8.15.2/Submit) id 29IAeeoF2293733 for xxxxxxx; Tue, 18 Oct 2022 12:40:40 +0200
From: xxxxx <xxxxxxxxxxxxx>
Message-ID: 202210181040.29IAeeoF2293733@xxxxxxxxxxxxx
Date: Tue, 18 Oct 2022 12:40:40 +0200
To: <xxxxxxx>
Subject: test
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received-SPF: neutral (spf not configured)

test
R634e82ab V7e3 G18424b37ab824e63be3a0000fe361dd3 BINBOX
On 10/18/22 18:46, Marc wrote:
> > you must not lose the dbox index files, as they can’t be regenerated without data loss.
> I have read this also, and was also worried about this, but when I look at the flat m.988 file, I still have quite a lot of useful data there.
"Note that with dbox the Index files contain significant data which is held nowhere else. Index files for both sdbox and mdbox contain message flags and keywords. For mdbox, the index file also contains the map_uids which link (via the “map index”) to the actual message data. This data cannot be automatically recreated, so it is important that Index files are treated with the same care as message data files."
> "Note that with dbox the Index files contain significant data which is held nowhere else. Index files for both sdbox and mdbox contain message flags and keywords. For mdbox, the index file also contains the map_uids which link (via the “map index”) to the actual message data. This data cannot be automatically recreated, so it is important that Index files are treated with the same care as message data files."
Say you lose the index files. Is it not possible to 'script' the 'm.988' files, so all messages are in mbox format and then convert them back to mdbox? To me the most important part of the email is the sender and what and when he wrote it. I don't even use flags, and I am not sure what keywords are about.
On 18/10/2022 14:09 EEST Marc <marc@f1-outsourcing.eu> wrote:
> > "Note that with dbox the Index files contain significant data which is held nowhere else. Index files for both sdbox and mdbox contain message flags and keywords. For mdbox, the index file also contains the map_uids which link (via the “map index”) to the actual message data. This data cannot be automatically recreated, so it is important that Index files are treated with the same care as message data files."
> Say you lose the index files. Is it not possible to 'script' the 'm.988' files, so all messages are in mbox format and then convert them back to mdbox? To me the most important part of the email is the sender and what and when he wrote it. I don't even use flags, and I am not sure what keywords are about.
Dovecot can usually reconstruct the indexes as well. There is a small chance that this does not work, but the map reconstruction usually succeeds. Try to avoid losing them anyway.
What you WILL lose is flags (like \Seen) and mail keywords (like $RED). You will also get a new UIDVALIDITY, so clients will treat the folder as a new one.
Aki
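The thread does not say which command triggers this reconstruction. A minimal sketch, assuming that `doveadm force-resync` is the intended tool: `resync_user` is a hypothetical helper name, and the `DOVEADM` variable exists only so the command can be previewed with `DOVEADM="echo doveadm"` before running it for real.

```shell
# DOVEADM can be overridden (e.g. DOVEADM="echo doveadm") for a dry run.
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical helper: ask Dovecot to check and rebuild the indexes
# of every mailbox belonging to one user.
resync_user() {
    user=$1
    # '*' is a mailbox mask that matches all of the user's folders.
    $DOVEADM force-resync -u "$user" '*'
}
```

As noted above, flags and keywords that existed only in the lost index files would not come back this way.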
On 10/18/22 18:17, Michael wrote:
> what about backup? how can i achieve a backup/snapshot of both, the mdbox (nfs share) and the index files (local raid) and assure they are consistent?
If you do your backups using doveadm backup, then the result should be consistent, at least in the sense that it would be usable. Your destination can also be set up similarly with separate storage for indexes.
However I'm pretty sure the consistency would be per mailbox ("folder"), so e.g. if a user moved a message from one mailbox to another, you could potentially end up with the message appearing in both mailboxes in the backup.
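A backup destination with separate index storage could look roughly like the sketch below. It assumes the usual `INDEX=` parameter of the Dovecot mail location syntax; both directories, the variable names and the helper name `backup_user` are made up for illustration.

```shell
# DOVEADM can be overridden (e.g. DOVEADM="echo doveadm") for a dry run.
DOVEADM=${DOVEADM:-doveadm}
# Hypothetical directories for the backup copy and its index files.
BACKUP_ROOT=${BACKUP_ROOT:-/srv/snap_mail}
INDEX_ROOT=${INDEX_ROOT:-/var/local/snap_index}

# Hypothetical helper: back up one user into an mdbox location whose
# index files live in a separate directory.
backup_user() {
    user=$1
    dest="mdbox:$BACKUP_ROOT/$user/mdbox:INDEX=$INDEX_ROOT/$user"
    $DOVEADM backup -u "$user" "$dest"
}
```

Because doveadm writes both the mail data and the indexes of the copy itself, the two directories of the backup should match each other, subject to the per-mailbox caveat described above.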
On 18/10/2022 12:17, Michael wrote:
> [...] so, raid is mandatory, which is already the case, but what about backup? how can i achieve a backup/snapshot of both, the mdbox (nfs share) and the index files (local raid) and assure they are consistent?
You can use doveadm to back up the mailboxes, which should work correctly even on a live system.
My backup "strategy" (hopefully it deserves that name) is to weekly run something like:
for MAILBOX in $USERS; do
    doveadm expunge -u "$MAILBOX" mailbox Trash savedbefore 7d
    doveadm expunge -u "$MAILBOX" mailbox Spam savedbefore 30d
    doveadm purge -u "$MAILBOX"
    LOCATION2="mdbox:/srv/snap_mail/$MAILBOX/mdbox"
    doveadm -v backup -u "$MAILBOX" -P "$LOCATION2"
done
which makes a replica of the mailbox (including dovecot.list.index, but e.g. dovecot-uidvalidity is not there, I don't know if this is bad or not).
Once you have this "snapshot" of the mailbox(es), you can rsync them to wherever you like (so you avoid rsync'ing a changing system).
I've never had to restore from backup (rsync back, and either doveadm backup back in the other direction), but I'd tend to assume this should work fine.
BTW I use mdbox_rotate_interval = 0 and mdbox_rotate_size = 64M, so the mdbox'es have a relatively big size (but not too big), which is nice for rsync.
Cheers, Bernardo
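The untested restore path mentioned above (running doveadm backup in the other direction) might look like the following sketch. It assumes the `-R` (reverse) option of `doveadm backup` and reuses the snapshot path from the script above; `restore_user` and `SNAP_ROOT` are hypothetical names.

```shell
# DOVEADM can be overridden (e.g. DOVEADM="echo doveadm") for a dry run.
DOVEADM=${DOVEADM:-doveadm}
# Snapshot directory as used in the backup loop; the variable is hypothetical.
SNAP_ROOT=${SNAP_ROOT:-/srv/snap_mail}

# Hypothetical helper: pull one user's mail back from the snapshot.
restore_user() {
    user=$1
    # -R reverses the direction, so the live mailbox is made to match
    # the snapshot; newer changes in the live mailbox would be lost.
    $DOVEADM -v backup -R -u "$user" "mdbox:$SNAP_ROOT/$user/mdbox"
}
```

Trying this on a test account first, with `DOVEADM="echo doveadm"` to preview the command, seems prudent since the thread reports no actual restore.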
On Tue, 2022-10-18 at 16:48 +0200, Bernardo Reino wrote:
> On 18/10/2022 12:17, Michael wrote:
> > [...] so, raid is mandatory, which is already the case, but what about backup? how can i achieve a backup/snapshot of both, the mdbox (nfs share) and the index files (local raid) and assure they are consistent?
> You can use doveadm to backup the mailboxes, which should work correctly even in a live system.
> My backup "strategy" (hopefully it deserves that name) is to weekly run something like:
> for MAILBOX in $USERS; do
>     doveadm expunge -u "$MAILBOX" mailbox Trash savedbefore 7d
>     doveadm expunge -u "$MAILBOX" mailbox Spam savedbefore 30d
>     doveadm purge -u "$MAILBOX"
>     LOCATION2="mdbox:/srv/snap_mail/$MAILBOX/mdbox"
>     doveadm -v backup -u "$MAILBOX" -P "$LOCATION2"
> done
Do you think the preceding shellscript will work if I store my Dovecot messages in the Maildir form?
Thanks,
SteveT
On 10/19/22 07:46, Steve Litt wrote:
> for MAILBOX in $USERS; do
>     doveadm expunge -u "$MAILBOX" mailbox Trash savedbefore 7d
>     doveadm expunge -u "$MAILBOX" mailbox Spam savedbefore 30d
>     doveadm purge -u "$MAILBOX"
>     LOCATION2="mdbox:/srv/snap_mail/$MAILBOX/mdbox"
>     doveadm -v backup -u "$MAILBOX" -P "$LOCATION2"
> done
> Do you think the preceding shellscript will work if I store my Dovecot messages in the Maildir form?
It would, including this part: LOCATION2="mdbox:..." You can use that as a way to convert between storage formats, or keep the same format. Specify whichever format you need.
On 10/18/2022 7:46 PM, Steve Litt wrote:
> On Tue, 2022-10-18 at 16:48 +0200, Bernardo Reino wrote:
> > On 18/10/2022 12:17, Michael wrote:
> > > [...] so, raid is mandatory, which is already the case, but what about backup? how can i achieve a backup/snapshot of both, the mdbox (nfs share) and the index files (local raid) and assure they are consistent?
> > You can use doveadm to backup the mailboxes, which should work correctly even in a live system.
> > My backup "strategy" (hopefully it deserves that name) is to weekly run something like:
> > for MAILBOX in $USERS; do
> >     doveadm expunge -u "$MAILBOX" mailbox Trash savedbefore 7d
> >     doveadm expunge -u "$MAILBOX" mailbox Spam savedbefore 30d
> >     doveadm purge -u "$MAILBOX"
> >     LOCATION2="mdbox:/srv/snap_mail/$MAILBOX/mdbox"
> >     doveadm -v backup -u "$MAILBOX" -P "$LOCATION2"
> > done
> Do you think the preceding shellscript will work if I store my Dovecot messages in the Maildir form?
> Thanks,
> SteveT
Yes, it will. The source format is your current format (maildir) and the target format is whatever you specify (mdbox: or maildir:).
I do something similar with my daily backups using dsync. Like others, I was hesitant about using mdbox in the beginning, and my solution was to create my point-in-time backups in maildir format.
for user in $users; do
    dsync -u "$user" backup "maildir:/home/$user/.mailbkup/mailboxes"
done
This is a simplified version of my command. In my backup script this runs inside another loop to make backups for all users in parallel, but I only have about 20 users and plenty of excess CPU on my server. I run this about 4 times per day to sync changes to my backup copy. Once the initial sync is done the incremental changes run pretty quickly.
Doug
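The parallel variant described above is not shown in the thread. One possible sketch uses shell background jobs and `wait`; `backup_all_parallel` is a hypothetical helper name, `doveadm backup` stands in for the `dsync` invocation, and the maildir path is the one from the loop above.

```shell
# DOVEADM can be overridden (e.g. DOVEADM="echo doveadm") for a dry run.
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical helper: start one backup per user in the background,
# then wait until all of them have finished.
backup_all_parallel() {
    for user in "$@"; do
        $DOVEADM backup -u "$user" "maildir:/home/$user/.mailbkup/mailboxes" &
    done
    wait
}
```

It would be called as `backup_all_parallel $users`. This starts every backup at once, which seems reasonable for about 20 users with spare CPU; a larger site would probably want to cap the number of concurrent jobs.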
participants (7): Aki Tuomi, Bernardo Reino, Doug, Gedalya, Marc, Michael, Steve Litt