[Dovecot] Doubts about dsync, mdbox, SIS
I've been running continuous dsync backups of our Maildirs for a few weeks now, with the destination dsync server using mdbox and SIS. The idea was that the destination server would act as a warm copy of all our active users' data.
The active servers are using Maildir, and has:
$ df -h /usr/local/atmail/users/
Filesystem Size Used Avail Use% Mounted on
/dev/atmailusers 14T 12T 2.2T 85% /usr/local/atmail/users
$ df -hi /usr/local/atmail/users/
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/atmailusers 145M 113M 33M 78% /usr/local/atmail/users
very little of this is compressed (zlib plugin enabled during christmas).
I'm surprised that the destination server is so large, was expecting zlib and mdbox and SIS would compress it down to much less than what we're seeing (12TB -> 5TB):
$ df -h /srv/mailbackup
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/mailbackupvg-mailbackuplv
5.7T 4.8T 882G 85% /srv/mailbackup
Lots and lots of the attachment storage is duplicated into identical files, instead of hard linked.
When running "doveadm purge -u $user", we're seeing lots of
Error: unlink(/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab) failed: No such file or directory
"/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab" is missing, but there are 205 other copies of this file named /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-* with identical sha1sum.
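To get an overview of how many attachment files purge is complaining about, the error lines can be reduced to a list of paths. This is only a rough sketch: the function names `missing_unlink_paths` and `remaining_links` are made up for illustration, and it assumes the purge errors are available as text on stdin (from the terminal or from the log) in exactly the format shown above.

```shell
# Read purge output on stdin and print every path that appears in an
# "Error: unlink(PATH) failed: No such file or directory" line, deduplicated.
missing_unlink_paths() {
    sed -n 's/^.*Error: unlink(\(.*\)) failed: No such file or directory.*$/\1/p' | sort -u
}

# Given one missing "<hash>-<guid>" path, print how many "<hash>-*" files
# still exist next to it (hypothetical helper, not a Dovecot tool).
remaining_links() {
    prefix=${1%-*}            # strip the trailing "-<guid>" part
    set -- "$prefix"-*        # expand the surviving siblings
    [ -e "$1" ] || { echo 0; return; }
    echo "$#"
}
```

For example, `doveadm purge -u "$user" 2>&1 | missing_unlink_paths` would list the missing files if the errors are written to the terminal.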
Also we see corrupted indexes during the purge. This makes me quite uncertain if dsync is a workable backup solution.. or if we can trust mdboxes.
Also on the source side, during dsync, we see too many problems. Some samples:
Error: Mailboxes don't have unique GUIDs: 08b46439069d3d4db0490000e671bf84 is shared by INBOX and INBOX
Error: command BOX-LIST failed
Error: Worker server's mailbox iteration failed
Error: read() from worker server failed: EOF
Error: Failed to sync mailbox INBOX.ferie 2006.: Invalid mailbox name
Error: read() from proxy client failed: EOF
Error: Unexpected finish reply: 1 596fec275888dbd89f6d1f5356c22db6 3720 0 \dsync-expunged 0
Error: Unexpected reply from server: 1 12200572a70726fca946da6f9378dc03 3721 0 \dsync-expunged 0
Error: Failed to sync mailbox INBOX.INBOX.Gerda: Mailbox doesn't exist: INBOX/Gerda
Error: command BOX-LIST failed
Error: read() failed: Broken pipe
Panic: file dsync-worker-local.c: line 1678 (local_worker_save_msg_continue): assertion failed: (ret == -1)
Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0 [0x367703c680] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x35) [0x367703c765] -> /usr/lib64/dovecot/libdovecot.so.0 [0x367703bb93] -> /usr/bin/dsync [0x40f48d] -> /usr/bin/dsync [0x40f589] -> /usr/bin/dsync(dsync_worker_msg_save+0x8e) [0x40eb3e] -> /usr/bin/dsync [0x40d71a] -> /usr/bin/dsync [0x40cdbf] -> /usr/bin/dsync [0x40d105] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x48) [0x3677047278] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xd5) [0x36770485c5] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x2d) [0x367704720d] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x3677035a83] -> /usr/bin/dsync(main+0x71e) [0x406c4e] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e3941d994] -> /usr/bin/dsync [0x406369]
Do you have any idea for what our problems might be? Should we:
avoid SIS ?
avoid doing Maildir on one side and mdbox on the other?
try other dovecot version for dsync?
anything else?
-jf
------------- destination server, running dovecot v2.0.14 --------
mail_attachment_dir = /srv/mailbackup/attachments
mail_location = mdbox:~/mdbox
mail_plugins = zlib
mdbox_rotate_size = 5 M
namespace {
  inbox = yes
  location =
  prefix = INBOX.
  separator = .
  type = private
}
passdb {
  driver = static
}
plugin {
  zlib_save = gz
  zlib_save_level = 9
}
protocols =
service auth-worker {
  user = $default_internal_user
}
service auth {
  unix_listener auth-userdb {
    mode = 0600
    user = mailbackup
  }
}
ssl = no
userdb {
  args = home=/srv/mailbackup/%256Hu/%d/%n
  driver = static
}
-------------/destination server --------
-jf
On 01/02/2012 13:29, Jan-Frode Myklebust wrote:
I've been running continuous dsync backups of our Maildirs for a few weeks now, with the destination dsync server using mdbox and SIS. The idea was that the destination server would act as a warm copy of all our active users' data.
How many users are there in this installation?
The active servers are using Maildir, and has:
$ df -h /usr/local/atmail/users/
Filesystem Size Used Avail Use% Mounted on
/dev/atmailusers 14T 12T 2.2T 85% /usr/local/atmail/users
$ df -hi /usr/local/atmail/users/
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/atmailusers 145M 113M 33M 78% /usr/local/atmail/users
very little of this is compressed (zlib plugin enabled during christmas).
This is the old storage in Maildir format?
I'm surprised that the destination server is so large, was expecting zlib and mdbox and SIS would compress it down to much less than what we're seeing (12TB -> 5TB):
$ df -h /srv/mailbackup
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/mailbackupvg-mailbackuplv
5.7T 4.8T 882G 85% /srv/mailbackup
This is the new storage in mdbox format?
What size would you expect?
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux Systems Support -> http://www.cecchi.biz/
@ PLUG -> former President, now senator for life, http://www.prato.linux.it
@ LOLUG -> Member http://www.lolug.net
On Thu, Feb 02, 2012 at 08:46:55AM +0100, Alessio Cecchi wrote:
How many users are there in this installation?
Quite a few :-) This is for an ISP.
The active servers are using Maildir, and has:
$ df -h /usr/local/atmail/users/
Filesystem Size Used Avail Use% Mounted on
/dev/atmailusers 14T 12T 2.2T 85% /usr/local/atmail/users
$ df -hi /usr/local/atmail/users/
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/atmailusers 145M 113M 33M 78% /usr/local/atmail/users
very little of this is compressed (zlib plugin enabled during christmas).
This is the old storage in Maildir format?
Correct.
I'm surprised that the destination server is so large, was expecting zlib and mdbox and SIS would compress it down to much less than what we're seeing (12TB -> 5TB):
$ df -h /srv/mailbackup
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/mailbackupvg-mailbackuplv
5.7T 4.8T 882G 85% /srv/mailbackup
This is the new storage in mdbox format?
Correct.
What size would you expect?
With Maildir I see message-files shrink to about 20%* of original size after turning on zlib with zlib_save_level=6. I was expecting better compression with mdbox (and zlib_save_level=9), and I would expect SIS to help even further.
mdbox+SIS+zlib_save_level=9 variant taking up 40% the space of a mixed** compressed/non-compressed Maildir storage isn't very impressive to me -- and the mdbox backup isn't even complete (it's only the 25% most active users).
Yes, I see there might be holes in my logic, expecting compressed messages to compress further after move to mdbox. But also I have expectation that most of the messages are not already compressed on the Maildir side. Sorry, expectations and guesses, not hard facts.
[*] based on a couple of samples, not thorough research
[**] Only messages saved after we enabled zlib on December 25 are compressed.
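The "about 20%" figure above comes from a couple of samples, and the 40% figure is simply 4.8T used on the backup against 12T on the source. A rough way to turn the sample-based guess into a measurement is to gzip a larger sample of message files and compare byte counts. The sketch below assumes the sample contains only messages that the zlib plugin has not already compressed; the function name `gzip_ratio` is made up for illustration.

```shell
# Print "<original bytes> <gzip bytes> <ratio>%" for the given files.
# First argument is the gzip level (for example 6 or 9).
gzip_ratio() {
    level=$1; shift
    orig=0; comp=0
    for f in "$@"; do
        o=$(wc -c < "$f")                       # uncompressed size
        c=$(gzip -c "-$level" < "$f" | wc -c)   # size after gzip at this level
        orig=$((orig + o))
        comp=$((comp + c))
    done
    [ "$orig" -gt 0 ] || return 1               # nothing to measure
    echo "$orig $comp $((comp * 100 / orig))%"
}
```

Running it once with level 6 and once with level 9 over the same sample would also show how much the higher `zlib_save_level` actually buys.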
-jf
On Wed, 2012-02-01 at 13:29 +0100, Jan-Frode Myklebust wrote:
I'm surprised that the destination server is so large, was expecting zlib and mdbox and SIS would compress it down to much less than what we're seeing (12TB -> 5TB):
Note that with SIS the attachments aren't compressed.
Lots and lots of the attachment storage is duplicated into identical files, instead of hard linked.
Something's wrong then.
When running "doveadm purge -u $user", we're seeing lots of
Error: unlink(/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab) failed: No such file or directory
Something's wrong.
"/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab" is missing, but there are 205 other copies of this file named /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-* with identical sha1sum.
All of them have a link count of 2, with the other link being in hashes/ directory?
Also on the source side, during dsync, we see too many problems.
That is most likely related to your troubles. If the dsync runs crash, the result could leave extra files lying around etc..
Some samples:
Error: Mailboxes don't have unique GUIDs: 08b46439069d3d4db0490000e671bf84 is shared by INBOX and INBOX
This is a little bit strange. What is the doveconf -n output of the source server?
Error: Failed to sync mailbox INBOX.ferie 2006.: Invalid mailbox name
Is this a namespace prefix? It shouldn't be trying to sync a mailbox named this (there's an extra "." suffix).
Error: read() from proxy client failed: EOF
I guess the remote dsync crashed or otherwise aborted.
Error: Failed to sync mailbox INBOX.INBOX.Gerda: Mailbox doesn't exist: INBOX/Gerda
I guess some kind of mismatch related to namespace configuration.
Error: read() failed: Broken pipe
Panic: file dsync-worker-local.c: line 1678 (local_worker_save_msg_continue): assertion failed: (ret == -1)
It probably can't properly handle the remote dsync dying. Of course it still shouldn't crash. There seem to be some bugs left when dsyncing to a remote host (instead of locally).
It would help if I could reproduce the errors that you're seeing. Can you easily reproduce them with some accounts? If so, if you can give enough details for me to reproduce the problems I can fix them. (Except for the "file not found" issues, since those problems occurred earlier already. I should probably somehow make Dovecot fix those missing files though..)
On Thu, Feb 02, 2012 at 12:23:01PM +0200, Timo Sirainen wrote:
Note that with SIS the attachments aren't compressed.
Yes, I know.
"/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab" is missing, but there are 205 other copies of this file named /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-* with identical sha1sum.
All of them have a link count of 2, with the other link being in hashes/ directory?
No, these have link count=207. I don't know what you mean by link being in hashes directory.
# ls -l /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-*|head
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-0069222e0c080f4f754a0000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-00ffb9312a370e4f6b610000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-0442c5163ad3114fb4780000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-04f288390052144f012d0000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-053b6c0f185a0d4fc4210000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-06c98213c3b30e4fac3c0000bf34f6ab
-rw------- 207 mailbackup mailbackup 149265 Jan 9 23:31 /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-076573234fbd0b4fa8620000bf34f6ab
This is just one example, I can provide tons of other examples.. Hmm, I see now that there are 206 files of that first example with the 207 links, and here's another example with numlinks=7:
# ls -l /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-*|wc -l
206
and numlinks=4:
# ls -al /srv/mailbackup/attachments/c3/1b/c31beb42ef78810f7fb81a7086144034fb0fd794*|wc -l
3
is dovecot somehow creating numlinks+1 copies of every file it hardlinks?? Would explain my diskusage :-)
That is most likely related to your troubles. If the dsync runs crash, the result could leave extra files lying around etc..
If dsync backup is supposed to be a viable backup solution, I think it should fail much better. If it sees errors on the target side it should clear the target and do a full sync. Manually cleaning up after its problems is too much work.
Some samples:
Error: Mailboxes don't have unique GUIDs: 08b46439069d3d4db0490000e671bf84 is shared by INBOX and INBOX
This is a little bit strange. What is the doveconf -n output of the source server?
# 2.0.14: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.18-194.26.1.el5 x86_64 Red Hat Enterprise Linux Server
# release 5.5 (Tikanga)
auth_cache_size = 100 M
auth_verbose = yes
auth_verbose_passwords = sha1
disable_plaintext_auth = no
login_trusted_networks = 192.168.0.0/16
mail_gid = 3000
mail_home = /srv/mailstore/%256RHu/%d/%n
mail_location = maildir:~/:INDEX=/indexes/%1u/%1.1u/%u
mail_max_userip_connections = 20
mail_plugins = quota zlib
mail_uid = 3000
maildir_stat_dirs = yes
maildir_very_dirty_syncs = yes
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date
mmap_disable = yes
namespace {
  inbox = yes
  location =
  prefix = INBOX.
  separator = .
  type = private
}
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
plugin {
  quota = dict:UserQuota::file:%h/dovecot-quota
  sieve = /sieve/%1u/%1.1u/%u/.dovecot.sieve
  sieve_dir = /sieve/%1u/%1.1u/%u
  sieve_max_script_size = 1M
  zlib_save = gz
  zlib_save_level = 6
}
postmaster_address = postmaster@example.net
protocols = imap pop3 lmtp sieve
service auth-worker {
  user = $default_internal_user
}
service auth {
  client_limit = 4521
  unix_listener auth-userdb {
    group =
    mode = 0600
    user = atmail
  }
}
service imap-login {
  inet_listener imap {
    address = *
    port = 143
  }
  process_min_avail = 4
  service_count = 0
  vsz_limit = 1 G
}
service imap-postlogin {
  executable = script-login /usr/local/sbin/imap-postlogin.sh
}
service imap {
  executable = imap imap-postlogin
  process_limit = 2048
}
service lmtp {
  client_limit = 1
  inet_listener lmtp {
    address = *
    port = 24
  }
  process_limit = 25
}
service managesieve-login {
  inet_listener sieve {
    address = *
    port = 4190
  }
  service_count = 1
}
service pop3-login {
  inet_listener pop3 {
    address = *
    port = 110
  }
  process_min_avail = 4
  service_count = 0
  vsz_limit = 1 G
}
service pop3-postlogin {
  executable = script-login /usr/local/sbin/pop3-postlogin.sh
}
service pop3 {
  executable = pop3 pop3-postlogin
  process_limit = 2048
}
ssl = no
userdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
protocol lmtp {
  mail_plugins = quota zlib sieve
}
protocol imap {
  imap_client_workarounds = delay-newmail
  mail_plugins = quota zlib imap_quota
}
protocol pop3 {
  mail_plugins = quota zlib
  pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
  pop3_uidl_format = UID%u-%v
}
protocol sieve {
  managesieve_logout_format = bytes=%i/%o
}
Error: Failed to sync mailbox INBOX.ferie 2006.: Invalid mailbox name
Is this a namespace prefix? It shouldn't be trying to sync a mailbox named this (there's an extra "." suffix).
I believe it's a folder named "INBOX.ferie 2006.", with the user using the namespace separator in the folder name.. I believe dovecot allows this, so it should also handle backing it up.
Error: Failed to sync mailbox INBOX.INBOX.Gerda: Mailbox doesn't exist: INBOX/Gerda
I guess some kind of mismatch related to namespace configuration.
They both have same namespace config I think, but Maildir vs. mdbox maybe changes something..
It would help if I could reproduce the errors that you're seeing. Can you easily reproduce them with some accounts? If so, if you can give enough details for me to reproduce the problems I can fix them. (Except for the "file not found" issues, since those problems occurred earlier already. I should probably somehow make Dovecot fix those missing files though..)
I'll look through the logs to see if there are any errors that keep repeating for the same accounts.
-jf
On Thu, Feb 02, 2012 at 12:31:20PM +0100, Jan-Frode Myklebust wrote:
and numlinks=4:
# ls -al /srv/mailbackup/attachments/c3/1b/c31beb42ef78810f7fb81a7086144034fb0fd794*|wc -l
3
is dovecot somehow creating numlinks+1 copies of every file it hardlinks?? Would explain my diskusage :-)
Sorry, brainfart.. Yes, these are hardlinks to the same inode..
# ls -i c31beb42ef78810f7fb81a7086144034fb0fd794* ../c31beb42ef78810f7fb81a7086144034fb0fd794*
2422693 c31beb42ef78810f7fb81a7086144034fb0fd794
2422693 ../c31beb42ef78810f7fb81a7086144034fb0fd794-13b405342e24284f61530000bf34f6ab
2422693 ../c31beb42ef78810f7fb81a7086144034fb0fd794-1cb405342e24284f61530000bf34f6ab
2422693 ../c31beb42ef78810f7fb81a7086144034fb0fd794-4eb405342e24284f61530000bf34f6ab
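The same check can be scripted when there are hundreds of names to compare. The following is a minimal sketch, with a made-up function name `distinct_inodes`: it prints how many different inodes back a set of paths, so 1 means they are all hard links to one file and anything higher means there are real copies.

```shell
# Count the distinct inodes behind the given paths.
# Prints 1 when every path is a hard link to the same file.
distinct_inodes() {
    for f in "$@"; do
        ls -di -- "$f" | awk '{print $1}'   # first column of ls -i is the inode
    done | sort -u | wc -l | tr -d ' '
}

# Usage, run from the same directory as the ls -i above:
# distinct_inodes c31beb42ef78810f7fb81a7086144034fb0fd794* ../c31beb42ef78810f7fb81a7086144034fb0fd794*
```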
-jf
On 2.2.2012, at 13.31, Jan-Frode Myklebust wrote:
"/srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-057274283bb51f4f917e0000bf34f6ab" is missing, but there are 205 other copies of this file named /srv/mailbackup/attachments/c3/17/c317b32b97688c16859956f11b803e3bba434349-* with identical sha1sum.
All of them have a link count of 2, with the other link being in hashes/ directory?
No, these have link count=207.
OK, so they aren't actual copies, they are links to the same file.
I don't know what you mean by link being in hashes directory.
If you have e.g. aa/bb/aabbccdd-eeee file, there should be a matching aa/bb/hashes/aabbccdd file.
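That layout rule can be checked mechanically. The sketch below is only an illustration of the rule as stated here, not a Dovecot tool: the function name `check_sis_bucket` is made up, and it assumes one bucket directory (such as `attachments/c3/17`) holding `<hash>-<guid>` files next to a `hashes/` subdirectory.

```shell
# For every "<hash>-<guid>" file in one attachment bucket directory, check
# that "hashes/<hash>" exists and is the same inode. Prints one line per problem.
check_sis_bucket() {
    dir=$1
    for f in "$dir"/*-*; do
        [ -f "$f" ] || continue
        name=${f##*/}          # file name without the directory
        hash=${name%%-*}       # part before the first "-"
        ref="$dir/hashes/$hash"
        if [ ! -e "$ref" ]; then
            echo "missing hashes entry: $name"
        elif [ "$(ls -di -- "$f" | awk '{print $1}')" != "$(ls -di -- "$ref" | awk '{print $1}')" ]; then
            echo "not linked to hashes entry: $name"
        fi
    done
}
```

An empty result for a bucket would mean every attachment file there is a link to its `hashes/` entry.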
That is most likely related to your troubles. If the dsync runs crash, the result could leave extra files lying around etc..
If dsync backup is supposed to be a viable backup solution, I think it should fail much better. If it sees errors on the target side it should clear the target and do a full sync. Manually cleaning up after its problems is too much work.
Of course. But if no one gives me enough information to reproduce problems, I can't really fix anything. I don't really have time to spend guessing ways to make it break. I've been using dsync to back up my own mails for over a year, with zero problems.
Error: Mailboxes don't have unique GUIDs: 08b46439069d3d4db0490000e671bf84 is shared by INBOX and INBOX
What about:
doveadm mailbox status -u user@domain guid '*'
in source server? in dest server? Does one list show two INBOXes or otherwise duplicate GUIDs? Perhaps this was a bug in v2.0.14..
Error: Failed to sync mailbox INBOX.ferie 2006.: Invalid mailbox name
Is this a namespace prefix? It shouldn't be trying to sync a mailbox named this (there's an extra "." suffix).
I believe it's a folder named "INBOX.ferie 2006.", with the user using the namespace separator in the folder name.. I believe dovecot allows this, so it should also handle backing it up.
It has never been possible to create such a folder via Dovecot. The IMAP protocol itself prevents that. "CREATE foo." will end up creating "foo", not "foo." If you manually mkdir that, it's not possible to access the mailbox in any way via Dovecot. Everything will simply fail as:
a select foo.
a NO [CANNOT] Invalid mailbox name
On Thu, Feb 02, 2012 at 02:41:11PM +0200, Timo Sirainen wrote:
That is most likely related to your troubles. If the dsync runs crash, the result could leave extra files lying around etc..
If dsync backup is supposed to be a viable backup solution, I think it should fail much better. If it sees errors on the target side it should clear the target and do a full sync. Manually cleaning up after its problems is too much work.
Of course. But if no one gives me enough information to reproduce problems, I can't really fix anything. I don't really have time to spend guessing ways to make it break. I've been using dsync to back up my own mails for over a year, with zero problems.
I'm reducing the complexity now, removing SIS and starting the backups from scratch again. I'll start posting the problems I see over the weekend..
Error: Mailboxes don't have unique GUIDs: 08b46439069d3d4db0490000e671bf84 is shared by INBOX and INBOX
What about:
doveadm mailbox status -u user@domain guid '*'
in source server?
INBOX guid=08b46439069d3d4db0490000e671bf84
INBOX.Sent guid=e8f6e431bf6e014f2d780000e671bf84
INBOX.Trash guid=c858f2234a1d5d4e1547000058d3d19f
INBOX.Drafts guid=e9f6e431bf6e014f2d780000e671bf84
INBOX.Spam guid=eaf6e431bf6e014f2d780000e671bf84
INBOX.Sent Messages guid=d837512bed7d674e685c000058d3d19f
INBOX.INBOX.Sent Messages guid=ebf6e431bf6e014f2d780000e671bf84
INBOX.Notes guid=c0d2250109645e4eed5c000058d3d19f
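With only eight mailboxes it is easy to see that no GUID repeats in this list, but for larger accounts the comparison can be done with a small filter. This is a minimal sketch with a made-up function name `duplicate_guids`; it assumes the output format shown above, one mailbox per line with a `guid=<hex>` field.

```shell
# Read "mailbox guid=<hex>" lines on stdin and print any GUID that
# occurs more than once. No output means all GUIDs are unique.
duplicate_guids() {
    sed -n 's/.*guid=\([0-9a-f]\{1,\}\).*/\1/p' | sort | uniq -d
}

# Usage:
# doveadm mailbox status -u user@domain guid '*' | duplicate_guids
```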
in dest server? Does one list show two INBOXes or otherwise duplicate GUIDs? Perhaps this was a bug in v2.0.14..
Scratched dest server before I replied.. sorry.
Error: Failed to sync mailbox INBOX.ferie 2006.: Invalid mailbox name
Is this a namespace prefix? It shouldn't be trying to sync a mailbox named this (there's an extra "." suffix).
I believe it's a folder named "INBOX.ferie 2006.", with the user using the namespace separator in the folder name.. I believe dovecot allows this, so it should also handle backing it up.
It has never been possible to create such a folder via Dovecot. The IMAP protocol itself prevents that. "CREATE foo." will end up creating "foo", not "foo." If you manually mkdir that, it's not possible to access the mailbox in any way via Dovecot. Everything will simply fail as:
Oh, sorry.. then this is a problem created by @mail, which poked directly in the filesystem. Guess we'll have to clean these up manually.
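For the manual cleanup, the affected folders can at least be located with find. This is a sketch under assumptions: it presumes the usual Maildir++ layout where each folder is a dot-prefixed directory directly under the user's Maildir root (so "INBOX.ferie 2006." would be a directory named ".ferie 2006."), and the function name `trailing_dot_folders` is made up. It only lists candidates and changes nothing.

```shell
# List folder directories directly under a Maildir root whose name
# starts with "." and also ends with the "." separator.
trailing_dot_folders() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name '.?*.'
}

# Usage (the path is the user's Maildir root):
# trailing_dot_folders /path/to/user/maildir
```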
-jf
participants (3)
- Alessio Cecchi
- Jan-Frode Myklebust
- Timo Sirainen