[Dovecot] Dovecot tuning for GFS2
Hello,
I'm deploying a new email cluster using Dovecot over GFS2. Currently I'm using Courier over GFS.
At the moment I'm testing Dovecot with these parameters:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
lock_method = fcntl
Are they correct?
Red Hat GFS supports mmap, so is it better to enable it or leave it disabled? The documentation suggests using flock. What do you think?
Thanks, Andrea
--
Don't talk with a full mouth ... or with an empty head
Ing. *Andrea Gabellini* Email: andrea.gabellini@telecomitalia.sm Skype: andreagabellini Tel: (+378) 0549 886111 Fax: (+378) 0549 886188
Telecom Italia San Marino S.p.A. Strada degli Angariari, 3 47891 Rovereta Republic of San Marino
On 21.08.2013 13:57, Andrea gabellini - SC wrote:
I have
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
mmap_disable = yes
with OCFS2/Maildir.
Whichever cluster filesystem you use, if you additionally use load balancing you should combine it with
http://wiki2.dovecot.org/Director
By the way, I have never tested GFS2 with Dovecot myself, but others have told me it doesn't work very well...
Best Regards, Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Registered office: München, Registry court: Amtsgericht München: HRB 199263 Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer Chairman of the supervisory board: Florian Kirstein
Robert,
So you are using the same config I'm testing. I forgot to write that I use maildir.
The final design will be, as Red Hat suggests, that the same user always goes to the same node (using a proxy or Director).
Thanks, Andrea
On 21/08/2013 14:04, Robert Schetterer wrote:
On Wed, Aug 21, 2013 at 02:18:52PM +0200, Andrea gabellini - SC wrote:
So you are using the same config I'm testing. I forgot to write that I use maildir.
I would strongly suggest using mdbox instead. AFAIK cluster filesystems aren't very good at handling many small files. It's a worst-case random I/O usage pattern, with a high rate of metadata operations on top.
We use IBM GPFS as our cluster filesystem, and have finally completed the conversion of a 130+ million inode Maildir filesystem into an 18 million inode mdbox filesystem. I have no hard performance data showing the difference between Maildir and mdbox, but at a minimum mdbox is much easier to manage. Backing up 130+ million files is painful, and it also feels nice to be able to schedule batches of mailbox purges for off-hours instead of running them at peak hours.
As for your settings, we use:
mmap_disable = yes # GPFS also supports cluster-wide mmap, but for some reason we've disabled it in Dovecot.
mail_fsync = optimized
mail_nfs_storage = no
mail_nfs_index = no
lock_method = fcntl
And of course Dovecot Director in front of them.
-jf
On 8/21/2013 4:07 PM, Jan-Frode Myklebust wrote:
I would strongly suggest using mdbox instead.
I'd recommend mdbox as well, with a healthy rotation size. The larger files won't increase IMAP performance substantially but they can make backup significantly quicker.
AFAIK clusterfs' aren't very good at handling many small files. It's a worst case random I/O usage pattern, with high rate of metadata operations on top.
Just for clarification, small files and random IO patterns at the disks are only a small fraction of the maildir problem. The majority of it is metadata: the create, move, rename, etc. operations. To keep the in-memory filesystem state consistent across all nodes, and to avoid putting extra IOPS on the storage (as would happen if on-disk data structures were used for synchronization), cluster filesystems exchange all metadata updates and synchronization data over the cluster interconnect. This is inherently slow.
With a local filesystem and multiple processes, this coherence dance takes place at DRAM latencies (tens of nanoseconds) and scales well as load increases, because DRAM bandwidth is 25-100 GB/s. With a cluster filesystem it takes place at interconnect latency, tens to hundreds of μs, or about 1000x higher latency. And it doesn't scale well, as bandwidth is limited to ~100 MB/s with GbE and ~1 GB/s with 10GbE or Myrinet. Stepping up to InfiniBand 4x DDR can get you ~2 GB/s and slightly lower latency, but that's a lot of extra expense for a mail cluster, given that performance won't scale with the $$ spent. The switch and HBAs will cost more than the COTS servers.
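As a rough illustration of the latency gap, the Python arithmetic below uses two assumed values picked from inside the ranges quoted above (50 ns for DRAM, 50 μs for the interconnect); they are not measurements. It also shows a hypothetical upper bound on strictly serialized metadata operations per second if each one had to wait for a full round trip.

```python
# Assumed example values, chosen from inside the ranges quoted above.
DRAM_LATENCY_NS = 50              # "tens of nanoseconds"
INTERCONNECT_LATENCY_NS = 50_000  # "tens to hundreds of microseconds" (50 us)


def max_serial_ops_per_sec(latency_ns: float) -> float:
    """Upper bound on operations/s if each one must wait a full round trip."""
    return 1e9 / latency_ns


ratio = INTERCONNECT_LATENCY_NS / DRAM_LATENCY_NS
print(f"interconnect vs DRAM latency: {ratio:.0f}x")
for name, latency in (("DRAM", DRAM_LATENCY_NS),
                      ("interconnect", INTERCONNECT_LATENCY_NS)):
    print(f"{name}: at most {max_serial_ops_per_sec(latency):,.0f} serialized ops/s")
```

Real cluster filesystems overlap and batch requests, so the serialized figure is only an illustration of why per-operation latency matters for metadata-heavy workloads like Maildir.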
Selecting the right mailbox format is in essence free, and mostly solves the maildir metadata and IOPS problem.
130m to 18m is 'only' a 7-fold decrease. 18m inodes is still rather large for any filesystem, cluster or local. A check on an 18m inode XFS filesystem, even on fast storage, would take quite some time. I'm sure it would take quite a bit longer to check a GFS2 with 18m inodes. Any reason you didn't go a little larger with your mdbox rotation size?
-- Stan
On Thu, Aug 22, 2013 at 08:57:40PM -0500, Stan Hoeppner wrote:
We use GPFS, not GFS2. Luckily we've never needed to run fsck on it, but it supports online fsck, so hopefully it would be bearable (but please, let's not talk about such things, knock on wood).
Any reason you didn't go a little larger with your mdbox rotation size?
Just that we didn't see any clear recommendation/documentation for why one would want to switch from the default 2 MB. 2 MB should already be packing 50-100 messages per file, so why are we only seeing a 7x decrease in the number of files? Hmm, I see the m-files aren't really using the full 2 MB. Looking at my own mdbox storage, I see 59 m-files using a total of 34 MB (avg. 576 KB/file), with sizes ranging from ~100 KB to 2 MB. Checking our quarantine mailbox, I see 3045 files using 2.6 GB (avg. 850 KB/file).
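To reproduce this kind of count on your own storage, here is a minimal Python sketch. It assumes m-files are named m.<N> (as in the ls listings later in this thread) and sit directly in the user's storage directory; the function name mfile_stats is hypothetical.

```python
import re
from pathlib import Path

# mdbox data files are named m.<N> (e.g. m.77 in the listings in this thread).
M_FILE = re.compile(r"m\.\d+")


def mfile_stats(storage_dir):
    """Return (count, total_bytes, avg_bytes) for the m-files in one storage dir.

    Index and map files in the same directory are ignored by the name filter.
    """
    sizes = [p.stat().st_size for p in Path(storage_dir).iterdir()
             if p.is_file() and M_FILE.fullmatch(p.name)]
    total = sum(sizes)
    avg = total / len(sizes) if sizes else 0.0
    return len(sizes), total, avg
```

For example, mfile_stats("/var/mail/mike/storage") would summarize the directory shown in Michael's listing; a low average compared to mdbox_rotate_size is the pattern described above.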
Guess I should look into changing to a larger rotation size.
BTW, what happens if I change mdbox_rotate_size from 2 MB to 10 MB? Will all the existing 2 MB m-files grow to 10 MB, or will only new m-files use the new size? Can I get Dovecot to migrate out of the 2 MB files and reorganize them into 10 MB files?
-jf
On 8/23/2013 3:30 AM, Jan-Frode Myklebust wrote:
On Thu, Aug 22, 2013 at 08:57:40PM -0500, Stan Hoeppner wrote:
Understood. But it makes little difference. None of the cluster filesystems perform very well with high metadata workloads or extremely high inode counts, whether using OCFS, GFS, GPFS, CXFS, etc.
I'm not that familiar with the GPFS tools. It may be able to run an online check but I'd bet you have to unmount it to do a destructive repair, as with most filesystems, cluster or not.
Apparently 2 MB is approximate. I'd guess that if a new message comes in that would put the m-file over the limit, the file is closed, a new one is started, and the new mail goes into the new file, leaving the current (previous) file below the rotate size limit. Timo will need to give the definitive answer.
I'd guess existing m-files will remain as they are. The rotation logic acts on the currently open and not yet full file. This is a serial operation, only forward, not back. Again, Timo should have a definitive answer.
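The guessed behaviour can be modelled in a few lines of Python. This is a hypothetical simulation of the rotation described above, not Dovecot's actual code: a message goes into the open m-file unless it would push that file past the rotate size, in which case a new file is started. Closed files are never revisited, which matches the guess that a new rotate size would only affect new m-files.

```python
def simulate_rotation(message_sizes, rotate_size):
    """Hypothetical model of m-file rotation as guessed above (not Dovecot's code).

    Returns the final size of each m-file, in the order the files were created.
    """
    file_sizes = [0]
    for size in message_sizes:
        # Start a new file if this message would push the open one past the limit.
        # An empty file always accepts the message, so a single message larger
        # than rotate_size ends up alone in its own file (an assumption).
        if file_sizes[-1] > 0 and file_sizes[-1] + size > rotate_size:
            file_sizes.append(0)
        file_sizes[-1] += size
    return file_sizes


# Message sizes in KB, rotate size 2 MB (2048 KB): every closed file ends below the limit.
print(simulate_rotation([600, 700, 500, 900, 300, 1500], rotate_size=2048))
# -> [1800, 1200, 1500]
```

Under this model, the larger the typical message relative to the rotate size, the further below the limit each closed file tends to end, which would fit the averages reported above.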
-- Stan
On 2013-08-22 9:57 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
I'm considering migrating to mdbox... wondering what you consider a 'healthy' rotation size.
I generally try to avoid changing defaults whenever possible, so, do you consider the default size of 2MB too small?
I guess though that it depends on usage. Since we get a decent number of large attachments, maybe that is a good reason to bump it up?
Thanks,
--
Best regards,
Charles
On 8/23/2013 7:17 AM, Charles Marcus wrote:
It's probably better to err large than to err small. Analyze your current maildir directories and make a distribution graph of file sizes. That should give you a good idea of what your rotation size should be.
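A minimal Python sketch of the suggested analysis, assuming a Maildir++ layout where messages live in cur/ and new/ directories (the INBOX at the top level, folders in dot-subdirectories). It counts message files per size bucket; the function names and the 64 KB bucket width are arbitrary choices for illustration.

```python
from collections import Counter
from pathlib import Path


def maildir_size_histogram(maildir_root, bucket_kb=64):
    """Count message files per size bucket (bucket_kb wide) under a Maildir tree."""
    counts = Counter()
    # rglob("cur") also matches the top-level cur/ of the INBOX; tmp/ is skipped.
    for pattern in ("cur", "new"):
        for folder in Path(maildir_root).rglob(pattern):
            if not folder.is_dir():
                continue
            for msg in folder.iterdir():
                if msg.is_file():
                    size_kb = msg.stat().st_size // 1024
                    counts[size_kb // bucket_kb * bucket_kb] += 1
    return dict(sorted(counts.items()))


def print_histogram(hist, width=50):
    """Print one text bar per bucket, scaled to the largest bucket."""
    peak = max(hist.values(), default=1)
    for bucket, n in hist.items():
        print(f"{bucket:>8} KB {'#' * max(1, n * width // peak)} {n}")
```

Running print_histogram(maildir_size_histogram(path)) over a few representative users gives the distribution; if a noticeable share of messages is close to or above the candidate rotate size, that argues for erring large.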
-- Stan
On 23.08.2013, at 14:17, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
On 2013-08-22 9:57 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
I have been running "mdbox_rotate_size = 100m" for about a year now on a small server (only a handful of users). All mailboxes are around 1G each, with a lot of attachments. I haven't had a single issue so far.
Don't ask me why I chose 100m, I can't remember ;-) OK, if one of those mdbox files becomes corrupt, I will lose a lot of mail, but on the other hand I am running two Dovecot servers in parallel (replicator/dsync) and I also take hourly snapshots (ZFS) of my mail storage file system.
Regards, Michael
On Sat, Aug 24, 2013 at 10:47:56AM +0200, Michael Grimm wrote:
How much space are your mdboxes using, compared to quota usage? I.e. how much space is wasted on deleted messages?
(not sure this will be easy to measure, because of compression..)
-jf
On 25.08.2013, at 15:37, Jan-Frode Myklebust <janfrode@tanso.net> wrote:
On Sat, Aug 24, 2013 at 10:47:56AM +0200, Michael Grimm wrote:
Sorry, but I do not understand your question.
I.e. how much space is wasted on deleted messages?
(not sure this will be easy to measure, because of compression..)
True, it is hard to answer ;-)
As a very rough estimate, I'd say about 5% of the space is wasted on deleted messages. But my handful of users are very disciplined about purging their deleted messages regularly (I told them to), so my regular "doveadm purge -A" runs reduce that wasted disk space to a minimum.
Not very helpful, I know, Michael
On 2013-08-26 2:58 PM, Michael Grimm <trashcan@odo.in-berlin.de> wrote:
As a very rough estimate I do estimate a 5% waste of space regarding deleted messages. But, my handful users are very disciplined in purging their deleted messages on a regular basis (I told them to do), and thus my regular "doveadm purge -A" runs will reduce that amount of wasted disk space to a minimum.
Not very helpful, I know,
Are you sure about that? There was a thread about this a while back (I recently posted a response to it), and it sounded like the mdbox files would *never* be 'compacted' (reduced in size after messages are deleted)... My reply was on 8/23, in the thread titled "Dovecot never release preallocated space in mdbox"...
On Mon, Aug 26, 2013 at 03:31:20PM -0400, Charles Marcus wrote:
And Timo seemed to reply that hole punching was something doveadm purge could conceivably do, but doesn't do at the moment. Timo, could you please clarify a bit here?
Do non-preallocated (mdbox_preallocate_space=no) m-files get hole-punched (or have their space reused for new messages) after running doveadm purge? Or can we end up with a huge $mdbox_rotate_size-sized m-file with only a single small message remaining after all the other messages have been purged?
-jf
On 26.08.2013, at 21:23, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
On 2013-08-26 2:58 PM, Michael Grimm <trashcan@odo.in-berlin.de> wrote:
I must have missed that thread, sorry.
My observations are as follows:
- if I delete mails in my mail client, the mdbox files are not reduced accordingly
- if I run something in my client like "remove all deleted mails from my account" (purged in client), the mdbox files are not reduced accordingly
- if I run "doveadm purge -A" on the server, the mdbox files are modified; see this example of a purge run a couple of minutes ago:
before (all my mail, ~800 mails purged in client):
-rw------- 1 vmail dovecot 104856511 Aug 14 20:20 /var/mail/mike/storage/m.77
-rw------- 1 vmail dovecot 104769054 Aug 25 03:14 /var/mail/mike/storage/m.89
-rw------- 1 vmail dovecot 104848809 Aug 24 18:33 /var/mail/mike/storage/m.90
-rw------- 1 vmail dovecot 24762837 Aug 26 21:26 /var/mail/mike/storage/m.91
after (all my mail, after "doveadm purge -A"):
-rw------- 1 vmail dovecot 104856511 Aug 14 20:20 /var/mail/mike/storage/m.77
-rw------- 1 vmail dovecot 104803218 Aug 26 21:26 /var/mail/mike/storage/m.92
-rw------- 1 vmail dovecot 104802874 Aug 26 21:26 /var/mail/mike/storage/m.93
-rw------- 1 vmail dovecot 21580496 Aug 26 21:26 /var/mail/mike/storage/m.94
Thus, from my point of view one needs to run "doveadm purge -A" on a regular basis *and* educate users to purge deleted mails in their clients on a regular basis as well.
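The two listings can be compared with a few lines of Python. The sizes are copied from the ls -l output above; the script only reports which files disappeared, which appeared, and how many bytes the purge freed.

```python
# File sizes in bytes, copied from the "before" and "after" listings above.
before = {"m.77": 104856511, "m.89": 104769054, "m.90": 104848809, "m.91": 24762837}
after = {"m.77": 104856511, "m.92": 104803218, "m.93": 104802874, "m.94": 21580496}

rewritten = sorted(set(before) - set(after))  # m-files that no longer exist
created = sorted(set(after) - set(before))    # m-files written by the purge
reclaimed = sum(before.values()) - sum(after.values())

print("removed:", rewritten)
print("created:", created)
print(f"reclaimed: {reclaimed} bytes ({reclaimed / 2**20:.2f} MiB)")
```

In this run m.77 is unchanged, m.89 to m.91 are replaced by m.92 to m.94, and 3,194,112 bytes (about 3.05 MiB) are reclaimed.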
(I hope I didn't misunderstand you right from the beginning.)
Regards, Michael
Another interesting point for this thread: if you set a very high mdbox_rotate_size, your incremental backups will be awful. If you have hundreds of messages in an m-file and you doveadm purge one of them, the full m-file must be copied again in the incremental/differential backup.
I use 10 MB+zlib for "main storage" and 250 MB+bzip2 for alternate storage.
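The backup concern can be turned into a rough worst-case estimate. The Python sketch below is a hypothetical model, not measured data: it assumes every m-file that contained a purged message is rewritten and therefore copied in full by the next incremental backup, that each such file is at its full rotate size, and it ignores compression. The 1000-user figure is just an example.

```python
MB = 1024 * 1024  # 1 MB counted as 2**20 bytes


def worst_case_backup_delta(users, files_touched_per_user, rotate_size):
    """Hypothetical upper bound on bytes re-copied by an incremental backup after a purge.

    Assumes each m-file containing a purged message is rewritten and copied in
    full, that each one is at its full rotate size, and ignores compression.
    """
    return users * files_touched_per_user * rotate_size


for rotate_mb in (2, 10, 100, 250):
    delta = worst_case_backup_delta(users=1000, files_touched_per_user=1,
                                    rotate_size=rotate_mb * MB)
    print(f"{rotate_mb:>3} MB rotate size: up to {delta / 2**30:.1f} GiB re-copied")
```

The estimate grows linearly with the rotate size, which is the trade-off against fewer, larger files described in this thread.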
Regards
Javier
On 26.08.2013, at 21:59, Javier de Miguel Rodríguez <javierdemiguel-ext@us.es> wrote:
Good point! I won't suffer from that, but those with thousands of users certainly will; see the example I mailed before. Three mdbox files were deleted and copied into new ones.
Regards, Michael
On 2013-08-24 4:47 AM, Michael Grimm <trashcan@odo.in-berlin.de> wrote:
Don't ask me why I did chose 100m, I cannot remember;-) Ok, if one of such mdbox files will become corrupt, I will loose a lot of mail, but on the other hand I am running two dovecot servers in parallel (replicator/dsync) and I do take hourly snapshots (ZFS) of my mail storage file system as well.
Well, if they are stored on ZFS, I guess the chances of corruption are extremely minimal (much less than for other filesystems)...
I'm curious, is this on FreeBSD? Linux? I'm interested in details, as I'd love to be able to use ZFS on my Gentoo Linux box without having to enable modules...
On 26.08.2013, at 20:35, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
On 2013-08-24 4:47 AM, Michael Grimm <trashcan@odo.in-berlin.de> wrote:
Don't ask me why I did chose 100m, I cannot remember;-) Ok, if one of such mdbox files will become corrupt, I will loose a lot of mail, but on the other hand I am running two dovecot servers in parallel (replicator/dsync) and I do take hourly snapshots (ZFS) of my mail storage file system as well.
Well, if they are stored on ZFS, I guess the chances of corruption are extremely minimal (much less than for other filesystems)...
Haven't had any file system corruption for a very long time now, even before switching to ZFS.
I'm curious, is this on FreeBSD?
Yes, I migrated my servers to FreeBSD some years ago, and I have been using ZFS for about two years now.
Linux? I'm interested in details, as I'd love to be able to use ZFS on my gentoo linux box without having to enable modules...
Sorry, but I never used ZFS with Linux. But, ZFS and snapshots as such are pretty awesome and helped me a lot in the past when it comes to "recovering from human mistakes" ;-)
Regards, Michael
On 2013-08-26 3:05 PM, Michael Grimm <trashcan@odo.in-berlin.de> wrote:
Well, if they are stored on ZFS, I guess the chances of corruption are extremely minimal (much less than for other filesystems)...
Haven't had any file system corruption for a very long time now, even before switching to ZFS.
I know, me neither (knock on wood), which is why I put the 'extremely' in there... ;)
I'm curious, is this on FreeBSD?
Yes I migrated my servers to FreeBSD some years ago, and I am using ZFS for approx. two years now.
Linux? I'm interested in details, as I'd love to be able to use ZFS on my gentoo linux box without having to enable modules...
Sorry, but I never used ZFS with Linux. But, ZFS and snapshots as such are pretty awesome and helped me a lot in the past when it comes to "recovering from human mistakes" ;-)
Heh - that (and the resistance to hidden/silent filesystem corruption) is the main reason I'm interested in using it. :)
Andrea,
We tried GFS2 + Dovecot (mdbox), but with many mailboxes and messages it seemed to get slow when delivering mail into the mailboxes. We tested with LeftHand storage, by the way. So we switched to NFSv4.
Also, keep in mind that Director does not detect when a backend server fails. So we use poolmon, as suggested in the Director wiki. We've tested it and it seems to work fine.
Have a look at it.
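For illustration, here is a minimal Python sketch of the kind of check a backend monitor performs: connect to a backend's IMAP port and look for the "* OK" greeting. This is not poolmon's code; the function name, the default port 143 and the timeout are assumptions, and what to do with a backend that fails the check is left out.

```python
import socket


def imap_backend_alive(host, port=143, timeout=5.0):
    """Return True if host:port accepts a TCP connection and greets with "* OK"."""
    try:
        # create_connection applies the timeout to connect() and later recv() calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = sock.recv(1024)
    except OSError:  # refused, unreachable, timed out, ...
        return False
    return greeting.startswith(b"* OK")
```

A monitor would call this periodically for each backend and only keep the ones that pass in the Director's pool; a real check may also want to log in or test other services, which this sketch does not do.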
Kouga
On 21/08/2013 13:57, Andrea gabellini - SC wrote:
Hi Andrea,
I'm running a cluster with Maildir over NFS (and in the past over OCFS2). With GFS2 you need to use the same options as for NFS:
I suggest setting mmap_disable = yes.
Ciao
Alessio Cecchi is: @ ILS -> http://www.linux.it/~alessice/ on LinkedIn -> http://www.linkedin.com/in/alessice GNU/Linux systems support -> http://www.cecchi.biz/ @ PLUG -> former President, now senator for life, http://www.prato.linux.it
participants (9)
- Alessio Cecchi
- Andrea gabellini - SC
- Charles Marcus
- Jan-Frode Myklebust
- Javier de Miguel Rodríguez
- Michael Grimm
- Robert Schetterer
- Stan Hoeppner
- 林 宏河