[Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot

Stan Hoeppner stan at hardwarefreak.com
Sun Dec 12 10:49:06 EET 2010


Eric Rostetter put forth on 12/11/2010 9:48 AM:

> Well, it is true I know nothing about vmware/ESX.  I know in my virtual
> machine setups, I _can_ give the virtual instances access to devices which
> are not used by other virtual instances.  This is what I would do.  Yes,
> it is still virtualized, but it is dedicated, and should still perform
> pretty well -- faster than shared storage, and in the case of SSD faster
> than normal disk or iscsi.

He's running an ESX cluster, which assumes use of HA and Vmotion.  For
Vmotion to work, each node in the cluster must have direct hardware
access to every storage device.  Thus, to use an SSD, it would have to
be installed in Javier's iSCSI SAN array.  Many iSCSI arrays are
relatively inexpensive and don't offer SSD support.

However, Javier didn't ask for ways to increase his I/O throughput.  He
asked for the opposite.  I assume this is because they have a 1 GbE
based ethernet SAN, and probably only 2 or 4 GbE ports on the SAN array
controller.  With only 200 to 400 MB/s of bidirectional bandwidth
(roughly 100 MB/s of real-world throughput per link), and many busy
guests in the ESX farm, probably running many applications besides
Dovecot, Javier's organization is likely coming close to bumping up
against the bandwidth limits of the 1 GbE links on the SAN array
controller.  Thus, adding an SSD to the mix would exacerbate the I/O
problem.

Thus, putting the index files on a ramdisk or using the Dovecot
memory-only index setting are really the only two options I can think of
that will help in the way he desires.
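
A rough sketch of both approaches (the paths, the tmpfs size, and the
mail_location layout below are only examples, adjust for his setup):

    # option 1: put the indexes on a tmpfs ramdisk
    mount -t tmpfs -o size=512m tmpfs /var/dovecot-indexes

    # then point Dovecot at it in dovecot.conf, e.g.:
    #   mail_location = maildir:~/Maildir:INDEX=/var/dovecot-indexes/%u

    # option 2: keep the indexes in memory only, never written to disk:
    #   mail_location = maildir:~/Maildir:INDEX=MEMORY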

> He was already asking about throwing memory at the problem, and I think
> he implied he had a lot of memory. As such, the caching is there already.
> Your statement is true, but it is also a "zero config" option if he really
> does have lots of memory in the machine.

He has physical memory available, but he isn't currently assigning it to
the Dovecot guest.  To do so would require changing the memory setting
in ESX for this guest, then rebooting the guest (unless both ESX and his
OS support hot plug memory--I don't know if ESX does).  This is what
Javier was referring to when stating "adding memory".

> And in ext3, the flush rate.  Good point, that I forgot about.  It is set
> to a very small value by default (2-3 seconds maybe), and can be increased
> without too much danger (to say 10-30 seconds).
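
(For reference, on ext3 that knob is the journal commit interval, which
can be stretched with a remount; the value and mount point below are
just examples:

    mount -o remount,commit=30 /var/mail

The trade-off, of course, is more unflushed data at risk if the box
crashes.)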

Just to be clear and accurate here, and it's probably a little OT to the
thread, XFS delaylog isn't designed to decrease filesystem log I/O
activity.  It was designed to dramatically increase the rate of metadata
write operations to the journal log, and the I/O efficiency of those
metadata ops.

The major visible benefit of this is a massive increase in delete
performance for many tens of thousands (or more) of files.  It decreases
journal log fragmentation because more metadata updates are packed into
each journal write, thanks to in-memory aggregation before the physical
write.  This packing decreases physical disk I/O as fewer, larger writes
are issued.  XFS with delaylog is an excellent match for maildir
storage.  It won't help much at all with mbox, and only slightly more
with mdbox.

XFS delaylog is a _perfect_ match for the POP3 workload.  Each time a
user pulls, then deletes all messages, delaylog will aggregate and then
burst the metadata journal writes to disk, again with far fewer physical
I/Os thanks to that in-memory batching.
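
On a 2.6.36 or later kernel it's just a mount option; the device and
mount point here are examples only:

    mount -t xfs -o delaylog,logbsize=256k,noatime /dev/sdb1 /var/mail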

XFS with delaylog is now much faster than any version of ReiserFS, whose
claim to fame was lightning fast mass file deletion.  As of 2.6.36, XFS
is now the fastest filesystem, and not just on Linux, for almost any
workload.  This assumes real storage hardware that can handle massive
parallelization of reads and writes.  EXT3 is still faster on a
single disk system.  But EXT3 is the "everyman" filesystem, optimized
more for the single disk case.  XFS was and still is designed for large
parallel servers with big fast storage.

> Assuming normal downtime stats, this would still be a huge win.  Since the
> machine rarely goes down, it would rarely need to rebuild indexes, and
> hence
> would only run poorly a very small percentage of the time.   Of course, it
> could run _very_ poorly right after a reboot for a while, but then will be
> back to normal soon enough.

I totally concur.

> One way to help mitigate this if using a RAM disk is have your shutdown
> script flush the RAM disk to physical disk (after stopping dovecot) and
> then reload it to RAM disk at startup (before starting dovecot).

Excellent idea, Eric.  I'd never considered this.  Truly, that's a
fantastic, creative solution, and it should be relatively straightforward
to implement.
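
Something along these lines should do it (an untested sketch; the paths,
tmpfs size, and init commands are just examples for his distro):

    # shutdown: after stopping dovecot, save the indexes to real disk
    /etc/init.d/dovecot stop
    rsync -a --delete /var/dovecot-indexes/ /var/lib/dovecot-index-backup/

    # startup: restore the indexes before starting dovecot
    mount -t tmpfs -o size=512m tmpfs /var/dovecot-indexes
    rsync -a /var/lib/dovecot-index-backup/ /var/dovecot-indexes/
    /etc/init.d/dovecot start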

> This isn't possible if you use the dovecot index memory settings though.

Yeah, I think the ramdisk is the way to go here.  At least if/until a
better solution can be found.  I don't really see that there is one,
other than his org investing in a faster SAN architecture such as 4/8 Gb
FC or 10 Gbit iSCSI.

The former can be had relatively inexpensively.  The latter is still
really pricey: 10 GbE switches and HBAs cost a lot, and there are
only a handful of iSCSI vendors offering 10 GbE SAN arrays.  One is
NetApp.  Their 10 GbE NICs for their filers run in the multiple thousand
dollar range per card.  And their filers are the most expensive on the
planet last I checked, much of that due to their flexibility.  A single
NetApp can support all speeds of Ethernet for iSCSI and NFS/CIFS access,
as well as 2/4/8 Gbit FC.  I think they offer Infiniband connectivity as
well.

>> If this is a POP server, then you really have no way around the disk I/O
>> issue.
> 
> I agree.  POP is very inefficient...

XFS with delaylog can cut down substantially on the metadata operations
associated with POP3 mass delete.  Without this FS and delaylog, yes,
POP3 I/O is very inefficient.

> Still some room for filesystem tuning, of course, but the above two options
> are of course the ones that will make the largest performance improvement
> IMHO.

Since Javier is looking for ways to decrease I/O load on the SAN, not
necessarily increase Dovecot performance, I think putting the index
files on a ramdisk is the best thing to try first.  It may not be a
silver bullet.  If he's still got spare memory to assign to this guest,
doing both would be better.  Using a ramdisk for the index files will
instantly remove all index I/O from the SAN.  More of Dovecot's IMAP I/O
goes to the index files than to the mail files, doesn't it?  So by
moving the index files to ramdisk he should pretty much instantly remove
half his SAN I/O load.  This assumes that Javier currently stores his
index files on a SAN LUN.

-- 
Stan

