Awfully slow dovecot
Marc Stürmer
mail at marc-stuermer.de
Fri Dec 26 17:43:00 UTC 2014
On 26.12.2014 at 17:21, Nick Edwards wrote:
>> Maildir is fine as long as you don't have too much mail on your
>> storage, but there comes a point when you are getting big enough where
>> Maildir really isn't going to behave really nicely anymore, because
>> too many files and way too many seeks. Mdbox is quite different and
>> scales beyond the point of Maildir with ease.
>>
> that may only be true depending on your filesystem
>
> it may be no more supported for obvious reasons, but reiserfs will
> never be beaten for maildir
> and maildir is time tested and proven in very large environments with
> millions of users, with many GB quotas runs perfect with deduping on
> netapps too.
Wrong. ReiserFS is something you should really avoid on servers
nowadays. V3 is stable, but it gets neither much love nor new features,
and it has some serious known quirks. There's a reason why Reiser4 was
invented back then.

Reiser4 will almost certainly never find its way into the kernel and
doesn't get much love either, especially since its author went to jail
for murder.

Because Reiser4 is maintained outside the kernel, you either have to
compile the kernel yourself or use community kernels. Aside from that,
Reiser4 is still experimental and not stable.

Aside from a few fanboys who are still raging and didn't get the
message, ReiserFS doesn't really matter anymore. It had some nice ideas,
but people moved on to other file systems and are happy with those.

The thing is that Maildir means every mail is one file, and you have to
maintain some index files to get decent speed out of it. It is robust,
yes, because all data is in the file system itself.

This means that on every modern file system, regardless of whether you
use XFS, ext4 or Reiser, each access has to look up that file in the
folder and then read or write it. All modern file systems use some kind
of tree-based directory index (a B-tree or hashed tree) for that lookup.

Some older file systems tend to slow down drastically if you put enough
mails in one folder. Just consider one mail folder holding one year of
the LKML: this folder alone would contain around 100,000 files or even
more.

And once your machines get big enough, Maildir plainly sucks and
backups take more and more time: reading 100,000 files for a backup
means many seeks and lots of work for the HDD, and of course much more
protocol overhead, e.g. with rsync.

For example, my mailbox had an LKML archive with more than 30,000 mails
and I switched to mdbox. On the pro side you get much more speed out of
your hardware, because you can do more with fewer I/O operations. On the
con side you cannot read mails at the file system level anymore and need
to learn the dovecot tool chain (which IMHO is absolutely worth it),
backups need proper preparation, and losing the mapping files would be a
disaster.

Now, instead of the roughly 60,000 files taking over 800 MB that I had
before, my mail storage consists of 347 files and takes 485 MB, and I
didn't throw any mail away.

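To compare numbers like these on your own storage, a small script can
count the files and sum their sizes. This is only a minimal sketch in
Python; the function name storage_footprint is my own choice for
illustration and not part of any dovecot tool.

```python
import os
import stat


def storage_footprint(root):
    """Return (file_count, total_bytes) for all regular files below root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                # The file vanished while we were walking, e.g. a mail
                # was just moved or expunged. Skip it.
                continue
            # Count only regular files, not symlinks or sockets.
            if stat.S_ISREG(info.st_mode):
                file_count += 1
                total_bytes += info.st_size
    return file_count, total_bytes
```

Calling storage_footprint() on the mail directory before and after a
conversion gives the two pairs of numbers. Note that it sums apparent
file sizes, not the blocks actually allocated on disk, so per-file
overhead of the file system is not included.
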
How come? Simple: I enabled compressed saving with gzip. CPU power is
cheap; more memory and HDD speed usually are not. Because all my mails
are now compressed into far fewer files on the HDD, looking them up in
that folder is blazingly fast, since only quite a small tree is needed
to maintain them.

It also means that I can use the same amount of memory to cache more
mails in the file system cache (roughly 30-40% more), because the data
itself is compressed. That's the beauty of it, and backups get much
faster again.

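How much gzip saves depends on the mail, of course. A rough way to get a
feeling for it is to compress a message in memory and compare the sizes.
The following Python sketch does only that; it does not read or write
dovecot's storage format, and the function name gzip_ratio and the
sample message are made up for illustration.

```python
import gzip


def gzip_ratio(message, level=6):
    """Return compressed size divided by original size for one message."""
    if not message:
        raise ValueError("message must not be empty")
    compressed = gzip.compress(message, compresslevel=level)
    return len(compressed) / len(message)


# Made-up plain-text mail; real results vary with the content.
sample = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    + b"This is one line of a plain-text mail body.\r\n" * 200
)
print("compressed/original: %.2f" % gzip_ratio(sample))
```

Text-heavy mail such as mailing list traffic tends to compress well,
while attachments that are already compressed gain little.
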
If you really run millions of accounts, you wouldn't want Maildir
anymore when you can have mdbox.

If you want to build a really large IMAP server on Linux, take either
ext4 or XFS as the file system.