[Dovecot] Migration questions...
Hi All,
We are soon to migrate our mail server from one piece of hardware to another and we would like to take this opportunity to optimize things. As a result, we would like to replace "uw-imapd" and "qpopper" with "dovecot". The version we will be installing is 1.1.13-2, as this is what is available through the latest Debian stable backports. We will also be using exim to deliver mail (through dovecot's deliver mechanism, of course).
So... We are currently using the mbox format with uw-imapd, and would like to migrate to the fastest solution possible with dovecot on the new hardware. My understanding is that "multi-dbox" is not an option in this version anyway, maildir is OK, but not great, and "single-dbox" is therefore going to be the highest performing solution. Is single-dbox the fastest way to go, considering we're going to be using email in the following ways:
- IMAP connections with all email in the Inbox (Gmail-style).
- IMAP connections with email split into many IMAP folders.
- POP3 connections with no email left on the server.
- POP3 connections with *all* email left on the server.
All connections check for new mail every 5 minutes (on average) and there are 50-60 users. Also, we are not able to change user behaviour in this instance, unfortunately.
Can anyone see any problems with the above proposal? Hopefully not...
One problem that may arise is the fact that when we migrate, all msg UIDs will be lost. If i'm not mistaken, this means that all emails will be treated by the mail client as brand new, and if through IMAP, will all go bold, and if through POP3, will all be downloaded again (if still on the server) and therefore duplicated in the mail client. If this is the case, is there anything we can do to stop this happening? Does the "Convert" plugin do this job well?
Finally, I have a rough draft of our migration plan - is there anything horribly wrong in it that's going to cause lots of problems, that anyone can spot by any chance?
1. Install Debian with exim, mysqld (for Horde/IMP) and mailman.
2. Run an online update.
3. Rsync homedirs and inboxes onto new server, ready for initial exim configuration.
4. Configure exim as per existing mail server and test that mailing lists and normal email works. You should now have the existing mail delivery solution on the brand new hardware.
5. Once mail delivery is sorted, add "deb http://www.backports.org/debian lenny-backports main contrib non-free" into "/etc/apt/sources.list" and run "aptitude update && aptitude install debian-backports-keyring && aptitude update".
6. Install dovecot (at the time of writing, this was version 1.1.13-2) and configure to use existing mbox files (inboxes in /var/spool/mail/ and IMAP folders in /home/user/mail/)
7. Set up exim to use dovecot's "deliver" mechanism for interacting with the inboxes (which are still in mbox format).
8. Configure the "convert" plugin to begin converting the mail to dbox format.
9. Run something manually (if possible) to convert mailboxes before people connect, so the task is already done by the time the outage is over.
10. Give staff access to new speedy mail server!
Thanks in advance, people - any help is greatly appreciated! :-)
Richard.
-- Richard Hobbs (IT Specialist) Toshiba Research Europe Ltd. - Cambridge Research Laboratory Email: richard.hobbs@crl.toshiba.co.uk Web: http://www.toshiba-europe.com/research/ Tel: +44 1223 436999 Mobile: +44 7811 803377
Hello,
Apologies for replying to my own email so soon, but I've had other thoughts as well...
Our server is going to have 4 hard disks. These can be configured into 2 RAID 1 (mirror) arrays, a single RAID 5 array, or a single RAID 0+1 array.
Previously, we thought that two RAID 1 arrays would be best because the inboxes can sit on one set of disks, and the IMAP folders on another. This is so both types of user don't annoy each other. However, with no knowledge of how the highest performing mailbox format works with dovecot, perhaps this isn't the best option.
Basically, i suppose i'm asking, with the highest performing mailbox option, is dovecot going to run faster with 2 individual arrays each made from 2 disks, or a single 4-disk array (in which case we'd go RAID 0+1)?
Also... would it be useful to turn off "atime" when we mount the volume(s) or does dovecot rely on this?
Thanks again, people... :-)
Richard.
On Mon, 2009-05-11 at 15:05 +0000, Richard Hobbs wrote:
So... We are currently using the mbox format with uw-imapd, and would like to migrate to the fastest solution possible with dovecot on the new hardware. My understanding is that "multi-dbox" is not an option in this version anyway, maildir is OK, but not great, and "single-dbox" is therefore going to be the highest performing solution. Is single-dbox the fastest way to go, considering we're going to be using email in the following ways:
Single-dbox is the highest performing, but note that it's not as much tested as mbox and Maildir code. I think it should work ok, but I'm not aware of any larger installations using dbox currently. So in case you find a problem, you might have to upgrade/patch Dovecot to get it fixed and that would require compiling from sources.
One problem that may arise is the fact that when we migrate, all msg UIDs will be lost. If i'm not mistaken, this means that all emails will be treated by the mail client as brand new, and if through IMAP, will all go bold, and if through POP3, will all be downloaded again (if still on the server) and therefore duplicated in the mail client. If this is the case, is there anything we can do to stop this happening? Does the "Convert" plugin do this job well?
mbox -> Maildir conversion can preserve both IMAP and POP3 UIDLs using an external script. Maildir -> dbox conversion can also preserve both, but that causes Dovecot to use this "hybrid Maildir-dbox format", which is slower than the full native dbox.
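Before converting, it may help to check which UID information the existing mbox files actually carry. The following is only a sketch: it assumes the old servers recorded IMAP UIDs in an "X-UID" header and POP3 UIDLs in an "X-UIDL" header (common for mbox, but worth confirming against your own files), and the function name is made up for illustration.

```python
import mailbox


def uid_headers(mbox_path):
    """Return one (X-UID, X-UIDL) pair per message in the mbox file.

    A value of None means the header is absent from that message.
    Assumption: UIDs live in these two headers; check your own mbox
    files, since other software may record them differently.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg.get("X-UID"), msg.get("X-UIDL")) for msg in box]
    finally:
        box.close()
```

Running this over a copy of an inbox before and after a test conversion gives a quick indication of whether any UID data was there to preserve in the first place.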
Configure the "convert" plugin to begin converting the mail to dbox format.
Run something manually (if possible) to convert mailboxes before people connect, so the task is already done by the time the outage is over.
There's convert-tool that you could use. I don't know if Debian packages it.
Basically, i suppose i'm asking, with the highest performing mailbox option, is dovecot going to run faster with 2 individual arrays each made from 2 disks, or a single 4-disk array (in which case we'd go RAID 0+1)?
My guess is that two RAID-1s would be faster, but I haven't really done any benchmarking. Anyway index files are 10-30% of the mailbox size, so the index-disks would be using a lot less disk space.
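As rough arithmetic for the 10-30% figure, here is a small sketch that turns a total mailbox size into a range for the index disk. The 100 GiB total in the usage note is a made-up number, not something from this thread.

```python
def index_space_range(mailbox_bytes, low_pct=10, high_pct=30):
    """Estimate index disk usage as a (low, high) range in bytes.

    Uses the 10-30% rule of thumb quoted above; integer arithmetic
    keeps the result exact.
    """
    low = mailbox_bytes * low_pct // 100
    high = mailbox_bytes * high_pct // 100
    return low, high
```

For a hypothetical 100 GiB of mail, index_space_range(100 * 1024**3) gives a range of 10 GiB to 30 GiB.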
Also... would it be useful to turn off "atime" when we mount the volume(s) or does dovecot rely on this?
Dovecot doesn't rely on atime updates, so turn them off.
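To confirm that a volume really is mounted without atime updates, one option is to look at its options in /proc/mounts. This is a sketch only: the function name is hypothetical, and it takes the file's text as an argument so it can be tried on sample data.

```python
def atime_disabled(mounts_text, mount_point):
    """Return True if mount_point is listed with the noatime option.

    mounts_text is expected in /proc/mounts format:
        device mount_point fstype options dump pass
    If the mount point appears more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            result = "noatime" in fields[3].split(",")
    if result is None:
        raise KeyError(mount_point)
    return result
```

On a Linux machine it could be called as atime_disabled(open("/proc/mounts").read(), "/home"), where "/home" stands in for wherever the mail volume is mounted.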
Hello,
Timo Sirainen wrote:
On Mon, 2009-05-11 at 15:05 +0000, Richard Hobbs wrote:
So... We are currently using the mbox format with uw-imapd, and would like to migrate to the fastest solution possible with dovecot on the new hardware. My understanding is that "multi-dbox" is not an option in this version anyway, maildir is OK, but not great, and "single-dbox" is therefore going to be the highest performing solution. Is single-dbox the fastest way to go, considering we're going to be using email in the following ways:
Single-dbox is the highest performing, but note that it's not as much tested as mbox and Maildir code. I think it should work ok, but I'm not aware of any larger installations using dbox currently. So in case you find a problem, you might have to upgrade/patch Dovecot to get it fixed and that would require compiling from sources.
In that case (and with a little further investigation which i've just done) we've decided to go with maildir! That is still going to be significantly better performing than mbox, right?
Also... do you know how uw-imapd & maildir compares to dovecot & maildir in terms of performance? Does dovecot still use indices with maildir?
One problem that may arise is the fact that when we migrate, all msg UIDs will be lost. If i'm not mistaken, this means that all emails will be treated by the mail client as brand new, and if through IMAP, will all go bold, and if through POP3, will all be downloaded again (if still on the server) and therefore duplicated in the mail client. If this is the case, is there anything we can do to stop this happening? Does the "Convert" plugin do this job well?
mbox -> Maildir conversion can preserve both IMAP and POP3 UIDLs using an external script. Maildir -> dbox conversion can also preserve both, but that causes Dovecot to use this "hybrid Maildir-dbox format", which is slower than the full native dbox.
That's good to know. Do you happen to know where I can get a copy of this "external script" you speak of? Will it simply be included in the debian package (probably)?
Also, given that i'm going to have to test this, i will obviously be running the conversion on a copy of the live data, and then i'll have to run the conversion again during the migration outage - will i need to delete all the data and basically start again, or is it incremental?
Configure the "convert" plugin to begin converting the mail to dbox format.
Run something manually (if possible) to convert mailboxes before people connect, so the task is already done by the time the outage is over.
There's convert-tool that you could use. I don't know if Debian packages it.
OK, fair enough... if this is a different script to the one you mention above, could you pls let me know where i can get this too?
Basically, i suppose i'm asking, with the highest performing mailbox option, is dovecot going to run faster with 2 individual arrays each made from 2 disks, or a single 4-disk array (in which case we'd go RAID 0+1)?
My guess is that two RAID-1s would be faster, but I haven't really done any benchmarking. Anyway index files are 10-30% of the mailbox size, so the index-disks would be using a lot less disk space.
I assume you are talking about dovecot with maildir here, right?
Also, what would we put on each array? Are the inboxes still stored separately to the IMAP folders when using dovecot and maildir?
Would it be best to put all data on one array, and the indices on the other? We're basically after the fastest way to distribute the data! :-)
Also... would it be useful to turn off "atime" when we mount the volume(s) or does dovecot rely on this?
Dovecot doesn't rely on atime updates, so turn them off.
Will do!
Thanks again - your help is greatly appreciated :-)
Richard.
Sorry people - i'm replying to my own email again... my reply is below!
My colleague has mentioned something of interest... can dovecot keep the index files in RAM? If so, the performance will obviously be *so* much better than running them off the hard disks.
This also raises questions about what happens if the machine is powered off etc... but it's UPSd etc... so if it were to rebuild its indexes every time it was booted up, that wouldn't be the end of the world.
Anyway... thoughts on this topic are also appreciated :-)
Thanks again!
Richard.
On May 12, 2009, at 6:41 AM, Richard Hobbs wrote:
Single-dbox is the highest performing, but note that it's not as much tested as mbox and Maildir code. I think it should work ok, but I'm not aware of any larger installations using dbox currently. So in case you find a problem, you might have to upgrade/patch Dovecot to get it fixed and that would require compiling from sources.
In that case (and with a little further investigation which i've just done) we've decided to go with maildir! That is still going to be significantly better performing than mbox, right?
Depends on the usage, but it's significantly better performing than UW-IMAP. Dovecot+mbox is also significantly faster than UW-IMAP+mbox.
Also... do you know how uw-imapd & maildir compares to dovecot & maildir in terms of performance?
Maildir is a patch on top of the official UW-IMAP distribution. I don't know how well it performs, but it doesn't use any indexes and indexes are what makes Dovecot fast.
Does dovecot still use indices with maildir?
Yes.
mbox -> Maildir conversion can preserve both IMAP and POP3 UIDLs using an external script. Maildir -> dbox conversion can also preserve both, but that causes Dovecot to use this "hybrid Maildir-dbox format", which is slower than the full native dbox.
That's good to know. Do you happen to know where I can get a copy of this "external script" you speak of? Will it simply be included in the debian package (probably)?
http://wiki.dovecot.org/Migration/MailFormat -> mb2md.py
Also, given that i'm going to have to test this, i will obviously be running the conversion on a copy of the live data, and then i'll have to run the conversion again during the migration outage - will i need to delete all the data and basically start again, or is it incremental?
I don't think it can do incremental, but I've never looked at the script myself.
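Since the conversion will probably be run from scratch more than once (a test run, then the real one), a quick sanity check after each run could be useful. The sketch below compares Message-ID headers between a source mbox and a converted Maildir using Python's standard mailbox module. It is not part of mb2md.py or Dovecot, the function name is made up, and it says nothing about whether UIDs were preserved.

```python
import mailbox


def missing_message_ids(mbox_path, maildir_path):
    """Return Message-IDs present in the mbox but absent from the Maildir.

    Messages without a Message-ID header are all seen as None, so this
    is only a rough completeness check, not proof of a clean conversion.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=False)
    try:
        src_ids = {msg.get("Message-ID") for msg in src}
        dst_ids = {msg.get("Message-ID") for msg in dst}
    finally:
        src.close()
        dst.close()
    return src_ids - dst_ids
```

An empty set back from missing_message_ids() suggests nothing was dropped; anything else lists the messages to go and look for.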
My guess is that two RAID-1s would be faster, but I haven't really done any benchmarking. Anyway index files are 10-30% of the mailbox size, so the index-disks would be using a lot less disk space.
I assume you are talking about dovecot with maildir here, right?
The same applies to all mailbox formats Dovecot supports.
Also, what would we put on each array? Are the inboxes still stored separately to the IMAP folders when using dovecot and maildir?
Inboxes are stored inside Maildir like all other mailboxes.
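To picture that layout: in the Maildir++ convention (Dovecot's default for maildir, as far as I know), the inbox is the cur/new/tmp directories at the top of the Maildir, and every other folder is a dot-prefixed subdirectory next to them. Python's standard mailbox module follows the same naming, so a throwaway sketch can show it; the folder name "Archive" is just an example.

```python
import mailbox
import os


def make_demo_maildir(root):
    """Create an empty Maildir at root with one extra folder.

    root must not exist yet. Returns the sorted top-level directory
    entries so the layout can be inspected.
    """
    box = mailbox.Maildir(root, create=True)  # creates cur/, new/, tmp/
    box.add_folder("Archive")                 # creates .Archive/ alongside them
    box.close()
    return sorted(os.listdir(root))
```

The returned listing contains the inbox directories (cur, new, tmp) and the ".Archive" folder side by side, which is why there is no longer a separate spool location for inboxes to put on its own array.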
Would it be best to put all data on one array, and the indices on the other? We're basically after the fastest way to distribute the data! :-)
Last I heard it was faster to keep index files in a separate disk than mailbox data. I've never verified this myself, but it sounds reasonable.
My colleague has mentioned something of interest... can dovecot keep the index files in RAM? If so, the performance will obviously be *so* much better than running them off the hard disks.
Last I heard it didn't really help much. Assuming your OS works properly it already keeps the necessary indexes in memory anyway. Also I wrote a patch that tries to tell OS to do that by dropping message files' data from cache after reading the messages:
http://dovecot.org/patches/1.1/fadvise.diff
But no one has told me if that helps or makes things worse..
This also raises questions about what happens if the machine is powered off etc... but it's UPSd etc... so if it were to rebuild its indexes every time it was booted up, that wouldn't be the end of the world.
Well, there are two parts of index rebuilding: dovecot.index files which are quick to rebuild and dovecot.index.cache files that contain the useful fields that clients want. The cache file is especially useful with webmails and if it's gone it could mean opening all of the user's messages and reading their headers and perhaps even bodies.
So depends on what clients your users use, but in some cases it could be 10-100x slower to open the mailbox if the cache file is gone.
Hi All,
Replies to everyone below (to keep the number of emails down)...
Ed W wrote:
Richard Hobbs wrote:
Hi All,
We are soon to migrate our mail server from one piece of hardware to another and we would like to take this opportunity to optimize things.
Can I recommend you add virtualisation to your todo list. I use linux-vserver, but there are plenty other ideas out there.
It's just superb that later on you can migrate services between physical hardware with MUCH less hassle. You can easily test upgrades in a sandbox first. etc
I personally split all the tasks into different virtual servers. Right now I actually still have quite a few mail related services in a single vserver, but ideally you would split everything up and then later if you needed to upgrade a single service or move it to a new machine it would have minimal effect.
Your general process sounds about right though - I think it may be possible for you to preserve pop uidls, but see the wiki for more notes on that
I appreciate that the world is going crazy for virtualization at the moment and I agree that it is a truly great concept. However, in this instance, where we want absolute ultimate performance and we know for a fact that this machine will never be used for anything else, we are shying away from virtualization. In some way, it's another layer to potentially go wrong, and another layer to potentially slow things down.
I appreciate the advice though, it's just that i'm not sure it's for us in this particular instance.
Regarding keeping msg UIDs, i know that can happen now, which is great :-)
Lou Duchez wrote:
I don't know whether this would help with the migration, but I routinely solve a similar problem. I have implemented mail failover between two servers -- which are configured with identical sets of mailboxes -- and every 10 minutes or so, a script grabs any E-Mails from the other server and stores them locally in the proper mailboxes. This script relies on IMAP connectivity and passwords in plaintext in a passwd-file. Here it is: <snip>
Blast, forgot something: the "Simple.pm" referenced in the script is this thing:
http://search.cpan.org/~jpaf/Net-IMAP-Simple-0.93/Simple.pm
Download it, compile it, put it somewhere that the script can find it.
Thank you very much for that script - i will certainly take a good look through that when the time comes. Does it rely on IMAP connectivity at both ends? i.e. does it read from old IMAP and write to new IMAP, or does it deal with the filesystem at one end or the other?
Timo Sirainen wrote:
On May 12, 2009, at 6:41 AM, Richard Hobbs wrote:
Single-dbox is the highest performing, but note that it's not as much tested as mbox and Maildir code. I think it should work ok, but I'm not aware of any larger installations using dbox currently. So in case you find a problem, you might have to upgrade/patch Dovecot to get it fixed and that would require compiling from sources.
In that case (and with a little further investigation which i've just done) we've decided to go with maildir! That is still going to be significantly better performing than mbox, right?
Depends on the usage, but it's significantly better performing than UW-IMAP. Dovecot+mbox is also significantly faster than UW-IMAP+mbox.
OK... so Dovecot is certainly significantly faster than uw-imapd in both cases, but is dovecot fastest with mbox or maildir? I would assume maildir, but you never know...
Also... do you know how uw-imapd & maildir compares to dovecot & maildir in terms of performance?
Maildir is a patch on top of the official UW-IMAP distribution. I don't know how well it performs, but it doesn't use any indexes and indexes are what makes Dovecot fast.
Does dovecot still use indices with maildir?
Yes.
Excellent :-)
mbox -> Maildir conversion can preserve both IMAP and POP3 UIDLs using an external script. Maildir -> dbox conversion can also preserve both, but that causes Dovecot to use this "hybrid Maildir-dbox format", which is slower than the full native dbox.
That's good to know. Do you happen to know where I can get a copy of this "external script" you speak of? Will it simply be included in the debian package (probably)?
http://wiki.dovecot.org/Migration/MailFormat -> mb2md.py
how does this differ from the "convert-tool" script packaged with dovecot itself?
Also, given that i'm going to have to test this, i will obviously be running the conversion on a copy of the live data, and then i'll have to run the conversion again during the migration outage - will i need to delete all the data and basically start again, or is it incremental?
I don't think it can do incremental, but I've never looked at the script myself.
My guess is that two RAID-1s would be faster, but I haven't really done any benchmarking. Anyway index files are 10-30% of the mailbox size, so the index-disks would be using a lot less disk space.
I assume you are talking about dovecot with maildir here, right?
The same applies to all mailbox formats Dovecot supports.
Also, what would we put on each array? Are the inboxes still stored separately to the IMAP folders when using dovecot and maildir?
Inboxes are stored inside Maildir like all other mailboxes.
Would it be best to put all data on one array, and the indices on the other? We're basically after the fastest way to distribute the data! :-)
Last I heard it was faster to keep index files in a separate disk than mailbox data. I've never verified this myself, but it sounds reasonable.
Same here - i assume dovecot gives you a choice as to where it puts its index files, right?
Also... where are the maildir data files kept? homedirs or elsewhere?
My colleague has mentioned something of interest... can dovecot keep the index files in RAM? If so, the performance will obviously be *so* much better than running them off the hard disks.
Last I heard it didn't really help much. Assuming your OS works properly it already keeps the necessary indexes in memory anyway. Also I wrote a patch that tries to tell OS to do that by dropping message files' data from cache after reading the messages:
http://dovecot.org/patches/1.1/fadvise.diff
But no one has told me if that helps or makes things worse..
Fair enough... i think we'll let the OS do its thing, from what i've heard in general.
This also raises questions about what happens if the machine is powered off etc... but it's UPSd etc... so if it were to rebuild its indexes every time it was booted up, that wouldn't be the end of the world.
Well, there are two parts of index rebuilding: dovecot.index files which are quick to rebuild and dovecot.index.cache files that contain the useful fields that clients want. The cache file is especially useful with webmails and if it's gone it could mean opening user's all messages and reading their headers and perhaps even bodies.
So depends on what clients your users use, but in some cases it could be 10-100x slower to open the mailbox if the cache file is gone.
OK, this isn't too much of an issue now that we know we're just going to let the OS cache whatever it wants to in RAM.
However, does dovecot allow configuration of where to store both the index files *and* cache files?
Given that we have 2 RAID arrays, should we put index files *and* cache files on one array, and the data on another, or should we put either the index files *or* cache files on the same array as the data?
Thanks again, people! :-)
Richard.
On May 13, 2009, at 9:57 AM, Richard Hobbs wrote:
Depends on the usage, but it's significantly better performing than UW-IMAP. Dovecot+mbox is also significantly faster than UW-IMAP+mbox.
OK... so Dovecot is certainly significantly faster than uw-imapd in both cases, but is dovecot fastest with mbox or maildir? I would assume maildir, but you never know...
It's not that simple to answer. With mbox it's probably faster to read
through all mails, because they're in a single file. With Maildir it's
faster to delete mails, because it only needs to delete a single file,
instead of moving data around in the mbox file. But Maildir has fewer
problems and it's much less likely to get corrupted, so even if mbox
performance would be better in some cases I'd recommend Maildir.
http://wiki.dovecot.org/Migration/MailFormat -> mb2md.py
how does this differ from the "convert-tool" script packaged with dovecot itself?
convert-tool doesn't preserve UIDs, mb2md.py does.
Would it be best to put all data on one array, and the indices on
the other? We're basically after the fastest way to distribute the
data! :-)
Last I heard it was faster to keep index files in a separate disk
than mailbox data. I've never verified this myself, but it sounds
reasonable.
Same here - i assume dovecot gives you a choice as to where it puts
its index files, right?
Also... where are the maildir data files kept? homedirs or elsewhere?
mail_location setting specifies where both of them are kept. For
example:
mail_location = maildir:~/Maildir:INDEX=/var/index/%u
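To see where a setting like the one above would put things for a given user, here is a minimal Python sketch. It is not Dovecot's own parser: the function names `expand_vars` and `resolve_mail_location` are made up for illustration, only the `%u`, `%n`, `%d` and `%h` variables and a leading `~` are handled, and it assumes that index files default to the mail directory when no `INDEX=` option is given.

```python
def expand_vars(template, username, home):
    """Expand a small subset of Dovecot-style variables in a path template."""
    local, _, domain = username.partition("@")
    values = {"u": username, "n": local, "d": domain, "h": home}
    out, i = [], 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            key = template[i + 1]
            if key == "%":          # "%%" is a literal percent sign
                out.append("%")
                i += 2
                continue
            if key in values:
                out.append(values[key])
                i += 2
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


def resolve_mail_location(mail_location, username, home):
    """Split 'format:root[:KEY=value...]' into resolved mail and index paths."""
    fmt, _, rest = mail_location.partition(":")
    fields = rest.split(":")
    root = fields[0]
    options = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    if root == "~" or root.startswith("~/"):
        root = home + root[1:]      # "~" stands for the user's home directory
    mail_dir = expand_vars(root, username, home)
    if "INDEX" in options:
        index_dir = expand_vars(options["INDEX"], username, home)
    else:
        index_dir = mail_dir        # assumption: indexes live next to the mail
    return {"format": fmt, "mail": mail_dir, "index": index_dir}


if __name__ == "__main__":
    print(resolve_mail_location(
        "maildir:~/Maildir:INDEX=/var/index/%u", "richard", "/home/richard"))
```

With two arrays, the idea would be to mount one of them at the `INDEX=` path (here `/var/index`) and keep the home directories on the other.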
However, does dovecot allow configuration of where to store both the index files *and* cache files?
Whenever something is talking about "index files", it also means cache
files.
Given that we have 2 RAID arrays, should we put index files *and*
cache files on one array, and the data on another, or should we put either the index files *or* cache files on the same array as the data?
There's no choice, it's both or neither.
BTW. I don't think these optimizations will make much of a noticeable
difference for you. A single server with recent hardware should be
able to easily handle hundreds of simultaneous users.
Timo Sirainen wrote:
On May 13, 2009, at 9:57 AM, Richard Hobbs wrote:
Depends on the usage, but it's significantly better performing than UW-IMAP. Dovecot+mbox is also significantly faster than UW-IMAP+mbox.
OK... so Dovecot is certainly significantly faster than uw-imapd in both cases, but is dovecot fastest with mbox or maildir? I would assume maildir, but you never know...
It's not that simple to answer. With mbox it's probably faster to read through all mails, because they're in a single file. With Maildir it's faster to delete mails, because it only needs to delete a single file, instead of moving data around in the mbox file. But Maildir has fewer problems and it's much less likely to get corrupted, so even if mbox performance would be better in some cases I'd recommend Maildir.
OK... so in both cases, the files are indexed and headers etc... cached, so in both scenarios:
maildir - "slow" to read mails, but all indexed, so slowness kinda disappears.
mbox - "slow" to delete mails - indexing will help this problem, but the filesystem will still have work to do in order to join the two halves of the file.
My take on the above is that overall, maildir will probably be faster, and if it's less likely to corrupt and has fewer problems, i think that's the format for us!
http://wiki.dovecot.org/Migration/MailFormat -> mb2md.py
how does this differ from the "convert-tool" script packaged with dovecot itself?
convert-tool doesn't preserve UIDs, mb2md.py does.
Ah... that's a fairly critical piece of info, so thank you for that! :-)
Would it be best to put all data on one array, and the indices on the other? We're basically after the fastest way to distribute the data! :-)
Last I heard it was faster to keep index files in a separate disk than mailbox data. I've never verified this myself, but it sounds reasonable.
Same here - i assume dovecot gives you a choice as to where it puts its index files, right?
Also... where are the maildir data files kept? homedirs or elsewhere?
mail_location setting specifies where both of them are kept. For example:
mail_location = maildir:~/Maildir:INDEX=/var/index/%u
Excellent - that makes life very simple!
However, does dovecot allow configuration of where to store both the index files *and* cache files?
Whenever something is talking about "index files", it also means cache files.
Given that we have 2 RAID arrays, should we put index files *and* cache files on one array, and the data on another, or should we put either the index files *or* cache files on the same array as the data?
There's no choice, it's both or neither.
BTW. I don't think these optimizations will make much of a noticeable difference for you. A single server with recent hardware should be able to easily handle hundreds of simultaneous users.
That's also good to know... i like to do a job right instead of relying on faster hardware, as i'm sure you all do too, but it's good to know that if i make one or two "non-optimal" choices along the way, it'll probably be lightning fast anyway!
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
With maildir, and especially dovecot, this problem effectively disappears! That's the theory anyway, right?
Thanks again!
Richard.
on 5-13-2009 8:55 AM Richard Hobbs spake the following:
Timo Sirainen wrote:
On May 13, 2009, at 9:57 AM, Richard Hobbs wrote:
Depends on the usage, but it's significantly better performing than UW-IMAP. Dovecot+mbox is also significantly faster than UW-IMAP+mbox.
OK... so Dovecot is certainly significantly faster than uw-imapd in both cases, but is dovecot fastest with mbox or maildir? I would assume maildir, but you never know...
It's not that simple to answer. With mbox it's probably faster to read through all mails, because they're in a single file. With Maildir it's faster to delete mails, because it only needs to delete a single file, instead of moving data around in the mbox file. But Maildir has fewer problems and it's much less likely to get corrupted, so even if mbox performance would be better in some cases I'd recommend Maildir.
OK... so in both cases, the files are indexed and headers etc... cached, so in both scenarios:
maildir - "slow" to read mails, but all indexed, so slowness kinda disappears.
mbox - "slow" to delete mails - indexing will help this problem, but the filesystem will still have work to do in order to join the two halves of the file.
Actually, I think a new file is written with everything re-written except the deleted message and then linked over the old file or renamed to the old file. That is why many clients will just mark them deleted and then you run a separate purge step, or the client is set to purge on exit. That is more efficient because the big write step is only done once.
On Wed, 2009-05-13 at 09:56 -0700, Scott Silva wrote:
mbox - "slow" to delete mails - indexing will help this problem, but the filesystem will still have work to do in order to join the two halves of the file.
Actually, I think a new file is written with everything re-written except the deleted message and then linked over the old file or renamed to the old file. That is why many clients will just mark them deleted and then you run a separate purge step, or the client is set to purge on exit. That is more efficient because the big write step is only done once.
It's neither actually. It's about moving data inside the mbox file to get rid of the expunged messages and then truncating the mbox file. So this means that if you have a 1 GB mbox file and you delete the first message Dovecot needs to write 1 GB of data, but if you delete near the end of file it's going to be fast.
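The cost described above can be put into numbers with a toy model. This is a rough Python sketch, not Dovecot's actual expunge code: the function name `bytes_moved_on_expunge` is hypothetical, and it simply assumes that every message kept after the first expunged one has to be rewritten at a lower offset before the file is truncated.

```python
def bytes_moved_on_expunge(message_sizes, expunged):
    """Estimate bytes rewritten when expunging messages from one mbox file.

    message_sizes: message sizes in bytes, in file order.
    expunged: set of indexes (into message_sizes) of messages to remove.
    """
    moved = 0
    gap_seen = False
    for index, size in enumerate(message_sizes):
        if index in expunged:
            gap_seen = True      # everything kept after this point must shift down
        elif gap_seen:
            moved += size        # a kept message is rewritten at a lower offset
    return moved


if __name__ == "__main__":
    sizes = [100_000] * 10_000   # about 1 GB of mail in a single mbox file
    print("delete first message:", bytes_moved_on_expunge(sizes, {0}))
    print("delete last message: ", bytes_moved_on_expunge(sizes, {len(sizes) - 1}))
```

Under the same model a Maildir expunge is a single file deletion per message, regardless of where the message sits in the folder.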
On Wed, 13 May 2009, Richard Hobbs wrote:
difference for you. A single server with recent hardware should be able to easily handle hundreds of simultaneous users.
That's also good to know... i like to do a job right instead of relying on faster hardware, as i'm sure you all do too, but it's good to know that if i make one or two "non-optimal" choices along the way, it'll probably be lightning fast anyway!
Well, IMHO there is some advantage in splitting index data and mail data across separate disks if they are on different channels, so the load per channel drops. But this is probably not so significant if your OS uses one CPU for all I/O anyway....
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
a) hits a 2GB limit on a mailbox in the mail client,
b) tries to move all their 20'000+ messages at once to an archive folder once a year, when their quota limit is exceeded. This can make the whole server unresponsively slow, because I have the mail_log plugin running as well.
With maildir, and especially dovecot, this problem effectively disappears! That's the theory anyway, right?
Bye,
Steffen Kaiser
On 5/14/2009, Steffen Kaiser (skdovecot@smail.inf.fh-brs.de) wrote:
b) tries to move all their 20'000+ messages at once to an archive folder once a year, when their quota limit is exceeded. This can make the whole server unresponsively slow, because I have the mail_log plugin running as well.
Hmmm... it would be nice if there was some way to check for mass moves like that and just log one line like:
imap(user): copy -> Archive: uid=908, IDIOT_USER moved 20K+ messages at once - FLOG HIM!!
Don't know if it's possible to detect a mass move though...
--
Best regards,
Charles
On 14/05/2009 5:11 PM, Steffen Kaiser wrote:
On Wed, 13 May 2009, Richard Hobbs wrote:
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
a) hits a 2GB limit on a mailbox in the mail client,
b) try to move all their 20'000+ messages at once to an archive folder once a year, when their quota limit is exceeded.
This generally causes us problems as well..
With maildir, and especially dovecot, this problem effectively disappears! That's the theory anyway, right?
Theory is wonderful when it works :-) I don't know for sure if our problems are caused by what I mentioned above, but it might be worth keeping it in mind.
-- Thanks, Phill Macey (CiSRA IT Services)
Phillip Macey wrote:
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
At first glance this sounds like a large folder is being indexed... are you using Dovecot deliver (which updates indices on deliver)?
With maildir, and especially dovecot, this problem effectively disappears! That's the theory anyway, right?
Theory is wonderful when it works :-)
In theory, Theory and Practice are the same. In practice, this isn't always the case. :)
-- Curtis Maloney cmaloney@cardgate.net
Curtis Maloney wrote:
Phillip Macey wrote:
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
At first glance this sounds like a large folder is being indexed... are you using Dovecot deliver (which updates indices on deliver)?
This raises an interesting question for me actually... given that we've now decided dovecot and maildir is the way forward for us, which delivery method should we use in exim? exim can support maildir, (right?) and so can dovecot, so should i use dovecot's "deliver" mechanism, or exim's own internal mechanism?
Thanks again!
Richard.
Richard Hobbs wrote:
Curtis Maloney wrote:
Phillip Macey wrote:
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
At first glance this sounds like a large folder is being indexed... are you using Dovecot deliver (which updates indices on deliver)?
This raises an interesting question for me actually... given that we've now decided dovecot and maildir is the way forward for us, which delivery method should we use in exim? exim can support maildir, (right?) and so can dovecot, so should i use dovecot's "deliver" mechanism, or exim's own internal mechanism?
Only dovecot 'deliver' will update the index on delivery.
~Seth
On F 15 May, 2009, at 09:39 , Seth Mattinen wrote:
This raises an interesting question for me actually... given that
we've now decided dovecot and maildir is the way forward for us, which delivery method should we use in exim? exim can support maildir, (right?) and so can dovecot, so should i use dovecot's "deliver" mechanism, or exim's own internal mechanism?
Only dovecot 'deliver' will update the index on delivery.
well, using an LDA makes it a little more cumbersome to check the local
recipient at RCPT time.
What repercussions does not updating the index on delivery have? My systems
use exim to deliver and they seem to act normally...
Giuliano
On Fri, 15 May 2009, Giuliano Gavazzi wrote:
well, using an LDA makes it a little more cumbersome to check the local recipient at RCPT time.
Huh? exim won't try local deliver unless it has decided it is a local recipient. You won't get overquota status this way, did you mean that?
Bye,
Steffen Kaiser
On F 15 May, 2009, at 12:30 , Steffen Kaiser wrote:
On Fri, 15 May 2009, Giuliano Gavazzi wrote:
well, using an LDA makes it a little more cumbersome to check the
local recipient at RCPT time.
Huh? exim won't try local deliver unless it has decided it is a
local recipient. You won't get overquota status this way, did you
mean that?
No. The problem is the "try"... or at least, it was when I had a
cyrus + exim setup five years ago.
With cyrus it was impossible to easily check if the recipient was an
existing local user. It required either the use of the mbpath utility
or a recipient callout with lmtp over tcp.
I suppose that with a virtual users setup this is a moot point, but
with system users nothing beats the simplicity of an all exim setup!
(If you use exim, that is).
Giuliano
Seth Mattinen wrote:
Richard Hobbs wrote:
Curtis Maloney wrote:
Phillip Macey wrote:
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
At first glance this sounds like a large folder is being indexed... are you using Dovecot deliver (which updates indices on deliver)?
This raises an interesting question for me actually... given that we've now decided dovecot and maildir is the way forward for us, which delivery method should we use in exim? exim can support maildir, (right?) and so can dovecot, so should i use dovecot's "deliver" mechanism, or exim's own internal mechanism?
Only dovecot 'deliver' will update the index on delivery.
So does this mean that it's slightly slower to actually deliver the mail with dovecot (because it's writing to two places instead of one), but it saves the files having to be indexed again, so overall potentially faster?
And one more question... given that we're going to be using maildir, should i still use dovecot's POP3 server, or whatever the standard one is? I've heard (through google mainly), that dovecot's POP3 server, in terms of performance, is actually quite bad compared to other ones, but what's your take on this?
Thanks again, people!
Richard.
On Fri, May 15, 2009 at 11:28:42AM +0000, Richard Hobbs wrote:
[...]
Only dovecot 'deliver' will update the index on delivery.
So does this mean that it's slightly slower to actually deliver the mail with dovecot (because it's writing to two places instead of one), but it saves the files having to be indexed again, so overall potentially faster?
Things get re-indexed on client request if Dovecot sees that index is stale. So you are buying faster response times for clients with somewhat higher server load at mail delivery.
I don't know how this piecemeal update of the index stacks up against a complete re-index, but I'd assume it to be more efficient (only having to do new mails instead of whole mailbox). Still, I find the re-index to be almost instantaneous on not-so-small mailboxes (hundreds of MB) and fairly modest hardware, by today's standards.
Regards
-- tomás
Seth Mattinen, 2009-05-15 09:39:
(right?) and so can dovecot, so should i use dovecot's "deliver" mechanism, or exim's own internal mechanism?
Only dovecot 'deliver' will update the index on delivery.
That's obvious. But dovecot (AFAIK) updates (not rebuilds) indexes when it sees new messages, so it should not matter that much. At least with maildir and not too many messages per poll interval, so the usual "always on" imap usage should be ok. It might even be better not to use dovecot-lda on heavy-usage systems, because you save the fork/exec and indexes are updated on demand (i.e. when the folder is polled), not for every single message inserted.
With exim, I also had the problem that messages got bounced because the lda crashed occasionally (it was an svn version, so I don't complain) and stupid exim treated this as a permanent error.
Phillip Macey wrote:
On 14/05/2009 5:11 PM, Steffen Kaiser wrote:
On Wed, 13 May 2009, Richard Hobbs wrote:
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails). (Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Are you using ext3 by chance? Vanilla ext3 without directory indexing (or whatever it's called) *hates* directories with a lot of files - like maildir. Personally, I use XFS, which doesn't suffer from this problem since it uses b-trees instead of a table(!) like ext3 does.
~Seth
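The "ls -ld Maildir/cur" check quoted above can be scripted across a maildir. What follows is a minimal Python sketch under stated assumptions: the helper names `dir_stats` and `large_maildir_dirs` and the 20000-entry threshold are illustrative only, and the directory size reported is whatever the filesystem stores for the directory itself, so it will differ between ext3, XFS and others.

```python
import os

SUBDIRS = ("cur", "new", "tmp")   # the standard maildir subdirectories


def dir_stats(path):
    """Return (number of entries, size of the directory itself in bytes)."""
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return count, os.stat(path).st_size   # same size that "ls -ld" prints


def large_maildir_dirs(maildir, max_entries=20000):
    """Report maildir subdirectories holding more than max_entries files."""
    report = {}
    for sub in SUBDIRS:
        path = os.path.join(maildir, sub)
        if not os.path.isdir(path):
            continue
        count, size = dir_stats(path)
        if count > max_entries:
            report[sub] = (count, size)
    return report


if __name__ == "__main__":
    maildir = os.path.expanduser("~/Maildir")   # assumed location
    if os.path.isdir(maildir):
        for sub, (count, size) in large_maildir_dirs(maildir).items():
            print(f"{sub}: {count} entries, directory size {size} bytes")
```

Run per user, this would only show which folders have grown large enough to be worth a closer look; it does not say whether directory indexing is enabled on the filesystem.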
Seth Mattinen wrote:
Phillip Macey wrote:
On 14/05/2009 5:11 PM, Steffen Kaiser wrote:
On Wed, 13 May 2009, Richard Hobbs wrote:
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Are you using ext3 by chance? Vanilla ext3 without directory indexing (or whatever it's called) *hates* directories with a lot of files - like maildir. Personally, I use XFS, which doesn't suffer from this problem since it uses b-trees instead of a table(!) like ext3 does.
This raises another question for me actually...
We will have one volume for indexes and another volume for data (using maildir). We will be using the latest stable Debian Linux distro.
Any opinions on the best filesystem to use? We would normally go ReiserFS for large volumes, and ext3 for small volumes because of the unlimited inodes in reiserfs, but i understand that support for that is beginning to disappear given that Hans Reiser got into a bit of trouble a few years ago.
Anyway... that would leave ext3, but is there a better choice i could make in this instance? We do want performance, of course, but also complete reliability and resilience when it comes to system crashes etc... we do *not* want data corruption, and ext3 we know has very good journalling and data recovery methods. Well... they're very mature, anyway.
Any opinions?
Thanks again!
Richard.
Richard Hobbs wrote:
Seth Mattinen wrote:
Phillip Macey wrote:
On 14/05/2009 5:11 PM, Steffen Kaiser wrote:
On Wed, 13 May 2009, Richard Hobbs wrote:
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Are you using ext3 by chance? Vanilla ext3 without directory indexing (or whatever it's called) *hates* directories with a lot of files - like maildir. Personally, I use XFS, which doesn't suffer from this problem since it uses b-trees instead of a table(!) like ext3 does.
This raises another question for me actually...
We will have one volume for indexes and another volume for data (using maildir). We will be using the latest stable Debian Linux distro.
Any opinions on the best filesystem to use? We would normally go ReiserFS for large volumes, and ext3 for small volumes because of the unlimited inodes in reiserfs, but i understand that support for that is beginning to disappear given that Hans Reiser got into a bit of trouble a few years ago.
Anyway... that would leave ext3, but is there a better choice i could make in this instance? We do want performance, of course, but also complete reliability and resilience when it comes to system crashes etc... we do *not* want data corruption, and ext3 we know has very good journalling and data recovery methods. Well... they're very mature, anyway.
I used to use ext3, ran into its horrible performance even with directory indexing enabled, went to XFS and never looked back. All of my systems are Debian stable. Reiser3 is part of the kernel so I wouldn't worry about that; Namesys considered it complete and stopped work on it long before the whole murder thing. Both Reiser3 and XFS have worse reputations than ext3, however, I've seen ext3 filesystems hosed beyond repair, too. All my XFS filesystems have battery-backed cache controllers, so it hasn't happened to me yet, hopefully never. ;) One catch with XFS (such as with LVM) to keep in mind is you can't ever shrink it, only grow.
ext3 is mature but IMHO completely unsuitable for a busy mail server or any situation where you'll have a bajillion of files in one directory. The exact point at which ext3 will screw you over obviously depends on many factors. But when it happens it'll probably be painful to reformat to something better.
~Seth
Seth Mattinen wrote:
Richard Hobbs wrote:
Seth Mattinen wrote:
Phillip Macey wrote:
On 14/05/2009 5:11 PM, Steffen Kaiser wrote:
On Wed, 13 May 2009, Richard Hobbs wrote:
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Oh, we serve Maildir via Dovecot IMAP and 5000 messages per folder are a wimp. Problems start if the user:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails).
(Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Are you using ext3 by chance? Vanilla ext3 without directory indexing (or whatever it's called) *hates* directories with a lot of files - like maildir. Personally, I use XFS, which doesn't suffer from this problem since it uses b-trees instead of a table(!) like ext3 does.
This raises another question for me actually...
We will have one volume for indexes and another volume for data (using maildir). We will be using the latest stable Debian Linux distro.
Any opinions on the best filesystem to use? We would normally go ReiserFS for large volumes, and ext3 for small volumes because of the unlimited inodes in reiserfs, but i understand that support for that is beginning to disappear given that Hans Reiser got into a bit of trouble a few years ago.
Anyway... that would leave ext3, but is there a better choice i could make in this instance? We do want performance, of course, but also complete reliability and resilience when it comes to system crashes etc... we do *not* want data corruption, and ext3 we know has very good journalling and data recovery methods. Well... they're very mature, anyway.
I used to use ext3, ran into its horrible performance even with directory indexing enabled, went to XFS and never looked back. All of my systems are Debian stable. Reiser3 is part of the kernel so I wouldn't worry about that; Namesys considered it complete and stopped work on it long before the whole murder thing. Both Reiser3 and XFS have worse reputations than ext3, however, I've seen ext3 filesystems hosed beyond repair, too. All my XFS filesystems have battery-backed cache controllers, so it hasn't happened to me yet, hopefully never. ;) One catch with XFS (such as with LVM) to keep in mind is you can't ever shrink it, only grow.
Trouble is... i've been googling this as well, just now, and loads of people say XFS has the better performance, but loads of other people say ReiserFS has the better performance.
We have battery backed up RAID controllers too, in this new system, and the systems are UPSd, so on that basis i'm still none the wiser! lol
I appreciate your experience with XFS is a positive one, but even the dovecot web site says XFS might not be a good choice...
http://wiki.dovecot.org/MailboxFormat/Maildir
What a tough decision! I know it probably won't make too much difference in my situation, but i want this to be a very long-term solution, so want to do it right first time!
Any other opinions on XFS vs Reiserfs with Dovecot maildir?
Thanks again!
Richard.
ext3 is mature but IMHO completely unsuitable for a busy mail server or any situation where you'll have a bajillion of files in one directory. The exact point at which ext3 will screw you over obviously depends on many factors. But when it happens it'll probably be painful to reformat to something better.
~Seth
On Fri, 2009-05-15 at 09:35 +1000, Phillip Macey wrote:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails). (Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Do you have POP3 users? What clients do your users typically use?
BTW. Kind of nasty to hijack a pretty much unrelated thread to your question..
On 15/05/2009 9:49 AM, Timo Sirainen wrote:
On Fri, 2009-05-15 at 09:35 +1000, Phillip Macey wrote:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails). (Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Do you have POP3 users? What clients do your users typically use?
50% pop. 50% imap.. Roughly.
BTW. Kind of nasty to hijack a pretty much unrelated thread to your question..
Apologies, I wasn't intending it to be a hijack. Steffen said he had no problems. I was only trying to point out that maildir can have its problems as well. I don't know how mbox would handle the same situation because we don't have any mbox storage.
I may post another thread if we still have problems after I have finished tidying up some people's maildirs.
-- Thanks, Phill Macey
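For anyone wanting to find the oversized maildirs before tidying them up, here is a minimal sketch that counts message files instead of eyeballing `ls -ld` output. The function names, the root path and the threshold are all assumptions for illustration; nothing here comes from Dovecot itself.

```python
import os

def count_maildir_messages(maildir):
    """Return the number of regular files in cur/ and new/ of one maildir."""
    total = 0
    for sub in ("cur", "new"):
        path = os.path.join(maildir, sub)
        if not os.path.isdir(path):
            continue
        with os.scandir(path) as entries:
            total += sum(1 for entry in entries if entry.is_file())
    return total

def large_maildirs(root, threshold):
    """Yield (path, count) for each directory under root that has a cur/
    subdirectory and holds more than `threshold` messages."""
    for dirpath, dirnames, _ in os.walk(root):
        if "cur" in dirnames:
            count = count_maildir_messages(dirpath)
            if count > threshold:
                yield dirpath, count
```

Something like `for path, n in large_maildirs("/var/vmail", 20000): print(n, path)` would list the candidates; `/var/vmail` is a made-up location, so substitute wherever your maildirs actually live.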
On May 14, 2009, at 8:19 PM, Phillip Macey wrote:
On 15/05/2009 9:49 AM, Timo Sirainen wrote:
On Fri, 2009-05-15 at 09:35 +1000, Phillip Macey wrote:
We are having some performance issues on our server at the moment - all I can put it down to is the large size of some maildirs. Eg.
ls -ld Maildir/cur
might show a directory >20Mb in size for some of our users (20-30k emails). (Performance issues == everything is running ok then all of a sudden load avg goes through the roof, system cpu time goes crazy. Reading mail grinds to a halt. Then everything recovers just as suddenly and the load avg gradually returns to normal levels)
Do you have POP3 users? What clients do your users typically use?
50% pop. 50% imap.. Roughly.
With POP3 users those kinds of I/O bursts can happen when a lot of
messages don't have their virtual size known. See Maildir performance
in http://wiki.dovecot.org/POP3Server
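To illustrate what "virtual size" means here: POP3 reports each message's size as if every line ended in CRLF, while maildir files are normally stored with bare LF, so a server that has not cached the size has to read the whole file. Dovecot can also read the size from a `,W=<vsize>` field in the maildir filename. The sketch below shows both ideas; the function names and the sample filename are assumptions for illustration, not Dovecot code.

```python
import re

def virtual_size(raw: bytes) -> int:
    """Size of the message with every bare LF counted as CRLF."""
    bare_lf = raw.count(b"\n") - raw.count(b"\r\n")
    return len(raw) + bare_lf

_W_FIELD = re.compile(r",W=(\d+)")

def vsize_from_filename(name: str):
    """Return the W= value stored in a maildir filename, or None if absent."""
    base = name.split(":", 1)[0]  # drop the ":2,<flags>" suffix first
    match = _W_FIELD.search(base)
    return int(match.group(1)) if match else None
```

When the filename carries `W=`, the size comes from a directory listing alone; when it does not, something has to open and scan every message, which is one plausible source of the I/O bursts described above.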
On 15/05/2009 10:33 AM, Timo Sirainen wrote:
On May 14, 2009, at 8:19 PM, Phillip Macey wrote:
On 15/05/2009 9:49 AM, Timo Sirainen wrote:
Do you have POP3 users? What clients do your users typically use?
50% pop. 50% imap.. Roughly.
With POP3 users those kinds of I/O bursts can happen when a lot of messages don't have their virtual size known. See Maildir performance in http://wiki.dovecot.org/POP3Server
Ok. Thanks. I will have a read and keep that in mind ;-)
Richard Hobbs wrote:
That's also good to know... i like to do a job right instead of relying on faster hardware, as i'm sure you all do too, but it's good to know that if i make one or two "non-optimal" choices along the way, it'll probably be lightning fast anyway!
The main complaint we have from users is that their IMAP Inbox, with 5000 emails in it takes ages to appear, and no amount of coaxing will convince them to split their inbox into multiple folders.
Most of my mailing list folders that I work with on a daily basis like NANOG (over 20,000) and cisco-nsp (over 35,000) are no slower than folders with a handful of messages in them. It's not a private server either, it's the same one my customers use. If you're seeing a slowdown at 5k messages, either your server is woefully underpowered or something isn't quite right.
~Seth
I'd point out that the big *practical* issue with mbox is the reality of
big inboxes. While you can restrict the hoi polloi to something limited
like a quota of under 60MB (and remember that inbox is one big honking
file), the powers that be will not allow themselves to be so
limited...nor will they be particularly good about cleaning up. I dunno
how it is with you and your hardware/OS implementation, but there is a
serious CPU hit when somebody with a 1GB inbox (one big file, remember)
deletes a message...or gets new mail...or searches their inbox (I call
this the python swallowing the pig). The first two will be trivial when
we switch to maildir.
OTOH, boy is it quick to do a backup with mbox. I dread that part of
our move from mbox to maildir format. We will probably go from 2 hours
to a day in the switch from 3000 inboxes of one file each (mbox) to
3000 directories with hundreds or thousands of files in each (maildir).
Pick your poison
Timo Sirainen wrote:
On May 13, 2009, at 9:57 AM, Richard Hobbs wrote:
OK... so Dovecot is certainly significantly faster than uw-imapd in both cases, but is dovecot fastest with mbox or maildir? I would assume maildir, but you never know...
It's not that simple to answer. With mbox it's probably faster to read through all mails, because they're in a single file. With Maildir it's faster to delete mails, because it only needs to delete a single file, instead of moving data around in the mbox file. But Maildir has less problems and it's much less likely to get corrupted, so even if mbox performance would be better in some cases I'd recommend Maildir.
-- "One must think like a hero to behave like a merely decent human being."
- May Sarton
Stewart Dean, Unix System Admin, Bard College, New York 12504 sdean@bard.edu voice: 845-758-7475, fax: 845-758-7035
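The delete-cost difference between the two formats can be seen with a small sketch using Python's standard `mailbox` module. This is not how Dovecot handles either format; it only shows that removing a message from an mbox means rewriting the single file, whereas a maildir removal unlinks one file. The helper names and addresses are made up for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

def make_message(n):
    """Build a small test message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = f"test {n}"
    msg.set_content("body\n")
    return msg

def demo(workdir):
    """Delete one message from an mbox and from a maildir.

    Returns ((mbox size before, after), (maildir file count before, after)).
    """
    mbox_path = os.path.join(workdir, "inbox.mbox")
    mdir_path = os.path.join(workdir, "Maildir")

    mbox = mailbox.mbox(mbox_path)
    mdir = mailbox.Maildir(mdir_path, create=True)
    mbox_keys = [mbox.add(make_message(i)) for i in range(3)]
    mdir_keys = [mdir.add(make_message(i)) for i in range(3)]
    mbox.flush()

    size_before = os.path.getsize(mbox_path)
    mbox.remove(mbox_keys[0])
    mbox.flush()  # the whole mbox file is rewritten without the message
    size_after = os.path.getsize(mbox_path)

    new_dir = os.path.join(mdir_path, "new")
    files_before = len(os.listdir(new_dir))
    mdir.remove(mdir_keys[0])  # a single file is unlinked
    files_after = len(os.listdir(new_dir))

    mbox.close()
    return (size_before, size_after), (files_before, files_after)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(demo(workdir))
```

With three tiny messages the rewrite is instant, but the same operation on a 1GB inbox has to copy nearly the whole file, which is the CPU and I/O hit described above.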
Richard Hobbs wrote:
My colleague has mentioned something of interest... can dovecot keep the index files in RAM? If so, the performance will obviously be *so* much better than running them off the hard disks.
My understanding was that in-memory indices are discarded on logout. They're of benefit generally in cases where (a) you are distributing your servers and don't want to risk indices over NFS, or (b) your users are doing POP drive-bys, and not storing on the server (though I thought there was a "don't bother indexing at all" for this).
This also raises questions about what happens if the machine is powered off etc... but it's UPSd etc... so if it were to rebuild its indexes every time it was booted up, that wouldn't be the end of the world.
Dovecot will rebuild its indices when it needs to -- either they're not there, or they're deemed invalid (corrupted, etc). Except in the case of dbox, they are a bonus only -- not essential.
You could even reduce your backup footprint by omitting them.
-- Curtis Maloney cmaloney@cardgate.net
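As a rough sketch of the "omit them from the backup" idea for maildir: Dovecot's index files are normally named `dovecot.index`, `dovecot.index.log` and `dovecot.index.cache`, so a backup script can skip anything matching those names. The function names below are assumptions for illustration. Note that `dovecot-uidlist` is deliberately not skipped, since it holds the UID mapping rather than a rebuildable index, and that this does not apply to dbox, where the indexes are essential.

```python
import fnmatch
import os

# Names assumed to be safe to skip for maildir/mbox storage.
INDEX_PATTERNS = ("dovecot.index", "dovecot.index.*")

def is_index_file(name):
    """True if the bare filename looks like a Dovecot index file."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in INDEX_PATTERNS)

def files_to_back_up(root):
    """Yield every file path under root except Dovecot index files."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not is_index_file(name):
                yield os.path.join(dirpath, name)
```

The resulting paths can be fed to whatever backup tool is in use; most tools also accept an exclude pattern directly, which achieves the same thing without a script.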
I don't know whether this would help with the migration, but I routinely solve a similar problem. I have implemented mail failover between two servers -- which are configured with identical sets of mailboxes -- and every 10 minutes or so, a script grabs any E-Mails from the other server and stores them locally in the proper mailboxes. This script relies on IMAP connectivity and passwords in plaintext in a passwd-file. Here it is:
#!/usr/bin/perl
# Pull mail from the other server over IMAP and re-inject it locally via SMTP.
use strict;
use warnings;
require '/vmail/Simple.pm';    # Net::IMAP::Simple, loaded from a local copy
use Net::SMTP;

my $passwdfile   = "/path/to/passwd-file";
my $remoteserver = "myotherserver.com";

open(my $fin, '<', $passwdfile) or die "Cannot open $passwdfile: $!";
my @pwlines = <$fin>;
close($fin);

# Only lines with four colon-separated fields are used; the second field
# must look like {SCHEME}password.
for my $line (@pwlines) {
    my @pwflds = split(/:/, $line);
    next unless @pwflds == 4;
    my @pwparts = split(/\}/, $pwflds[1]);
    next unless @pwparts == 2;
    transfermail($remoteserver, $pwflds[0], $pwparts[1]);
}

sub transfermail {
    my ($remoteserver, $thislogin, $thispassword) = @_;
    print $remoteserver, " - ", $thislogin, " - ", $thispassword, "\n";

    my $server = Net::IMAP::Simple->new($remoteserver);
    unless ($server) {
        print $thislogin, " - could not connect.\n";
        return;
    }

    if ($server->login($thislogin, $thispassword)) {
        my $number_of_messages = $server->select("INBOX") || 0;
        print $thislogin, " - ", $number_of_messages, " messages.\n";

        for my $msg (1 .. $number_of_messages) {
            # Fetch the message, then hand it to the local MTA.
            my $lines = $server->get($msg)          or next;
            my $smtp  = Net::SMTP->new("127.0.0.1") or next;
            my $ok = $smtp->mail($thislogin)
                  && $smtp->recipient($thislogin)
                  && $smtp->data()
                  && $smtp->datasend(@$lines)
                  && $smtp->dataend();
            $ok = 0 unless $smtp->quit;
            # Delete the remote copy only once local delivery succeeded.
            $server->delete($msg) if $ok;
        }
    }
    else {
        print $thislogin, " - could not log in.\n";
    }
    $server->quit();
}

1;
Blast, forgot something: the "Simple.pm" referenced in the script is this thing:
http://search.cpan.org/~jpaf/Net-IMAP-Simple-0.93/Simple.pm
Download it, compile it, put it somewhere that the script can find it.
On Tue, May 12, 2009 at 10:09:06AM +0000, Richard Hobbs wrote:
Hello,
[...]
That's good to know. Do you happen to know where I can get a copy of this "external script" you speak of? Will it simply be included in the debian package (probably)?
| tomas@floh:~$ apt-file search convert-tool
| dovecot-common: /usr/bin/convert-tool
Seems to be there :-)
HTH
-- tomás
Richard Hobbs wrote:
Hi All,
We are soon to migrate our mail server from one piece of hardware to another and we would like to take this opportunity to optimize things.
Can I recommend you add virtualisation to your todo list. I use linux-vserver, but there are plenty other ideas out there.
It's just superb that later on you can migrate services between physical hardware with MUCH less hassle. You can easily test upgrades in a sandbox first. etc
I personally split all the tasks into different virtual servers. Right now I actually still have quite a few mail related services in a single vserver, but ideally you would split everything up and then later if you needed to upgrade a single service or move it to a new machine it would have minimal effect.
Your general process sounds about right though - I think it may be possible for you to preserve pop uidls, but see the wiki for more notes on that
Good luck
Ed W
participants (14)
- Charles Marcus
- Curtis Maloney
- Ed W
- Giuliano Gavazzi
- Jakob Hirsch
- Lou Duchez
- Phillip Macey
- Richard Hobbs
- Scott Silva
- Seth Mattinen
- Steffen Kaiser
- Stewart Dean
- Timo Sirainen
- tomas@tuxteam.de