[Dovecot] Howto add another disk storage
Hi all, What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips !
On 4 Jul 2012, at 21:01, Adrian Minta wrote:
What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips !
Are you using volume management (LVM) on the system, or do you have regular partitions mounted? Are there any RAID or other factors to consider . . in fact, a few details about your system might help :)
~ James.
On 07/04/12 23:22, J E Lyon wrote:
On 4 Jul 2012, at 21:01, Adrian Minta wrote:
What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips! Are you using volume management (LVM) on the system, or do you have regular partitions mounted? Are there any RAID or other factors to consider . . in fact, a few details about your system might help :)
~ James.
No LVM, and the RAID is done in the SAN appliance. My gut tells me that formatting a single huge 25TB partition is not the best way to go. That's why I'm considering having two different LUNs on two different SANs mounted together as two directories. An fsck and a recovery will work faster. The only issue is distributing maildirs across the two filesystems.
On 4 Jul 2012, at 22:09, Adrian Minta wrote:
On 07/04/12 23:22, J E Lyon wrote:
On 4 Jul 2012, at 21:01, Adrian Minta wrote:
What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips! Are you using volume management (LVM) on the system, or do you have regular partitions mounted? Are there any RAID or other factors to consider . . in fact, a few details about your system might help :)
~ James.
No LVM, and the RAID is done in the SAN appliance. My gut tells me that formatting a single huge 25TB partition is not the best way to go. That's why I'm considering having two different LUNs on two different SANs mounted together as two directories. An fsck and a recovery will work faster. The only issue is distributing maildirs across the two filesystems.
LVM would help with all the above -- sounds like it's not an option though.
I have seen recommendations for Maildir structures whereby the first letter of the user account is one of 26 (or however many) directories, and the accounts therefore fall into one of several different directories -- you can distribute the letters of the alphabet directories across the two filesystems . .
Have you seen the configuration examples that use parameters to specify the Maildir locations based on the account name etc.?
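As an illustration only (paths and the choice of split are hypothetical, not from any post in this thread), such a parameterised location in dovecot.conf could look like this; %1n expands to the first letter of the username, so each account lands in a per-letter directory that can live on either filesystem:

    # dovecot.conf -- hypothetical layout, split by first letter of the username
    mail_location = maildir:/srv/mail/%d/%1n/%n/Maildir
    # e.g. "adrian@example.com" ends up under /srv/mail/example.com/a/adrian/Maildir

The per-letter directories can then be spread across the two filesystems with symlinks or bind mounts, which is exactly the balancing question being discussed here.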
~ James.
On 7/4/2012 4:09 PM, Adrian Minta wrote:
On 07/04/12 23:22, J E Lyon wrote:
On 4 Jul 2012, at 21:01, Adrian Minta wrote:
What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips! Are you using volume management (LVM) on the system, or do you have regular partitions mounted? Are there any RAID or other factors to consider . . in fact, a few details about your system might help :)
~ James.
No LVM and the RAID is done in the SAN appliance.
It absolutely kills me every time I see a mail server admin display an almost total lack of knowledge of his/her storage back end, or the inability to describe it technically, in an email...
My gut tells me that formatting a single huge 25TB partition is not the best way to go.
What is this statement supposed to convey to us? It makes no sense.
That's why I'm considering having two different LUNs on two different SANs mounted together as two directories.
And this doesn't make sense either.
An fsck and a recovery will work faster.
If you're having to run fsck on your filesystem on a regular basis, you have larger problems than storage provisioning. What filesystem are you using?
The only issue is distributing maildirs across the two filesystems.
Here's the type of information and level of detail you need to provide for any of us to intelligently assist you with this issue:
My SAN array is model X from company Y. It has 12x 3TB drives configured as a RAID6 array. I have created a 5TB virtual drive and exported it as LUN #X. It houses the filesystem (extX/XFS/JFS) that contains the mail store, which is now getting full. I have 25TB of the 30TB net array space unallocated. How can I best use this space to increase storage space for Dovecot?
To which I would answer:
Unless your SAN array is an el cheapo model, you should be able to simply increase the size of the current virtual drive by adding a portion of the 25TB of unallocated space. Say you expand the virtual drive by 10TB. The virtual drive is already exported as LUN #X, so the host to which the LUN is unmasked simply now sees /dev/sdX as being 15TB total size instead of 5TB. At this point, simply grow the filesystem across the additional 10TB of the "disk". This negates the need for multiple filesystems/namespaces and jumping through hoops to balance your maildir workload across two separate filesystems.
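As a rough sketch of the host-side steps (the device name, mountpoint and filesystem type are hypothetical, and this assumes the filesystem sits directly on the LUN with no partition table to grow):

    # after the virtual drive behind the LUN has been enlarged on the SAN:
    echo 1 > /sys/block/sdX/device/rescan   # make the kernel re-read the new device size
    xfs_growfs /srv/mail                    # XFS: grow the mounted filesystem to fill the device
    # (for ext4 the equivalent would be: resize2fs /dev/sdX)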
This "on the fly" expansion capability is one of the biggest selling points of SAN technology. Surely your unit has such capability.
-- Stan
On Thu, Jul 5, 2012 at 7:37 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
On 7/4/2012 4:09 PM, Adrian Minta wrote:
On 07/04/12 23:22, J E Lyon wrote:
On 4 Jul 2012, at 21:01, Adrian Minta wrote:
What is the best strategy to add another storage volume to an existing virtual mail system? Move some domains to the new storage and create symlinks? Switch to Dovecot hashing? But in that case, what is the easiest way to migrate?
Thanks for any suggestions or tips! Are you using volume management (LVM) on the system, or do you have regular partitions mounted? Are there any RAID or other factors to consider . . in fact, a few details about your system might help :)
~ James.
No LVM and the RAID is done in the SAN appliance.
It absolutely kills me every time I see a mail server admin display an almost total lack of knowledge of his/her storage back end, or the inability to describe it technically, in an email...
My gut tells me that formatting a single huge 25TB partition is not the best way to go.
What is this statement supposed to convey to us? It makes no sense.
That's why I'm considering having two different LUNs on two different SANs mounted together as two directories.
And this doesn't make sense either.
An fsck and a recovery will work faster.
If you're having to run fsck on your filesystem on a regular basis, you have larger problems than storage provisioning. What filesystem are you using?
The only issue is distributing maildirs across the two filesystems.
Here's the type of information and level of detail you need to provide for any of us to intelligently assist you with this issue:
My SAN array is model X from company Y. It has 12x 3TB drives configured as a RAID6 array. I have created a 5TB virtual drive and exported it as LUN #X. It houses the filesystem (extX/XFS/JFS) that contains the mail store, which is now getting full. I have 25TB of the 30TB net array space unallocated. How can I best use this space to increase storage space for Dovecot?
To which I would answer:
Unless your SAN array is an el cheapo model, you should be able to simply increase the size of the current virtual drive by adding a portion of the 25TB of unallocated space. Say you expand the virtual drive by 10TB. The virtual drive is already exported as LUN #X, so the host to which the LUN is unmasked simply now sees /dev/sdX as being 15TB total size instead of 5TB. At this point, simply grow the filesystem across the additional 10TB of the "disk". This negates the need for multiple filesystems/namespaces and jumping through hoops to balance your maildir workload across two separate filesystems.
This "on the fly" expansion capability is one of the biggest selling points of SAN technology. Surely your unit has such capability.
-- Stan
Hi Stan, I know how to add drives to the storage and how to grow the existing filesystem, but such big filesystems are still somewhat new to mainstream Linux. Yes, I know some universities out there already have petabyte filesystems, but right now stable Linux systems have trouble formatting ext4 partitions over 16TB. All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
On 5.7.2012, at 10.44, Adrian M wrote:
All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
An alternative to hashing is to simply return a "mail" or "home" setting from userdb pointing to your new mountpoint.
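For example, with a passwd-file userdb this could be a minimal sketch (the file location, UIDs and paths are hypothetical); the home field decides which mountpoint each account lives on:

    # dovecot.conf (sketch)
    mail_location = maildir:~/Maildir
    userdb {
      driver = passwd-file
      args = /etc/dovecot/users
    }

    # /etc/dovecot/users -- user:password:uid:gid:gecos:home
    # existing accounts stay on the old filesystem, new/migrated ones go to the new mountpoint
    olduser::5000:5000::/srv/mail1/olduser
    newuser::5000:5000::/srv/mail2/newuser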
On 7/5/2012 2:44 AM, Adrian M wrote:
Hi Stan, I know how to add drives to the storage and how to grow the existing filesystem, but such big filesystems are still somewhat new to mainstream Linux. Yes, I know some universities out there already have petabyte filesystems, but right now stable Linux systems have trouble formatting ext4 partitions over 16TB. All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
At 16TB+ scale with maildir you should be using XFS on kernel 3.x, not EXT4. Your performance will be significantly better, as in 30% or much more. The typical XFS filesystem in the wild today is 50TB+ and there are hundreds of XFS filesystems well over 100TB deployed around the world.
NASA has XFS filesystems of 380TB and 535TB, and also has multiple 1PB+ CXFS (cluster XFS) filesystems. 20TB is a tiny snack for XFS, 500TB is lunch, 1PB is a big supper. A single XFS filesystem can scale to 16 exabytes, or 16 million terabytes, though the largest deployed so far that I'm aware of is NASA's 535TB XFS. It'll scale to anything you'll ever throw at it, and much more.
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
It's a good time to migrate to XFS.
-- Stan
On Thu, Jul 5, 2012 at 10:35 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
On 7/5/2012 2:44 AM, Adrian M wrote:
Hi Stan, I know how to add drives to the storage and how to grow the existing filesystem, but such big filesystems are still somewhat new to mainstream Linux. Yes, I know some universities out there already have petabyte filesystems, but right now stable Linux systems have trouble formatting ext4 partitions over 16TB. All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
At 16TB+ scale with maildir you should be using XFS on kernel 3.x, not EXT4. Your performance will be significantly better, as in 30% or much more. The typical XFS filesystem in the wild today is 50TB+ and there are hundreds of XFS filesystems well over 100TB deployed around the world.
NASA has XFS filesystems of 380TB and 535TB, and also has multiple 1PB+ CXFS (cluster XFS) filesystems. 20TB is a tiny snack for XFS, 500TB is lunch, 1PB is a big supper. A single XFS filesystem can scale to 16 exabytes, or 16 million terabytes, though the largest deployed so far that I'm aware of is NASA's 535TB XFS. It'll scale to anything you'll ever throw at it, and much more.
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
It's a good time to migrate to XFS.
-- Stan
Other good operating systems for large filesystems are FreeBSD with either UFS2 or ZFS, or Solaris with ZFS.
But then one must think, do I really want to switch OS?
As a personal preference I would go the FreeBSD route, as Dovecot is easily available through Ports. Even with minuscule storage (20GB) for my new Dovecot IMAP server here at work, FreeBSD 8.2 x64 running on VMware using ZFS for the 'extra' 20GB disk is performing pretty well.
Both UFS2 and ZFS can be snapshotted, although ZFS has some functionality which is really cool, but one will need to know how to set it up properly. Hence Zettabyte File System :-)
UFS2 tends to perform faster.... however.
Regards,
Kaya
On Thu, Jul 5, 2012 at 10:48 AM, J E Lyon <role.Dovecot-Readers@jlassocs.com> wrote:
On 5 Jul 2012, at 10:45, Kaya Saman wrote:
But then one must think, do I really want to switch OS?
I heard a rumour that switching OS is sometimes harder than adding a mountpoint :)
J.
It can be!
That's why I'm not even thinking of migrating the mission critical stuff running on CentOS 5 to even CentOS 6 yet.
I am, however, trying to do all clean installs on FreeBSD where I **can** get away with it.
OK, this may sound incredibly sad so don't sue me for it, but for my open-source work at home I have switched over from 15+ Linux servers down to 1x FreeBSD system running jails.
The statement about switching OSes was only a suggestion, though, with a rear-door retreat open in case it failed :-)
Regards,
Kaya
On 5 Jul 2012, at 10:55, Kaya Saman wrote:
That's why I'm not even thinking of migrating the mission critical stuff running on CentOS 5 to even CentOS 6 yet.
I'm in an identical position there -- and in fact, I think it's time to get some virtualised hosting of CentOS 6 servers, once I decide on the choice of underlying solution . . feels like a minefield at this point in time . . I've been working with RedHat then RH & Fedora then CentOS since the 90s, but haven't any firsthand experience of installing & maintaining VMWare or anything like it. Exciting times :)
OK, this may sound incredibly sad so don't sue me for it, but for my open-source work at home I have switched over from 15+ Linux servers down to 1x FreeBSD system running jails.
Ah yes... see above :)
J.
On Thu, Jul 5, 2012 at 11:01 AM, J E Lyon <role.Dovecot-Readers@jlassocs.com> wrote:
On 5 Jul 2012, at 10:55, Kaya Saman wrote:
That's why I'm not even thinking of migrating the mission critical stuff running on CentOS 5 to even CentOS 6 yet.
I'm in an identical position there -- and in fact, I think it's time to get some virtualised hosting of CentOS 6 servers, once I decide on the choice of underlying solution . . feels like a minefield at this point in time . . I've been working with RedHat then RH & Fedora then CentOS since the 90s, but haven't any firsthand experience of installing & maintaining VMWare or anything like it. Exciting times :)
Wow it sounds nice :-)
I am still part of the old-school thought trend in that everything should have its own dedicated hardware and preferably be SPARC CPU based, but if I took that attitude at work or anywhere else I went, I would be thought of as a troll :-P (use my way or else.... ?? blaaaah)
OK, this may sound incredibly sad so don't sue me for it, but for my open-source work at home I have switched over from 15+ Linux servers down to 1x FreeBSD system running jails.
Ah yes... see above :)
VMware does not compare to FreeBSD and jails. Use FreeBSD because blah blah blah.... haha, sorry, just thought I would innocently troll :-)
Yeah, I agree, VMware or Citrix Xen do make life easier to accommodate the broad-spectrum user running a mixture of OSes - like most people. It is definitely the way forward.
And so as not to get too sidetracked from the original content of the OP's posting: it is worth making systems work and integrate together, meaning having a dedicated device for storage and then mounting that in whatever OS you run whichever service on.
I am not sure, though, whether a remote filesystem would need to be locally provisioned.... say, if one created a pool on a SAN or NAS with (x) filesystem. The Dovecot server shouldn't really need to know anything other than where on that FS to put the mailboxes or dirs. The only thing that I could think of is that disk I/O or network bandwidth may be a performance factor, but that should really be it.
Other than different filesystems performing in different ways, of course.
J.
Regards,
Kaya
I am, however, trying to do all clean installs on FreeBSD where I **can** get away with it.
right.
OK, this may sound incredibly sad so don't sue me for it, but for my open-source work at home I have switched over from 15+ Linux servers down to 1x FreeBSD system running jails.
quite a common case.
On 2012-07-05 5:45 AM, Kaya Saman <kayasaman@gmail.com> wrote:
FreeBSD 8.2 x64 running on VMware
Hi Kaya,
Do you (or anyone else) know of any decent VMware images (appliances) of the current version of FreeBSD? I've been debating switching from Gentoo to FreeBSD for a while now, and would love to find a ready-made appliance (just a basic, uncustomized server install) that I could start with...
Thanks,
--
Best regards,
Charles
On 05.07.2012 12:33, Charles Marcus wrote:
On 2012-07-05 5:45 AM, Kaya Saman <kayasaman@gmail.com> wrote:
FreeBSD 8.2 x64 running on VMware
Hi Kaya,
Do you (or anyone else) know of any decent VMware images (appliances) of the current version of FreeBSD? I've been debating switching from Gentoo to FreeBSD for a while now, and would love to find a ready-made appliance (just a basic, uncustomized server install) that I could start with...
Do you really think it is a good idea to start with a pre-installed FREE operating system instead of doing a fresh install?
For testing, maybe; for production, not really.
On 2012-07-05 6:37 AM, Reindl Harald <h.reindl@thelounge.net> wrote:
On 05.07.2012 12:33, Charles Marcus wrote:
On 2012-07-05 5:45 AM, Kaya Saman<kayasaman@gmail.com> wrote:
FreeBSD 8.2 x64 running on VMware
Do you (or anyone else) know of any decent VMware images (appliances) of the current version of FreeBSD? I've been debating switching from Gentoo to FreeBSD for a while now, and would love to find a ready-made appliance (just a basic, uncustomized server install) that I could start with...
Do you really think it is a good idea to start with a pre-installed FREE operating system instead of doing a fresh install?
do you really think it is a good idea to trash someone else's comments (without contributing anything at all I might add) based on pure ass-u-me-ptions of yours that have no basis in reality?
For testing, maybe; for production, not really.
Who said I was going to base my *production* server on such an image?
I was asking for something so I could easily get started testing and playing with it.
Do you *really* think I would simply start with a pre-built image and just switch my production to it without *lots* of testing (and getting used to the new environment, etc)?
On 06.07.2012 11:26, Charles Marcus wrote:
On 2012-07-05 6:37 AM, Reindl Harald <h.reindl@thelounge.net> wrote:
Do you really think it is a good idea to start with a pre-installed FREE operating system instead of doing a fresh install?
do you really think it is a good idea to trash someone else's comments (without contributing anything at all I might add) based on pure ass-u-me-ptions of yours that have no basis in reality?
Where do you see anything offensive in my reply?
For testing, maybe; for production, not really.
Who said I was going to base my *production* server on such an image?
I do not know.
But why use it at all if it is not intended to be used later?
I was asking for something so I could easily get started testing and playing with it
And in my opinion a pre-built image is the wrong start.
You learn much more by installing and configuring an OS yourself than by using a pre-installed one without having a clue why things are working and how they do.
On 2012-07-06 5:46 AM, Reindl Harald <h.reindl@thelounge.net> wrote:
Where do you see anything offensive in my reply?
Your tone is almost always offensive, Reindl - and you quite often throw in a good dose of very offensive cursing to boot (admittedly not this time though)... basically, I just don't like your general tone; it is *never* friendly, and more often than not it is unhelpful (like this time).
But why use it at all if it is not intended to be used later?
To get up and running quickly and do some quick testing to see if I want to invest more serious time in it?
The last time I tried to install FreeBSD from scratch (version 4/5 days) it was a horrendous episode that turned me off FreeBSD right away. I want to avoid a repeat of that and start with a *working* system to see what I may have been missing.
Seriously, for questions like this, either post a responsive answer, or just don't answer at all.
and in my opinion a pre-built image is the wrong start
You know what they say about opinions...
You learn much more by installing and configuring an OS yourself than by using a pre-installed one without having a clue why things are working and how they do.
Reindl - I'm a Gentoo user, I know all about the advantages of installing from scratch.
Again - just please stay silent if you don't have anything positive to contribute - and yes, you often do actually contribute positive things, and you definitely have some knowledge to share, but again, your tone and manner are almost always those of a know-it-all spoiled teenage brat, and I personally have had enough, hence my possibly overly harsh response to your know-it-all-non-response.
On 06.07.2012 12:01, Charles Marcus wrote:
Again - just please stay silent if you don't have anything positive to contribute - and yes, you often do actually contribute positive things, and you definitely have some knowledge to share, but again, your tone and manner are almost always those of a know-it-all spoiled teenage brat, and I personally have had enough, hence my possibly overly harsh response to your know-it-all-non-response.
Interesting that with the same tone in real life, in more than 30 years nobody has had a problem, as long as it was not my intention.
However - I googled "freebsd vmware image" for you; the first link leads to http://www.thoughtpolice.co.uk/vmware/
http://www.thoughtpolice.co.uk/vmware/#freebsd
In VMware vCenter you can install an appliance directly while it is downloaded, via the second entry in the File menu.
do you really think it is a good idea to trash someone else's comments (without contributing anything at all I might add) based on pure ass-u-me-ptions of yours that have no basis in reality?
Do you hate yourself for not being able to understand a normal response, and so get aggressive towards people?
On 6 July 2012 12:41, Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl> wrote:
do you really think it is a good idea to trash someone else's comments (without contributing anything at all I might add) based on pure ass-u-me-ptions of yours that have no basis in reality?
Do you hate yourself for not being able to understand a normal response, and so get aggressive towards people?
Pot. Kettle. Black.
On Thu, Jul 5, 2012 at 11:33 AM, Charles Marcus <CMarcus@media-brokers.com> wrote:
On 2012-07-05 5:45 AM, Kaya Saman <kayasaman@gmail.com> wrote:
FreeBSD 8.2 x64 running on VMware
Hi Kaya,
Do you (or anyone else) know of any decent VMware images (appliances) of the current version of FreeBSD? I've been debating switching from Gentoo to FreeBSD for a while now, and would love to find a ready-made appliance (just a basic, uncustomized server install) that I could start with...
Thanks,
--
Best regards,
Charles
Hi Charles,
This is actually off-topic from the OP; however, feel free to PM me with any questions you have :-)
However, in response: I didn't use any images, just the plain FreeBSD 8.2 AMD64 ISO..... and installed from there.
Regards,
Kaya
This is actually off-topic from the OP; however, feel free to PM me with any questions you have :-)
However, in response: I didn't use any images, just the plain FreeBSD 8.2 AMD64 ISO..... and installed from there.
And it will work on many VM systems. And it works best without any VM overlay.
On 05/07/2012 11:33, Charles Marcus wrote:
On 2012-07-05 5:45 AM, Kaya Saman <kayasaman@gmail.com> wrote:
FreeBSD 8.2 x64 running on VMware
Hi Kaya,
Do you (or anyone else) know of any decent VMware images (appliances) of the current version of FreeBSD? I've been debating switching from Gentoo to FreeBSD for a while now, and would love to find a ready-made appliance (just a basic, uncustomized server install) that I could start with...
We use a Gentoo host + Linux-vservers (+grsec/PaX) and are very satisfied. Linux-vservers gives you something similar to "jails", although it's meant to look a little more like full virtualisation than jails does (bear in mind I don't have "jails" experience, though).
There are plenty of tools included with Linux-vservers to clone, build and maintain your individual "machines". It's a complete enough "virtualisation" that you can, say, boot a CentOS image under your Gentoo host (or whatever). It's also extremely lightweight, so there is almost zero overhead. It's a little weak in areas where you need direct access to hardware, but there are generally acceptable workarounds - also, you can't run completely different operating systems since it's not a full virtualisation solution.
One nice benefit is that all images are just a directory containing your linux installation, so it's very easy to backup/snapshot/restore/drop in and fix something you bolloxed up/clone to a new machine.
Just my 2p.
Cheers
Ed W
On Thu, Jul 5, 2012 at 12:35 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
On 7/5/2012 2:44 AM, Adrian M wrote:
Hi Stan, I know how to add drives to the storage and how to grow the existing filesystem, but such big filesystems are still somewhat new to mainstream Linux. Yes, I know some universities out there already have petabyte filesystems, but right now stable Linux systems have trouble formatting ext4 partitions over 16TB. All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
At 16TB+ scale with maildir you should be using XFS on kernel 3.x, not EXT4. Your performance will be significantly better, as in 30% or much more. The typical XFS filesystem in the wild today is 50TB+ and there are hundreds of XFS filesystems well over 100TB deployed around the world.
NASA has XFS filesystems of 380TB and 535TB, and also has multiple 1PB+ CXFS (cluster XFS) filesystems. 20TB is a tiny snack for XFS, 500TB is lunch, 1PB is a big supper. A single XFS filesystem can scale to 16 exabytes, or 16 million terabytes, though the largest deployed so far that I'm aware of is NASA's 535TB XFS. It'll scale to anything you'll ever throw at it, and much more.
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
It's a good time to migrate to XFS.
-- Stan
Hi Stan, I already have XFS and kernel 3.x. Switching from kernel 2.6.x to 3.x a couple of months ago did indeed decrease the IOPS numbers.
On 7/5/2012 6:36 AM, Wojciech Puchar wrote:
At 16TB+ scale with maildir you should be using XFS on kernel 3.x, not EXT4. Your performance will be significantly better, as in 30% or much
Why do you want to make a 16TB partition in the first place?
You wouldn't partition the large LUN. You'd simply directly format it with XFS. Laying a partition table on it would introduce the real possibility of filesystem misalignment on the RAID stripe if one wasn't extremely careful about the partitioning alignment.
But your question seems to relate to the reason for using a single very large filesystem. In which case the answer is typically a single file tree/namespace. EXT3/4 and Reiser apparently aren't super reliable at very large sizes. XFS has no such negative issues at large scale, and in fact was specifically designed for extremely large scale. As I mentioned previously, 50TB+ XFS filesystems are mundane and there are many 100TB+ in the wild.
As with a filesystem of any scale, you'll need a backup system capable of handling your very large XFS filesystem, and a proper backup strategy.
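As a rough sketch of formatting such a LUN directly (the device name and RAID geometry below are made-up example values; su/sw should match the array's actual stripe unit and number of data spindles):

    # hypothetical 12-drive RAID6 (10 data spindles, 64KiB stripe unit), no partition table:
    mkfs.xfs -d su=64k,sw=10 /dev/sdX
    mount -o inode64,noatime /dev/sdX /srv/mail   # inode64 helps on large multi-TB XFS filesystems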
-- Stan
You wouldn't partition the large LUN. You'd simply directly format it with XFS. Laying a partition table on it would introduce the real
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
On 7/6/2012 2:16 AM, Wojciech Puchar wrote:
You wouldn't partition the large LUN. You'd simply directly format it with XFS. Laying a partition table on it would introduce the real
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI packet latency is an order of magnitude less than rotational latency, and fiber channel packet latency is a couple orders of magnitude lower still, even with a couple switches between the server and storage array.
Thus it doesn't matter how you attach your mechanical disks to a server, as the latency is in the drives, not the data path.
-- Stan
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI ...
It's not about iSCSI latency and overhead. It's about other things I just don't want to explain any more to people who have too much money to spend making things more complex and less efficient. It's just stupid, but it's OT, so EOT.
On 07.07.2012 11:23, Wojciech Puchar wrote:
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI ...
It's not about iSCSI latency and overhead. It's about other things I just don't want to explain any more to people who have too much money to spend making things more complex and less efficient. It's just stupid, but it's OT, so EOT.
What are you trying to tell us here?
Do you REALLY believe your local disks can outperform a SAN storage with 1 GB of dedicated buffer cache and a DEDICATED 1400 MHz CPU which is optimized for only one task: disk performance?
Your local storage has to fight for CPU and memory all the time with other applications (caching etc.).
To believe that under really high load a local storage is faster in the end is bullshit!
SAN storage with 1 GB of dedicated buffer cache and a DEDICATED 1400 MHz CPU which is optimized for only one task: disk performance?
Your local storage has to fight for CPU and memory all the time with other applications (caching etc.).
To believe that under really high load a local storage is faster in the end is bullshit!
Yes, I do believe in that "bullshit". As well, I believe that no amount of money for "professional hardware" will replace brains.
On 07.07.2012 11:43, Wojciech Puchar wrote:
SAN storage with 1 GB of dedicated buffer cache and a DEDICATED 1400 MHz CPU which is optimized for only one task: disk performance?
Your local storage has to fight for CPU and memory all the time with other applications (caching etc.).
To believe that under really high load a local storage is faster in the end is bullshit!
Yes, I do believe in that "bullshit".
Your problem.
As well, I believe that no amount of money for "professional hardware" will replace brains.
This is laughable argumentation, because your brain cannot be so much bigger than others' - you will not show me, for example, VMware HA on local storage.
And yes, there are very, very few cases these days where it makes sense not to use virtualization for practically everything.
If you have a SPECIAL workload that may be different, but to believe you have more brains when you have never been in the position of being responsible for ANY infrastructure component (hardware and software) is arrogance without any background.
On Sat, 07 Jul 2012 11:36:02 +0200, Reindl Harald said:
On 07.07.2012 11:23, Wojciech Puchar wrote:
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI ...
It's not about iSCSI latency and overhead. It's about other things I just don't want to explain any more to people who have too much money to spend making things more complex and less efficient. It's just stupid, but it's OT, so EOT.
What are you trying to tell us here?
Do you REALLY believe your local disks can outperform a SAN storage with 1 GB of dedicated buffer cache and a DEDICATED 1400 MHz CPU which is optimized for only one task: disk performance?
Your local storage has to fight for CPU and memory all the time with other applications (caching etc.).
To believe that under really high load a local storage is faster in the end is bullshit!
Can one even argue on one side or the other without knowing the speed of the network, and how much contention is on that network?
My experience is that with a 100Mb/s network, local is faster, although I've never had a SAN, so to speak, on the other end.
The specification of SATA rev 3 is 6Gb/s, which is a heck of a lot faster than the 1Gb/s spec of a gigabit network. Both have a lot of things slowing them down from their spec, but I'd need to see some proof of an assertion that anything coming in over a 1Gb/s wire can beat a SATA rev 3 local disk.
Thanks
SteveT
Steve Litt * http://www.troubleshooters.com/ * http://twitter.com/stevelitt Troubleshooting Training * Human Performance
On Sun, 8 Jul 2012 03:27:55 -0400, Steve Litt said:
On Sat, 07 Jul 2012 11:36:02 +0200, Reindl Harald said:
On 07.07.2012 11:23, Wojciech Puchar wrote:
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI ...
It's not about iSCSI latency and overhead. It's about other things I just don't want to explain any more to people who have too much money to spend making things more complex and less efficient. It's just stupid, but it's OT, so EOT.
What are you trying to tell us here?
Do you REALLY believe your local disks can outperform a SAN storage with 1 GB of dedicated buffer cache and a DEDICATED 1400 MHz CPU which is optimized for only one task: disk performance?
Your local storage has to fight for CPU and memory all the time with other applications (caching etc.).
To believe that under really high load a local storage is faster in the end is bullshit!
Can one even argue on one side or the other without knowing the speed of the network, and how much contention is on that network?
My experience is that with a 100Mb/s network, local is faster, although I've never had a SAN, so to speak, on the other end.
The specification of SATA rev 3 is 6Gb/s, which is a heck of a lot faster than the 1Gb/s spec of a gigabit network. Both have a lot of things slowing them down from their spec, but I'd need to see some proof of an assertion that anything coming in over a 1Gb/s wire can beat a SATA rev 3 local disk.
This isn't really a fair comparison because I don't think rev 3 is commodity yet. So rev 2, which IS commodity, is 3Gb/s, which is still considerably faster than the wire on a gigabit network.
Thanks
SteveT
Steve Litt * http://www.troubleshooters.com/ * http://twitter.com/stevelitt Troubleshooting Training * Human Performance
On 8 Jul 2012, at 08:36, Steve Litt wrote:
Can one even argue on one side or the other without knowing the speed of the network, and how much contention is on that network?
My experience is that with a 100Mb/s network, local is faster, although I've never had a SAN, so to speak, on the other end.
The specification of SATA rev 3 is 6Gb/s, which is a heck of a lot faster than the 1Gb/s spec of a gigabit network. Both have a lot of things slowing them down from their spec, but I'd need to see some proof of an assertion that anything coming in over a 1Gb/s wire can beat a SATA rev 3 local disk.
This isn't really a fair comparison because I don't think rev 3 is commodity yet. So rev 2, which IS commodity, is 3Gb/s, which is still considerably faster than the wire on a gigabit network.
I think there are optimal situations where any configuration looks good . . How often can a real-world disk actually deliver the 6Gb/s when only a minority of disk reads are long sequential runs on the platters?
That's why I take the broader view . . over the course of 2-3 years, say, with a range of applications and demands, the total impact of resources, reliability, management and end-user experience with systems that are large and complex by virtue of demand rather than through any fault of design may be better served by a high-performance storage solution connected with extremely high-speed dedicated channels (I don't think anyone ever suggested a 100Mb/s network for SAN was a high-performance scenario) . . and then look at how things work overall, including hardware maintenance for example.
So, the laboratory / theoretical throughput of an internal 6Gb/s bus is only partly a factor, imho...
Quoting Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>:
I think there are optimal situations where any configuration looks good . . How often can a real-world disk actually deliver the 6Gb/s when only a minority of disk reads are long sequential runs on the platters?
No hard drive can saturate 1.5Gb/s.
There are many disks out there that do 150-200MB/s, easily exceeding 1.5Gb/s speeds.
On 7/8/2012 8:27 AM, Patrick Domack wrote:
Quoting Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>:
I think there are optimal situations where any configuration looks good . . How often can a real-world disk actually deliver the 6Gb/s when only a minority of disk reads are long sequential runs on the platters?
No hard drive can saturate 1.5Gb/s.
There are many disks out there that do 150-200MB/s, easily exceeding 1.5Gb/s speeds.
There are a few SAS drives that can saturate a 150MB/s link, such as the Seagate Cheetah 15k.7, which can sustain 204MB/s streaming read on the outer tracks.
But, again, streaming rate is irrelevant to mail storage. What matters is random seek latency. And the faster the spindle, the lower the latency. Thus 15k Seagate SAS drives are excellent candidates for mail store duty, as are any 10k or 15k drives.
-- Stan
is random seek latency. And the faster the spindle, the lower the latency. Thus 15k Seagate SAS drives are excellent candidates for mail store duty, as are any 10k or 15k drives.
Definitely not when counting by $/IOPS, and they look even worse by $/GB, which is more important unless you make <1GB mail accounts.
Quoting Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>:
is random seek latency. And the faster the spindle, the lower the latency. Thus 15k Seagate SAS drives are excellent candidates for mail store duty, as are any 10k or 15k drives.
Definitely not when counting by $/IOPS, and they look even worse by $/GB, which is more important unless you make <1GB mail accounts.
It largely depends what type of users you're supporting.
On one system, I have 300 users, and it has a load on the drives of 500 IOPS a lot of the time; it needs an 8-disk RAID10 to support the load, with about 300 gigs worth of email.
Another system has tens of thousands of users, storage of about 2TB, and normal usage of about 7k IOPS.
Both of these systems have a 10-gig limit per user account. I find that most of the time users don't max out the mailstore, unless they stop using that account.
So in both of my cases, $/IOPS is much more important than $/GB, as the GB just goes to waste.
On 08.07.2012 09:27, Steve Litt wrote:
On Sat, 07 Jul 2012 11:36:02 +0200, Reindl Harald said:
To believe that under really high load a local storage is faster in the end is bullshit!
Can one even argue on one side or the other without knowing the speed of the network, and how much contention is on that network?
The SAN normally has its own network.
My experience is that with a 100Mbs network, local is faster, although I've never had a SAN, so to speak, on the other end.
nobody is using 100 MBit for a SAN
The specification of SATA rev 3 is 6Gb/s, which is a heck of a lot faster than the 1Gb/s spec of a gigabit network. Both have a lot of things slowing them down from their spec, but I'd need to see some proof of an assertion that anything coming in over a 1Gb/s wire can beat a SATA rev 3 local disk.
There is more to it than the connection speed.
6Gb/s does not help you much as long as the physical disk cannot write at that speed, and more concurrent writes make this worse - so there are many things, like the big battery-backed caches on a SAN, which are important for OVERALL performance.
There is more to it than the connection speed.
6Gb/s does not help you much as long as the physical disk cannot write at that speed, and more concurrent writes make this worse - so there are many things, like the big battery-backed caches on a SAN, which are important for OVERALL performance.
With a cache as big as the disks themselves, it MAY even reach reasonable performance.
This discussion is fun to read. Keep it going, please!
On Sat, 2012-07-07 at 11:23 +0200, Wojciech Puchar wrote:
Fine, I understand that. What I am suggesting is not making large LUNs; you get the best performance by directly attaching disks to your machine.
That's simply not true. 99% of block latency is rotational. iSCSI ...
It's not about iSCSI latency and overhead. It's about other things I just don't want to explain any more to people who have too much money to spend making things more complex and less efficient. It's just stupid, but it's OT, so EOT.
You get used to Stan's rants; most ignore his drivel. It's been proven before that he knows just enough about hardware to be dangerous, and even less about anything bigger than SOHO mail networks.
On 7 Jul 2012, at 07:37, Stan Hoeppner wrote:
99% of block latency is rotational.
So true... I spend my entire life trying to convince customers to add heaps and heaps of RAM to *nix servers to make them faster and not be swayed by talk of faster CPUs . . Sheeesh! . . Come to think of it, I'd dump enough RAM in my servers to keep the entire disk array cached if I could, um, well *coughs*
~ James.
On 5 Jul 2012, at 08:44, Adrian M wrote:
Hi Stan, I know how to add drives to the storage and how to grow the existing filesystem, but such big filesystems are still somewhat new to mainstream Linux. Yes, I know some universities out there already have petabyte filesystems, but right now stable Linux systems have trouble formatting ext4 partitions over 16TB. All this is telling me that it is safer to have two or three smaller filesystems than one big one. Dovecot has a nice feature for this, "Directory hashing": http://wiki.dovecot.org/MailLocation/
What I don't know is a nice way to migrate from a single directory with no hashing to more than one directory with hashing.
When I first saw you mention hashing, I misread it as some sort of hash-table approach to large directories that I wasn't aware of, or something . . And now I've read the Dovecot documentation, I see what you're talking about!
Why are you bothered about using the hash, instead of just splitting on the first letter of the existing account name? Is it to more evenly randomise the distribution of accounts?
The advantage of the username "first letter" approach, or returning hardcoded locations from userdb (as per Timo's suggestion), is that you can more readily move the directories manually during a very short period of downtime, I'd have thought?
~ James.
On Thu, Jul 5, 2012 at 12:46 PM, J E Lyon <role.Dovecot-Readers@jlassocs.com> wrote:
When I first saw you mention hashing, I misread it as some sort of hash-table approach to large directories that I wasn't aware of, or something . . And now I've read the Dovecot documentation, I see what you're talking about!
Why are you bothered about using the hash, instead of just splitting on the first letter of the existing account name? Is it to more evenly randomise the distribution of accounts?
The advantage of the username "first letter" approach, or returning hardcoded locations from userdb (as per Timo's suggestion), is that you can more readily move the directories manually during a very short period of downtime, I'd have thought?
~ James.
Thank you, this is a valid suggestion, especially since it could be done directly with some SQL magic in the Dovecot config file. I will consider this option!
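As a rough sketch of what that SQL magic might look like (the table and column names, paths and IDs are invented; the CASE expression simply splits accounts alphabetically across two mountpoints):

    # dovecot-sql.conf.ext (sketch)
    user_query = SELECT \
      CASE WHEN username < 'm' \
           THEN '/srv/mail1/%d/%n' \
           ELSE '/srv/mail2/%d/%n' END AS home, \
      5000 AS uid, 5000 AS gid \
      FROM users WHERE username = '%n' AND domain = '%d'

Returning home this way (per Timo's earlier suggestion) avoids touching mail_location at all; with mail_location = maildir:~/Maildir, the maildirs simply live under whichever home the query hands back.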
Thank you, this is a valid suggestion, especially since it could be done directly with some SQL magic in the Dovecot config file. I will consider this option!
As I always run Dovecot using the standard Unix auth/password mechanism and the mail user is always a Unix user, it is rather trivial: automatically editing /etc/master.passwd to change home directories depending on your needs and then moving the data is simple, and you can do it any way you like.
Even without this, I would recommend your way of spreading users in a human-understandable way.
I would take the first 2 letters, make groups out of them so as to have roughly the same amount of users or disk space, and then move.
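For illustration, a hypothetical pair of /etc/master.passwd entries (names, IDs and paths invented) where the home directory encodes the two-letter group and hence the filesystem the account lives on; on FreeBSD you would edit the file with vipw so the password databases get rebuilt:

    # name:password:uid:gid:class:change:expire:gecos:home:shell
    alice:*:1001:1001::0:0:Alice:/srv/mail1/al/alice:/usr/sbin/nologin
    mike:*:1002:1002::0:0:Mike:/srv/mail2/mi/mike:/usr/sbin/nologin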
It absolutely kills me every time I see a mail server admin display an almost total lack of knowledge of his/her storage back end, or the inability to describe it technically, in an email...
You should get used to this. Welcome to the 21st century!
The rule is:
(amount of real knowledge) x (officially paper-confirmed skills) = const
participants (15):
- Adrian M
- Adrian Minta
- Charles Marcus
- Ed W
- J E Lyon
- J E Lyon
- Kaya Saman
- Noel Butler
- Patrick Domack
- Reindl Harald
- Simon Brereton
- Stan Hoeppner
- Steve Litt
- Timo Sirainen
- Wojciech Puchar