Re: [Dovecot] Please advise on very fast search

Alexander Chekalin

9 Nov 2011

6:16 p.m.

Thanks, Robert,

I will take a look at it.

What I'm afraid of is how the database storage should be planned (storage, CPU, RAM, scaling when it fills up). When dealing with files (I'm using maildir), it is much easier to understand and to fix just about everything. Adding a database means tuning it too, so I'll have more places to 'tune it a bit'.

In fact, working with Dovecot is pretty nice, but I think I can tune it to work faster.

I now run it on FreeBSD (on UFS2). Maybe I should change OS + FS, but I need to test first (I really hope ZFS on SAS drives will help; I still have found no benchmarks of such a setup). I will also try full text search, but I'm afraid of the index size (and I don't need to search bodies, just headers).

Anyway, thank you for pointing me in the right direction!

Yours, Alexander

Timo Sirainen

9 Nov

6:40 p.m.

On Wed, 2011-11-09 at 19:16 +0300, Alexander Chekalin wrote:

...

Will also try to use full text search, but afraid of index size (and I need no search on body, just on headers).

It wouldn't be difficult to patch Dovecot to skip indexing message bodies. Of course then you'd need to remember to keep applying the patch when updating.

Stan Hoeppner

10 Nov

2:50 a.m.

On 11/9/2011 10:40 AM, Timo Sirainen wrote:

...

On Wed, 2011-11-09 at 19:16 +0300, Alexander Chekalin wrote:

...
Will also try to use full text search, but afraid of index size (and I need no search on body, just on headers).

It wouldn't be difficult to patch Dovecot to skip indexing message bodies. Of course then you'd need to remember to keep applying the patch when updating.

Also keep in mind that, in general, many message headers today are as large as, or larger than, the actual message body, especially for list mail. Just take a look at messages from this list for evidence.

Thus, I'd think that going out of your way to avoid indexing message bodies wouldn't be worth the effort/headaches involved.

-- Stan

Stan Hoeppner

2:42 a.m.

On 11/9/2011 10:16 AM, Alexander Chekalin wrote:

...

Thanks, Robert,

will take a look at.

What I'm afraid for is how database storage should be planned (storage, CPU, RAM, scaling when will be over-filled). When dealing with files (I'm using maildir)

Bingo. ^^^

Maildir is very likely a huge factor in your current slow search time. With a maildir search, every mail file must be opened and searched. How many total mail files are opened for each of your searches? Thousands? Tens of thousands? Maildir causes a massive disk IO bottleneck when searching so many files. Run iostat the next time you do one of these searches, and look at the %iowait value. It will likely be very high. If it is, this confirms maildir is a big part of the problem.
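
To put a number on "how many total mail files", here is a minimal sketch that counts the message files in one maildir folder. The function name maildir_count and the example path are assumptions for illustration; this is not a Dovecot tool.

```shell
#!/bin/sh
# Count message files in one maildir folder (its cur/ and new/ subdirectories).
# "maildir_count" is a hypothetical helper name, not part of Dovecot.
maildir_count() {
    dir=$1
    # tmp/ is skipped on purpose: it only holds deliveries still in progress.
    find "$dir/cur" "$dir/new" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Example (path is an assumption):
#   maildir_count "$HOME/Maildir"
```

Each file counted this way is one open and at least one read during an unindexed search, which is the load the paragraph above describes.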

mbox, and mdbox, would be many many times faster than maildir WRT searching as the total number of files is lower by orders of magnitude. Switching from maildir to mbox/mdbox shifts the workload burden from the disk subsystem to the processor/memory. And I'm sure as with everyone else on the planet today, you have massive spare CPU cycles, but extremely limited spindle throughput.

And as Timo suggested, using one of the indexing search plugins would be much faster yet, as long as you keep the indexes updated.

-- Stan

Alexander Chekalin

6:37 a.m.

Oh, that's the point to consider.

But I must confess I've been in love with Maildir for maybe 10 years, for the simple fact that I can do anything with each and every single message directly on disk (= much faster than via IMAP). If I dealt with mbox directly I'd need to parse huge files, brrrr.

Are there any ways I can search or parse mboxes or mdboxes not directly and not with IMAP (I'm afraid it is slooow in dump parsing)?

Stan Hoeppner

2:46 p.m.

On 11/9/2011 10:37 PM, Alexander Chekalin wrote:

...

Oh, that's the point to consider.

But I must confess I'm in love with Maildir for maybe 10 years

This love affair may be coming to an end.

...

...for that simple fact I can do anything with each and every single message even on disk (=much faster than via IMAP). If I would deal with mbox directly I'd need to parse huge files, brrrr.

Mbox is an excellent mailbox format for archived mail *because of* the fact that searching it is very fast and the disk subsystem overhead is low. For example, on my decade+ old 550MHz x86 SOHO server with only 384MB RAM and a single 7.2k SATA disk, after dropping caches, we'll search my debian-users mbox archive (my largest) for total message count by searching a known header of every message:

-rw------- 1 stan stan 133M Nov 10 06:03 1-Debian-Users

~/mail$ time grep -c Content-Length 1-Debian-Users
22817

real 0m1.731s
user 0m0.328s
sys 0m0.852s

Now let's search for posts from me (after dropping caches again):

~/mail$ time grep -c "From: Stan Hoeppner" 1-Debian-Users
536

real 0m1.657s
user 0m0.216s
sys 0m0.896s

Nested greps will obviously take longer, as will those using perl expressions, but this gives some indication of the kind of speed we're talking about: less than 2 seconds to search 22,000+ messages for a specific single header. So that's ~20 seconds for an mbox containing 220K+ messages, again on 10+ year old hardware.
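
The plain grep above matches the string anywhere in the file, bodies included. A minimal sketch of a headers-only variant follows, assuming a conventional mbox where every message starts with a "From " separator line and body lines beginning with "From " are escaped. The function names are hypothetical.

```shell
#!/bin/sh
# Count messages in an mbox by counting "From " separator lines.
mbox_count() {
    grep -c '^From ' "$1"
}

# Count header lines that start with a given literal prefix, ignoring bodies.
# $1 = mbox file, $2 = prefix such as "From: Stan Hoeppner"
mbox_header_count() {
    awk -v want="$2" '
        /^From / { in_hdr = 1; next }          # separator: a new message begins
        in_hdr && /^$/ { in_hdr = 0 }          # first blank line ends the headers
        in_hdr && index($0, want) == 1 { n++ } # literal prefix match, no regex
        END { print n + 0 }
    ' "$1"
}

# Example (file name taken from the listing above):
#   mbox_header_count 1-Debian-Users "From: Stan Hoeppner"
```

This still reads the whole file sequentially, so the disk access pattern is the same as for the grep runs; only the matching is restricted to headers.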

...

Are there any ways I can search or parse mboxes or mdboxes not directly and not with IMAP (I'm afraid it slooow in dump parsing)?

You should probably take a look at Enkive. I'm not sure what mail storage format it uses, and I've not used it personally, so I can't vouch for its speed, but it's pretty complete feature-wise. Take the test drive--nice search interface.

http://www.enkive.org/

-- Stan

Timo Sirainen

5:29 p.m.

On 10.11.2011, at 6.37, Alexander Chekalin wrote:

...

Are there any ways I can search or parse mboxes or mdboxes not directly and not with IMAP (I'm afraid it slooow in dump parsing)?

See doveadm fetch / doveadm search.

...

in fact the only thing I miss even with my current scheme is permanent ID assigned to the message so I can easily find it despite the IMAP mailbox it is now (so if someone moved the message from one mailbox/folder to another, the ID allows to retrieve it fast anyway).

Dovecot has message GUIDs (with maildir it's filename), but there's no quick lookup for them, even though doveadm can fetch them easily:

doveadm fetch text guid 12312312
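
As a rough illustration of how doveadm fetch could list the GUID and UID of matching messages, the sketch below only prints a command line and does not run it. The helper name, the user "archive", and the date-named mailbox are assumptions; the exact search keys and date formats should be checked against the doveadm-fetch and doveadm-search-query man pages for the installed version.

```shell
#!/bin/sh
# Print (not run) a doveadm fetch command for one sender and date range.
# Hypothetical helper; arguments: user, mailbox, sender, since, before.
archive_fetch_cmd() {
    user=$1; mailbox=$2; sender=$3; since=$4; before=$5
    printf 'doveadm fetch -u %s "guid uid" MAILBOX %s FROM %s SINCE %s BEFORE %s\n' \
        "$user" "$mailbox" "$sender" "$since" "$before"
}

# All values below are illustrative placeholders.
archive_fetch_cmd archive 2009-06-01 user@domain.com 01-Jun-2009 02-Jun-2009
```

Printing the command first makes it easy to review the query before running it against a large archive.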

Alexander Chekalin

14 Nov

4:35 p.m.

Timo, Stan,

I've just tested mdbox and find it pretty nice for my needs, but now I have some questions for you:

mdbox uses a lot of files (m.1, m.2 ... etc), and the default size is 2 MB. It looks like not every message can even fit into such a storage container (nowadays we see messages of 20 MB and even more). Should I tune it (at least mdbox_rotate_size and mdbox_rotate_interval), or is the size on purpose? For now I store each day's messages in separate IMAP folders (mailboxes), which gives me 2000-6000 messages and 2-5 GB (on disk) per folder.

I can use no compression, gz or bz2. Which one is better for storing archived messages? I've just tested mdbox by copying 5800+ messages from maildir to compressed mdbox, and it took exactly the same space (2.8 GB) in 100+ small m.* files. No good so far.

What if I keep maildir as I do now but turn on compression? Will this speed things up?

I'd like to use mdbox as storage, but for now it is very new to me and I'm simply afraid of what I should do if I need to fix the storage manually (maildir is really good for that, surely).

After all, I simply need to speed up the search and restore process in the archive.

Yours, Alexander

Stan Hoeppner

8:23 p.m.

On 11/14/2011 8:35 AM, Alexander Chekalin wrote:

...

Timo, Stan,

I've just tested mdbox and find it pretty nice for me, but now I got some questions for you:

mdbox uses 'a lot' files (m.1, m.2 ... etc), and the default size if 2Mb. Looks like not even every message can fit into such storage container volume (nowadays we used to see messages of 20Mb and even more). Should I tune it (at least mdbox_rotate_size and mdbox_rotate_interval) or its size is on purpose? As for now I store each day's messages in separate IMAP folders (mailboxes), which gives me 2000-6000 messages and 2-5 Gb (on disk) per folder.

mdbox_rotate_size of 2MB is too small for your needs. Test 32MB and 64MB.

...

I can use no compression, gz and bz2 - which one will be better for storing archive messages? I've just tested mdbox by copying 5800+ msgs from maildir to compressed mdbox, and it took exactly the same size (2.8 G) in 100+ small m.* files. No good as far.

bzip2 may give you a little better compression but at the cost of much lower de/compression speed and higher CPU and memory consumption. gzip will be faster all around, between 4x-8x, with lower mem usage, but with less compression resulting in slightly larger file sizes than bzip2.

...

What if I use maildir as I do now but turn on compression, will this speed things up?

No. Maildir performance is limited by the disk head actuator speed, which is between 150-300 seeks per second depending on your disk (7.2k vs 15k RPM). Compressing the files doesn't change the seek physics of the disk drives. You're still reading tens of thousands of files when doing your searches thus bouncing the heads tens of thousands of times.
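
A back-of-the-envelope version of that seek argument, as a sketch: it assumes exactly one head seek per message file and nothing else, which is a simplification, and the 50,000 file count is an illustrative stand-in for "tens of thousands".

```shell
#!/bin/sh
# Rough lower bound on maildir scan time in whole seconds (rounded up),
# assuming one seek per file. "seek_seconds" is a hypothetical helper name.
seek_seconds() {
    files=$1; seeks_per_sec=$2
    echo $(( (files + seeks_per_sec - 1) / seeks_per_sec ))
}

seek_seconds 50000 150   # prints 334 (the 150 seeks/s figure quoted above)
seek_seconds 50000 300   # prints 167 (the 300 seeks/s figure quoted above)
```

Compression shrinks the bytes read per file but leaves the file count, and therefore this estimate, unchanged.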

mbox uses a single file, so head speed isn't a factor, as it may only move a few times when reading an entire mailbox file. Thus, bandwidth becomes the potential bottleneck. Using compression with large mbox files can substantially increase search performance as effective bandwidth is increased by ~4x using gzip and 6x using bzip2. This assumes you have plenty of excess CPU power. mdbox should see similar compression speedups if you use file sizes much larger than the 2MB default. Doing so should keep your IOPS well below the drive's head saturation point as you're reading only a fraction of the file count compared to maildir.

...

I'd like to use mdbox as storage but for now it is very new for me and I simple afraid what should I do if I'll need to manually fix the storage (maildir is really good for that, surely).

Doveadm handles such tasks pretty well. Just make sure you keep good backups of your mdbox files.

...

After all, I simple need to speed up the search and restore process in archive.

The only way to accomplish this with maildir is with much bigger, faster, more expensive storage hardware. And the gain will still be much less than simply switching to a larger file format such as mbox or mdbox.

As with many things, some computer technologies come full circle over time. One of the reasons the creators of the UNIX mbox mail file format decided upon a single file many decades ago was the horribly limited seek performance of the slow SCSI disks of that period. Doing something like the maildir format was simply impossible at that time. In the early days of the public internet, disks became faster than the average load, and maildir was born to fix the locking and corruption shortcomings of mbox.

Today many sites are hitting the seek problem of a few decades ago because boxes are oversubscribed with users, emails now frequently contain attachments, everyone is storing more email, and the total volume of email is a few orders of magnitude greater.

IIRC, this is one of the reasons Timo created mdbox--to decrease the massive IOPS load, and thus slow performance, of large maildir stores.

-- Stan

Alexander Chekalin

11:16 p.m.

Locking issues with mbox are the reason for my long-lasting love affair with maildir, and it has lasted for years. OK, life's lessons are like this: learn something and move on with it ;) even if it's a "new old thing". Thank you for pointing that out!

What I had doubts about is the default rotate size of 2M, since I'm used to seeing pretty reasonable defaults throughout the Dovecot config. 32 or 64 are much closer to what I'd personally prefer.

What I also have to choose now is the OS and FS for the archive. I'm seriously thinking about ZFS with compression on FreeBSD (in fact it will be stripes over a couple of mirrors = software equivalent of RAID 10 on SATA drives, with compression at the FS level), or XFS over LVM on Debian with compression in mdbox itself. I see pros and cons for both, so that's the question to answer!

Yours, Alexander

Stan Hoeppner

15 Nov

8:26 p.m.

On 11/14/2011 3:16 PM, Alexander Chekalin wrote:

...

Locking issues on mbox is the reason for my long-lasting love affair with maildir,

Same reason most others fell in love with it. Many now want to divorce maildir, as the cost of the storage to maintain acceptable performance is now too high.

...

and it's lasts long years. Ok, the life's lessons are like this, learn something and move on with it ;) even if it's "new old thing". Thank you for pointing that!

Many old UNIX gurus still use mbox, not maildir, and never will. If you ask them why they'll likely say "you don't use a screwdriver to drive a nail do you?"

...

What I was doubt about is default rotate size of 2M, since I used to see pretty reasonable default settings in all Dovecot config. 32 or 64 are much close to the ones I'd personally prefer.

Given the fact that we're talking about an archive server, you'd be better off using a very large mdbox file size, say 1GB. You're never deleting individual messages from this archive correct? No expunges?

This is why I recommended mbox in the first place. If your only writes to these mailbox files are appends of new messages, mbox is the best format by far. It's faster at appending than any other format, and it's faster for searching than any other.

...

I also about to choose now is the OS and FS for the archive. I seriously think about ZFS with compression (in fact it will be stripes over couple of mirrors = software equivalent of RAID 10 on SATA drives, with compression on FS level) on FreeBSD, or XFS over LVM on Debian with compression in mdbox itself. I see pros and contras for both, so that's the question to answer!

It's an archive. You're not going to use maildir so you don't need random IOPS performance. Thus RAID5/6 are a much better fit for an archive as you get better read performance, with more than adequate write performance, and you use less disks. And as this is an archive, you don't need real time automatic/transparent compression. Thus I recommend something like:

Debian 6 w/linux-image-2.6.39-bpo.2-amd64 or a custom rolled 2.6.39 or later kernel

Hardware RAID5 w/large (2TB) SATA disks, 512B native sectors, e.g. MegaRAID SAS 9261-8i with 4 Seagate Constellation ES ST2000NM0011

Specify a strip size of 256KB for the array

Permanently set /sys/block/sdX/read_ahead_kb to 512 so you're reading ahead 1024 sectors at a time instead of the default of 256. This will speed up your searches quite a bit.

XFS filesystem on the RAID device, created with mkfs.xfs defaults

mbox w/zlib plugin. Compress daily files each night with a script

You don't need LVM with a good RAID card (or with mdraid). This controller can expand the RAID5 up to 8 drives (up to 32 drives max using SAS expanders)
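
The read-ahead figures in the list can be checked with a small sketch that converts read_ahead_kb values into 512-byte sectors. The helper name is hypothetical, and sdX stays a placeholder exactly as in the message.

```shell
#!/bin/sh
# Convert a read-ahead size in KB to a sector count (default 512-byte sectors).
kb_to_sectors() {
    kb=$1; sector_bytes=${2:-512}
    echo $(( kb * 1024 / sector_bytes ))
}

kb_to_sectors 512   # prints 1024, the suggested setting
kb_to_sectors 128   # prints 256, the default mentioned above

# Applying the setting needs root; sdX is a placeholder, not a real device:
#   echo 512 > /sys/block/sdX/read_ahead_kb
# Values written to sysfs do not survive a reboot, so the command has to be
# re-run at boot (for example from a startup script) to make it permanent.
```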

-- Stan

Timo Sirainen

9:02 p.m.

On Tue, 2011-11-15 at 12:26 -0600, Stan Hoeppner wrote:

...

This is why I recommended mbox in the first place. If your only writes to these mailbox files are appends of new messages, mbox is the best format by far. It's faster at appending than any other format, and it's faster for searching than any other.

Just as long as you're not simultaneously trying to read and write the mbox file (or just write in 2+ sessions). Then there's a lot of waiting on locks. (mdbox has no read locks, and its write locks are very short lived.)

Stan Hoeppner

17 Nov

12:27 p.m.

On 11/15/2011 1:02 PM, Timo Sirainen wrote:

...

On Tue, 2011-11-15 at 12:26 -0600, Stan Hoeppner wrote:

...
This is why I recommended mbox in the first place. If your only writes to these mailbox files are appends of new messages, mbox is the best format by far. It's faster at appending than any other format, and it's faster for searching than any other.

Just as long as you're not simultaneously trying to read and write the mbox file (or just write in 2+ sessions). Then there's a lot waiting on locks. (mdbox has no read locks, and its write locks are very short lived.)

Of course. My understanding of Alexander's workflow is that copies of all daily new mail are written to an IMAP mailbox via some MTA bcc rule or sieve rule. A nightly script moves the daily mail to another mailbox created and named by date. These named mailboxes are then used for backup and the search function, but are never written to again. So I assume there is no simultaneous read/write of the archive mailboxes he performs searches on. It's possible I don't fully understand Alexander's work flow yet.
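
For the mbox-plus-zlib variant suggested earlier in the thread, the nightly compression step might be sketched as follows. It assumes the directory holds only finished daily mbox files (index files stored elsewhere) and that nothing writes to them any more. The function name and example path are hypothetical, and how a gzip-compressed mbox then appears to clients depends on the Dovecot version and zlib plugin, so test on a copy first.

```shell
#!/bin/sh
# Compress every regular file under a directory that is not yet gzip-compressed
# and was last modified more than 24 hours ago.
# Assumption: the directory contains only closed, append-finished mbox files.
compress_old_mboxes() {
    dir=$1
    find "$dir" -type f ! -name '*.gz' -mtime +0 -exec gzip -- {} +
}

# Example, e.g. from a nightly cron job (path is an assumption):
#   compress_old_mboxes /srv/archive/mail
```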

-- Stan

Alexander Chekalin

16 Nov

8:16 a.m.

Hello, Stan,

...

This is why I recommended mbox in the first place. If your only writes to these mailbox files are appends of new messages, mbox is the best format by far. It's faster at appending than any other format, and it's faster for searching than any other.

I am now seriously considering mdbox because of its nice self-regulation. After all, I believe mdbox should do file compression on its own, with no cron scripts required.
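
A configuration along these lines might look like the fragment below. It is a sketch in Dovecot 2.0-style syntax: the mailbox path, the 64M rotate size (one of the values suggested for testing earlier in the thread) and the compression level are illustrative assumptions, not settings anyone in the thread confirmed.

```
# dovecot.conf fragment (illustrative values, verify against your version)
mail_location = mdbox:~/mdbox
mdbox_rotate_size = 64M

# Load the zlib plugin and compress newly saved messages with gzip
mail_plugins = $mail_plugins zlib
plugin {
  zlib_save = gz
  zlib_save_level = 6
}
```

With zlib_save set, compression happens as messages are saved, so no separate nightly script is involved; messages already stored before the setting was enabled stay as they are.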

...

It's an archive. You're not going to use maildir so you don't need random IOPS performance. Thus RAID5/6 are a much better fit for an archive as you get better read performance, with more than adequate write performance, and you use less disks. And as this is an archive, you don't need real time automatic/transparent compression. Thus I recommend something like:

Debian 6 w/linux-image-2.6.39-bpo.2-amd64 or a custom rolled 2.6.39 or later kernel

hardware RAID5 w/large (2TB) SATA disks, 512B native sectors e.g. MegaRAID SAS 9261-8i, 4 Seagate Constellation ES ST2000NM0011 Specify a strip size of 256KB for the array Perma set /sys/block/sdX/read_ahead_kb to 512 so you're reading ahead 1024 sectors at a time instead of the default of 256. This will speed up your searches quite a bit.

XFS filesystem on the RAID device, created with mkfs.xfs defaults

mbox w/zlib plugin. Compress daily files each night with a script

You don't need LVM with a good RAID card (or with mdraid). This controller can expand the RAID5 up to 8 drives (up to 32 drives max using SAS expanders)

We are considering an HP DL180G6 server with 8 or 14 drive bays (the base model prices are about equal, but additional drives add cost) with an HP Smart Array P410 RAID controller (some servers are equipped with this controller by default) with 256 MB of battery-backed cache, but I'll check your suggestions!

How much memory should I plan for the server? You're talking about an AMD64 OS image, and 64-bit software tends to consume more memory than 32-bit, so it looks like you're talking about pretty huge RAM. I don't believe that's necessary, or maybe I'm wrong?

The problem is I have no experience with XFS and am not sure I can tune it in the best way, so I think I'll go with mkfs.xfs defaults.

I hope we'll see Dovecot 2.1.x stable soon, as I'd like to use the fts plugins and 2.1 handles them much better, but I don't like the idea of using an unstable release in production.

Thank you for taking the time on my case. Yours, Alexander

Timo Sirainen

15 Nov

2:19 a.m.

On 14.11.2011, at 16.35, Alexander Chekalin wrote:

...

mdbox uses 'a lot' files (m.1, m.2 ... etc), and the default size if 2Mb. Looks like not even every message can fit into such storage container volume (nowadays we used to see messages of 20Mb and even more).

The messages are never split into multiple files. So if you have a 20 MB message, it gets stored into its own m.* file.

...

Should I tune it (at least mdbox_rotate_size and mdbox_rotate_interval) or its size is on purpose? As for now I store each day's messages in separate IMAP folders (mailboxes), which gives me 2000-6000 messages and 2-5 Gb (on disk) per folder.

The main problem with larger mdbox files is that if you expunge messages, there's more data to write when packing the data into a new file. I don't really know the "best" value for the mdbox_rotate_size setting. But even a 2 MB mdbox file can contain thousands of small mails, so it's not too bad.

Alexander Chekalin

10 Nov

7:35 a.m.

Hello, Stan,

In fact, the only thing I miss even with my current scheme is a permanent ID assigned to each message, so I can easily find it regardless of the IMAP mailbox it is in now (so if someone moved the message from one mailbox/folder to another, the ID would still let me retrieve it fast).

You see, what I need is not only to find a message from|to someone on a specified date; I also sometimes need to restore that message back to the user's original box. As our mailserver and backup mailserver are different machines, it is a bit tricky to copy messages between them fast enough. Say I need to find and restore all mail from user@domain.com within 2009, and the search yields some thousands of messages; then using IMAP to copy them over to another server takes some time, and if you consider both search time and restore/copy time the whole process may take "ages".

With maildir I can rsync/scp the needed files to another host, and that's a fast way. That's why I stick with maildir.

FTS can help in my case (I can search for user@domain.com, for example), but it also returns messages that contain such a string in the message body (and that takes index space, too), so I'll need to filter the results later. Still, it'll surely be faster than checking every message in the archive.

Yours, Alexander

Stan Hoeppner

3:17 p.m.

On 11/9/2011 11:35 PM, Alexander Chekalin wrote:

...

Hello, Stan,

in fact the only thing I miss even with my current scheme is permanent ID assigned to the message so I can easily find it despite the IMAP mailbox it is now (so if someone moved the message from one mailbox/folder to another, the ID allows to retrieve it fast anyway).

You see, what I need is not only find message from|to someone on specified date, I also sometime need to restore that message back to user's original box. As far our mailserver and backup-mailserver are different machines, it is a bit tricky to copy messages between it fast enough. Say, if I need to find and restore all mails from user@domain.com within 2009 year, and search yields in some 1000's of messages, then use IMAP to copy it over to another server takes some time - and if you consider both search time and restore/copy time the whole process may take "ages".

Apparently I didn't fully understand all of your requirements.

Moving the archived mail to mbox/mdbox and/or getting a good indexing search engine installed will cut the search time down tremendously. Whether that would make up for the time consumed with an IMAP copy of many emails I don't know. If your servers aren't old and slow, and are not already overloaded, I would think the IMAP message copying over GbE would be pretty quick, even for the 1000 messages scenario.

There may be some Dovecot tweaks that might make this copy process faster. Timo would need to chime in on that. Do you perform the IMAP transfers with a GUI IMAP client on your management PC? Or are you using imapsync or some other util directly on the servers?

If the former you may be able to tweak your IMAP client to speed up the transfers as well. Try using IMAP and not IMAPS for the transfers. What is the network infrastructure between the servers and your management workstation? Is it all GbE with jumbo frames enabled?

...

With maildir I can rsync/scp needed files to another host and that's fast way - that's why I stick with maildir.

There is definitely some flexibility here.

...

FTS in my case can help (I can search for user@domain.com, for example), but it also return messages that contains such a string in message body (and that takes index space, too), so I'll need to filter it later, but surely it'll be faster than checking every message in the archive.

Sure. So you're concerned with your poor performance, but also with disk space. Unfortunately there's no free lunch to be had. You'll have to make sacrifices somewhere. You could go with mdbox and use compression, trading that saved space for search index files space.

-- Stan
