[Dovecot] Please advise on very fast search

Alexander Chekalin achekalin at lazurit.com
Thu Nov 10 07:35:11 EET 2011


Hello, Stan,

In fact, the only thing I miss even with my current scheme is a permanent
ID assigned to each message, so that I can easily find it regardless of
which IMAP mailbox it is in now (so if someone has moved the message from
one mailbox/folder to another, the ID still lets me retrieve it quickly).
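
One possible approximation of such an ID is the Message-ID header, which
stays with the message when it is moved between folders. The following is
only a minimal sketch under that assumption: the function name and the
directory layout are hypothetical, Message-ID is not guaranteed to be
present or unique, and the index has to be rebuilt or updated after
messages move.

```python
import os
from email.parser import BytesHeaderParser


def build_message_id_index(maildir_root):
    """Map each Message-ID header value to the file(s) that carry it."""
    parser = BytesHeaderParser()  # parses headers only, skips the body
    index = {}
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        # Maildir keeps delivered messages in "cur" and "new";
        # everything else (tmp, index files) is skipped.
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                headers = parser.parse(fh)
            message_id = headers.get("Message-ID")
            if message_id:
                key = str(message_id).strip()
                index.setdefault(key, []).append(path)
    return index
```

Looking a message up is then a dictionary access on the Message-ID,
whichever folder the file currently lives in.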

You see, what I need is not only to find a message from or to someone on
a specified date; I also sometimes need to restore that message to the
user's original mailbox. Since our mail server and backup mail server are
different machines, it is a bit tricky to copy messages between them fast
enough. Say I need to find and restore all mail from
user at domain.com for the year 2009, and the search yields several
thousand messages. Using IMAP to copy them over to another server then
takes some time, and if you consider both the search time and the
restore/copy time, the whole process may take "ages".

With maildir I can rsync/scp the needed files to another host, and that
is fast. That's why I stick with maildir.
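
A minimal sketch of how the matched files could be handed to rsync in one
batch, using its --files-from option. The function name, paths and host
name are hypothetical, and the sketch assumes the search step has already
produced a list of maildir file paths.

```python
import os


def write_rsync_file_list(matched_paths, archive_root, list_path):
    """Write matched files as paths relative to archive_root, one per line.

    Returns the number of paths written.
    """
    root = os.path.abspath(archive_root)
    written = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for path in matched_paths:
            rel = os.path.relpath(os.path.abspath(path), root)
            # Skip anything that is not located under the archive root.
            if rel == os.pardir or rel.startswith(os.pardir + os.sep):
                continue
            out.write(rel + "\n")
            written += 1
    return written
```

The resulting list could then be used along the lines of
"rsync -a --files-from=restore.list /archive/ mailhost:/restore/", which
transfers only the listed files and keeps their relative folder paths.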

FTS can help in my case (I can search for user at domain.com, for
example), but it also returns messages that contain that string in the
message body (and indexing bodies takes index space, too), so I'll need
to filter the results afterwards. Still, that will surely be faster than
checking every message in the archive.
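
The post-filtering step could look roughly like this: keep only those FTS
hits where the address really appears in the From or To header, not just
in the body. This is a sketch under assumptions: the function names are
hypothetical, and the hits are assumed to be available as maildir file
paths.

```python
from email.parser import BytesHeaderParser
from email.utils import getaddresses

ADDRESS_HEADERS = ("From", "To")


def addressed_to_or_from(path, address):
    """Return True if address occurs in the From or To header of the file."""
    with open(path, "rb") as fh:
        headers = BytesHeaderParser().parse(fh)  # body is not parsed
    values = []
    for name in ADDRESS_HEADERS:
        values.extend(str(v) for v in headers.get_all(name, []))
    wanted = address.lower()
    return any(addr.lower() == wanted for _name, addr in getaddresses(values))


def filter_fts_hits(paths, address):
    """Drop hits that only mention the address in the message body."""
    return [p for p in paths if addressed_to_or_from(p, address)]
```

Only the headers of each hit are read, so this pass touches far fewer
bytes than a full scan of the archive.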

Yours,
   Alexander

> Maildir is very likely a huge factor in your current slow search time.
> With a maildir search, every mail file must be opened and searched.  How
> many total mail files are opened for each of your searches?  Thousands?
> Tens of thousands?  Maildir causes a massive disk IO bottleneck when
> searching so many files.  Run iostat the next time you do one of these
> searches, and look at the %iowait value.  It will likely be very high.
> If it is, this confirms maildir is a big part of the problem.
>
> mbox, and mdbox, would be many many times faster than maildir WRT
> searching as the total number of files is lower by orders of magnitude.
> Switching from maildir to mbox/mdbox shifts the workload burden from
> the disk subsystem to the processor/memory.  And I'm sure as with
> everyone else on the planet today, you have massive spare CPU cycles,
> but extremely limited spindle throughput.
>
> And as Timo suggested, using one of the indexing search plugins would be
> much faster yet, as long as you keep the indexes updated.
>


-- 

Best regards,
   Alexander Chekalin
   Lazurit
   Kaliningrad
   +7 909 799 2549
   achekalin at lazurit.com


