[Dovecot] Please advise on very fast search
Hello,
I am trying to create a kind of mail backup system. What I need is a system that will store mail for the whole domain and allow me to restore messages from/to a specified address at that domain.
The scheme is pretty simple: on our main mail server the SMTP server itself has a rule to send a copy of every message to 'backup@backupserver.host', and the backupserver.host domain lives nearby on a second server.
The SMTP server on the second server does a simple 'catchall' redirect of all messages to a single mailbox. There is also a Dovecot instance that takes care of remote IMAP access to that mailbox. And, finally, I've created some scripts to sort all messages in INBOX into folders named after the message's date.
So I have a lot of mailboxes inside the catchall box: INBOX, 2011.11.03, 2011.11.04, 2011.11.05, 2011.11.06, etc.,
and each folder holds the messages for that day. Simple, and it works perfectly.
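The sorting step can be sketched as a small function that derives the folder name from a message's Date: header. This is a minimal sketch, not the author's actual scripts: it assumes the YYYY.MM.DD naming shown above, keeps the date as written in the sender's own time zone, and uses a hypothetical `undated` fallback folder for messages whose Date: header is missing or unparseable.

```python
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime

FALLBACK_FOLDER = "undated"  # hypothetical folder for messages without a usable Date:


def folder_for_message(raw: bytes) -> str:
    """Return a 'YYYY.MM.DD' folder name taken from the message's Date: header."""
    # Only the header block matters here, so skip parsing the body.
    headers = BytesHeaderParser().parsebytes(raw)
    date_value = headers.get("Date")
    if date_value is None:
        return FALLBACK_FOLDER
    try:
        sent = parsedate_to_datetime(str(date_value))
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable dates, newer ones ValueError.
        return FALLBACK_FOLDER
    # Use the date as the sender wrote it; no conversion to the server's time zone.
    return sent.strftime("%Y.%m.%d")
```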
The problem is that now that my archive has become big (several years), it is painful to find specific message(s). When someone suddenly needs to find an old message, it is mostly guesses like 'I think the message was between June and July of 2009, or maybe a month or two before that', so I need to search all mailboxes (with thousands of messages in each). And it takes a really long time.
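With one folder per day, a vague guess such as "June to July 2009, or a month or two earlier" can at least narrow the search to the folders inside that window instead of all of them. A minimal sketch, assuming the YYYY.MM.DD folder names above; `folders_in_window` is a hypothetical helper, and folders that are not dates (such as INBOX) are skipped.

```python
from datetime import date, datetime
from typing import Iterable, List


def folders_in_window(folders: Iterable[str], start: date, end: date) -> List[str]:
    """Return the 'YYYY.MM.DD' folders whose date lies in [start, end], oldest first."""
    selected = []
    for name in folders:
        try:
            day = datetime.strptime(name, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a per-day folder, e.g. INBOX
        if start <= day <= end:
            selected.append((day, name))
    # Sort by the parsed date so the search walks the window chronologically.
    return [name for _, name in sorted(selected)]
```

For the guess above, a window from `date(2009, 4, 1)` to `date(2009, 7, 31)` covers June and July plus the two months before.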
I tried to play with Dovecot indexes, but they didn't help much. The bad part is that I need to search for all addresses in each message's headers, not only "From" or "To", since some messages are sent to mailing lists, so "To" = the list address, not the person's personal address.
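Because list mail carries the list address in To:, one way to cover every header is to pull every address-like string out of all header values rather than relying on From/To alone. A minimal sketch: the regular expression is a deliberate simplification of RFC 5322 addresses, and skipping Message-ID, References and In-Reply-To (whose values look like addresses) is an assumption of this example.

```python
import re
from email.parser import BytesHeaderParser
from typing import Set

# Simplified address pattern: good enough for indexing, not a full RFC 5322 parser.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Headers whose values hold message identifiers rather than mailboxes.
ID_HEADERS = {"message-id", "references", "in-reply-to"}


def header_addresses(raw: bytes) -> Set[str]:
    """Return every lower-cased address found in any header of the message."""
    headers = BytesHeaderParser().parsebytes(raw)  # the body is not scanned
    found = set()
    for name, value in headers.items():
        if name.lower() in ID_HEADERS:
            continue
        found.update(match.lower() for match in ADDRESS_RE.findall(str(value)))
    return found
```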
Then I tried to index messages on my own, storing address info in a MySQL database ('email' -> 'mailbox', 'message filename'), but I soon found out that message files can be renamed by Dovecot.
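The renames are most likely Maildir flag changes: by Maildir convention the unique base name stays fixed and only the info part after the colon (":2," followed by flags such as S for seen) is rewritten. Assuming that holds for this archive, keying the index on the base name instead of the full filename survives such renames. A minimal sketch, using Python's built-in sqlite3 in place of MySQL so it stays self-contained; the table layout and function names are hypothetical, and ':' is the default Maildir separator.

```python
import os
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Default Maildir separator between the unique base name and the info part (":2,<flags>").
INFO_SEPARATOR = ":"


def maildir_key(filename: str) -> str:
    """Return the part of a Maildir filename that flag changes leave untouched."""
    return filename.split(INFO_SEPARATOR, 1)[0]


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the address index; sqlite3 stands in for MySQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS address_index ("
        " address TEXT NOT NULL,"
        " mailbox TEXT NOT NULL,"
        " msg_key TEXT NOT NULL,"
        " PRIMARY KEY (address, mailbox, msg_key))"
    )
    return conn


def index_message(conn: sqlite3.Connection, addresses: Iterable[str],
                  mailbox: str, filename: str) -> None:
    """Record each address under (mailbox, stable key); re-indexing is harmless."""
    key = maildir_key(filename)
    conn.executemany(
        "INSERT OR IGNORE INTO address_index (address, mailbox, msg_key) VALUES (?, ?, ?)",
        [(address.lower(), mailbox, key) for address in addresses],
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, address: str) -> List[Tuple[str, str]]:
    """Return (mailbox, key) pairs for an address, ordered by mailbox."""
    cursor = conn.execute(
        "SELECT mailbox, msg_key FROM address_index WHERE address = ? "
        "ORDER BY mailbox, msg_key",
        (address.lower(),),
    )
    return cursor.fetchall()


def find_current_file(folder_path: str, key: str) -> Optional[str]:
    """Locate the message's file now, whatever flags its name currently carries."""
    for sub in ("cur", "new"):
        directory = os.path.join(folder_path, sub)
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            if maildir_key(name) == key:
                return os.path.join(directory, name)
    return None
```

Other candidate keys are the Message-ID header, or the IMAP UID together with the folder's UIDVALIDITY; which one fits depends on how the archive is maintained.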
Could you please advise me how to speed up message search?
Sorry for such a long question, hope you can help!
Yours, Alexander Chekalin
On 09.11.2011 14:57, Alexander Chekalin wrote:
I guess you're searching over IMAP? Perhaps compression will help speed things up, along with other speed-related tuning, or you need some other indexing approach. At least if it's Maildir: how fast is "grep"? And so on. Some ideas here: http://wiki.dovecot.org/HowTo/ReadOnlyArchive
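To get a feel for how fast a plain "grep" over Maildir would be, a brute-force scan can read only each file's header block and look for the address there. A minimal sketch, assuming the usual Maildir layout with cur/ and new/ subdirectories per folder; `header_block` and `scan_headers` are hypothetical helpers, and the match is a case-insensitive byte substring test, so MIME-encoded or folded-mid-address headers are not handled.

```python
import os
from typing import Iterable, Iterator, Tuple


def header_block(lines: Iterable[bytes]) -> bytes:
    """Collect raw lines up to the first empty line, i.e. the message headers."""
    block = []
    for line in lines:
        if line in (b"\n", b"\r\n"):
            break
        block.append(line)
    return b"".join(block)


def scan_headers(folder_paths: Iterable[str], needle: str) -> Iterator[Tuple[str, str]]:
    """Yield (folder, filename) for messages whose raw headers contain needle."""
    # Addresses are ASCII; a non-ASCII needle raises UnicodeEncodeError here.
    wanted = needle.lower().encode("ascii")
    for folder in folder_paths:
        for sub in ("cur", "new"):
            directory = os.path.join(folder, sub)
            if not os.path.isdir(directory):
                continue
            for name in sorted(os.listdir(directory)):
                # Iterating a binary file yields lines, so the body is never read.
                with open(os.path.join(directory, name), "rb") as handle:
                    headers = header_block(handle)
                if wanted in headers.lower():
                    yield folder, name
```

Restricting `folder_paths` to the days in question keeps such a scan to a handful of folders rather than the whole archive.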
Anyway, I think you really need another kind of archive solution. In Germany there is a law that some kinds of business mail must be archived for up to 10 years for financial and other review, so there are a lot of "you can buy" solutions now; these have solved the problems you discovered (indexing etc.). I was shown, e.g., http://www.bytstormail.de, which looked fine to me.
Or perhaps have a look at http://www.archiveopteryx.org/ too.
-- Best Regards
Robert Schetterer
Germany/Munich/Bavaria
On Wed, 2011-11-09 at 16:57 +0300, Alexander Chekalin wrote:
The problem is that now that my archive has become big (several years), it is painful to find specific message(s). When someone suddenly needs to find an old message, it is mostly guesses like 'I think the message was between June and July of 2009, or maybe a month or two before that', so I need to search all mailboxes (with thousands of messages in each). And it takes a really long time.
I tried to play with Dovecot indexes, but they didn't help much.
They'll help with the dates.
The bad part is that I need to search for all addresses in each message's headers, not only "From" or "To", since some messages are sent to mailing lists, so "To" = the list address, not the person's personal address.
Headers only, not the message body? Anyway, some of the full-text search backends support searching both. I'd recommend using either Solr or, with Dovecot v2.1, Lucene: http://wiki2.dovecot.org/Plugins/FTS
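From the client side, an IMAP SEARCH using the TEXT key, which RFC 3501 defines as matching the string in either the header or the body, also catches list mail where the person's address is not in To:. With a full-text search backend configured as the wiki page describes, Dovecot can typically answer such TEXT/BODY searches from the FTS index instead of reading every message. A minimal sketch using Python's imaplib; the helper names are hypothetical, and SENTSINCE/SENTBEFORE are used because they compare against the Date: header, which matches the per-day folder naming.

```python
import imaplib
from datetime import date
from typing import Dict, Iterable, List, Optional

# IMAP dates use English month abbreviations regardless of locale (RFC 3501 date-text).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day: date) -> str:
    """Format a date as IMAP expects, e.g. 1-Apr-2009."""
    return f"{day.day}-{_MONTHS[day.month - 1]}-{day.year}"


def build_criteria(needle: str, sent_since: Optional[date] = None,
                   sent_before: Optional[date] = None) -> str:
    """Build a SEARCH string; TEXT matches the header or the body of a message."""
    if '"' in needle or "\\" in needle:
        raise ValueError("needle must not contain quotes or backslashes")
    parts = []
    if sent_since is not None:
        parts.append(f"SENTSINCE {imap_date(sent_since)}")
    if sent_before is not None:
        parts.append(f"SENTBEFORE {imap_date(sent_before)}")
    parts.append(f'TEXT "{needle}"')
    return " ".join(parts)


def search_folders(conn: imaplib.IMAP4, folders: Iterable[str],
                   criteria: str) -> Dict[str, List[bytes]]:
    """Run one SEARCH per folder (read-only) and collect matching sequence numbers."""
    hits = {}
    for folder in folders:
        status, _ = conn.select(folder, readonly=True)
        if status != "OK":
            continue
        status, data = conn.search(None, criteria)
        if status == "OK" and data and data[0]:
            # Sequence numbers are only valid within this session.
            hits[folder] = data[0].split()
    return hits
```

Usage would be along the lines of connecting with `imaplib.IMAP4_SSL(...)` to the archive server, logging in, and calling `search_folders(conn, ["2009.06.15"], build_criteria("alice@example.com"))`; host, credentials and folder list are placeholders.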
Participants (3): Alexander Chekalin, Robert Schetterer, Timo Sirainen