[Dovecot] SQL mail storage
I wasted some time yesterday and today implementing a SQL storage plugin. It seems to be working, but:
- Saving new messages is done in a regular INSERT statement, which is bad. PostgreSQL has at least this COPY TO command which could be used instead.
- It breaks in stress testing
- It's not that well optimized. Especially it could support caching some commonly requested fields (same as dovecot.index.cache file).
- Currently works only with PostgreSQL.
Nothing is committed to CVS yet. I'm not sure if I should even do that. I'm not sure if those lib-sql API changes were that great. Maybe there's a better way..
If you're interested in trying it, you need the latest CVS HEAD sources and these patches:
http://dovecot.org/patches/sql-storage-changes.diff http://dovecot.org/patches/sql-storage-plugin.diff
The SQL schema is in src/plugins/mail-sql/schema.sql file. It creates a "tss" named user by default, so you might want to modify that. :)
Then set:
mail_location = sql:pgsql:host=localhost dbname=mails user=mailuser password=foo mail_plugins = mail_sql
Timo Sirainen wrote:
I wasted some time yesterday and today implementing a SQL storage plugin. It seems to be working, but:
- Saving new messages is done in a regular INSERT statement, which is bad. PostgreSQL has at least this COPY TO command which could be used instead.
- It breaks in stress testing
- It's not that well optimized. Especially it could support caching some commonly requested fields (same as dovecot.index.cache file).
- Currently works only with PostgreSQL.
Nothing is committed to CVS yet. I'm not sure if I should even do that. I'm not sure if those lib-sql API changes were that great. Maybe there's a better way..
If you're interested in trying it, you need the latest CVS HEAD sources and these patches:
http://dovecot.org/patches/sql-storage-changes.diff http://dovecot.org/patches/sql-storage-plugin.diff
The SQL schema is in src/plugins/mail-sql/schema.sql file. It creates a "tss" named user by default, so you might want to modify that. :)
Then set:
mail_location = sql:pgsql:host=localhost dbname=mails user=mailuser password=foo mail_plugins = mail_sql
I've been waiting till 1.0 came out to ask for new features like SQL. I don't think that SQL is a waste of time at all. In fact I believe that SQL is the future of email storage. SQL has a lot of advantages that will be tapped once people start using it. Yes - it is probably slower than mbox or maildir but you can make up for that with faster computers. SQL offers.
Replication - the ability to have instant redundancy on multiple servers.
The ability to undeliver messages - if the indexes are configured right, if you get a spam attack from an IP and after the attack you realize it was spam you can run a quick query and delete delivered email.
Statistical advantages - admins can do email queries across all domains and all usersand all folders to gather useful information. An example might be to generate white list information that can be used for trusted email sources to bypass spam filtering and reduce false positives.
Search Speed - should be a lot faster to search messages than maildir.
And - there's probably a lot of things that will be realized once it's implemented because the indexing ability opens new posibilities.
So - I definitely am someone who supports this idea. It will also make Dovecot stand out as a superior IMAP server.
Go for it!
On Sat, 2007-03-31 at 08:46 -0700, Marc Perkel wrote: <snip>
I've been waiting till 1.0 came out to ask for new features like SQL. I don't think that SQL is a waste of time at all. In fact I believe that SQL is the future of email storage. SQL has a lot of advantages that will be tapped once people start using it. Yes - it is probably slower than mbox or maildir but you can make up for that with faster computers. SQL offers.
Replication - the ability to have instant redundancy on multiple servers.
The ability to undeliver messages - if the indexes are configured right, if you get a spam attack from an IP and after the attack you realize it was spam you can run a quick query and delete delivered email.
Statistical advantages - admins can do email queries across all domains and all usersand all folders to gather useful information. An example might be to generate white list information that can be used for trusted email sources to bypass spam filtering and reduce false positives.
Search Speed - should be a lot faster to search messages than maildir.
And - there's probably a lot of things that will be realized once it's implemented because the indexing ability opens new posibilities.
So - I definitely am someone who supports this idea. It will also make Dovecot stand out as a superior IMAP server.
Go for it!
I second the yes yes yes! Many of my clients have been asking for a Maximizer or ACT! contact manager replacement that will run under Linux or is web-based and not crazy expensive. One of the core features is that many different kinds of objects like files, notes and email can to be linked to a particular contact. If you pull up a contact, it should show the whole history of your interactions with that client including all email correspondence. Up 'til now, I haven't seen an open source contact manager that has this capability without the user having to jump hoops. Having a robust SQL mail storage system gives developers a key tool to integrate email linking with contacts in a flexible manner.
Thanks for opening up a new door!
Stephen
Timo Sirainen wrote:
I wasted some time yesterday and today implementing a SQL storage plugin. It seems to be working, but:
- Saving new messages is done in a regular INSERT statement, which is bad. PostgreSQL has at least this COPY TO command which could be used instead.
depending on the code's structure, you can wrap bundles of 100 or so inserts in transactions, which will speed things up rather noticeably. also watch for connection handling, opening a connection is really expensive so pooling of some sort is good (maybe look at SQL Relay, it's database agnostic to a fair degree and could form part of a bridge to database independence.)
this is also likely something that would benefit from smart disk layout, e.g. putting transaction logs (/var/lib/pgsql/data/pg_xlog) on a separate spindle (or raid 1 mirror), maybe separating message store from other infrastructure using table spaces, etc. etc. in other words, there are a lot of DBA type issues in getting it to perform well.
so for installs, you'll want a guide for DBAs and some flexibility to manage the database setup.
richard (not a real PostgreSQL DBA but i play one at the day job)
Marc Perkel wrote:
I've been waiting till 1.0 came out to ask for new features like SQL. I don't think that SQL is a waste of time at all. In fact I believe that SQL is the future of email storage. SQL has a lot of advantages that will be tapped once people start using it. Yes - it is probably slower than mbox or maildir but you can make up for that with faster computers. SQL offers.
Well imagine that we have 100 users in company, each of them has 200M mailbox usually filled at least to 150M. That gives us 15G database (I am not counting administrative overhead like indexes etc.).
Lovely. What would such database solution require, for example running on MySQL (usual database for many of the small businesses out there), I cannot truly say now and I am afraid of it. It would be a hard beast. In terms of maintenance and also of resources requirements.
When using mail server with maildir/mailbox we have two components: mail server software alone and disk subsystem. When using database we would get mail server software plus database software and disk subsystem. Thus introducing new component with new unknown behaviors in certain situations.
Martin
On Sat, 2007-03-31 at 19:30 +0200, Martin Hudec wrote: <snip>
Well imagine that we have 100 users in company, each of them has 200M mailbox usually filled at least to 150M. That gives us 15G database (I am not counting administrative overhead like indexes etc.).
Lovely. What would such database solution require, for example running on MySQL (usual database for many of the small businesses out there), I cannot truly say now and I am afraid of it. It would be a hard beast. In terms of maintenance and also of resources requirements.
When using mail server with maildir/mailbox we have two components: mail server software alone and disk subsystem. When using database we would get mail server software plus database software and disk subsystem. Thus introducing new component with new unknown behaviors in certain situations.
Martin
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file? This is akin to some content management systems where the file info is in a db and the content resides as a file. Don't know what the logistics and performance issues are though. Clearly SQL mail storage is not for everyone.
Stephen
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file?
Sounds very Outlook/$Exchange that...
-- Juha http://www.geekzone.co.nz/juha
On Mar 31, 2007, at 11:36am, Stephen Lee wrote:
On Sat, 2007-03-31 at 19:30 +0200, Martin Hudec wrote: <snip>
Well imagine that we have 100 users in company, each of them has 200M mailbox usually filled at least to 150M. That gives us 15G
database (I am not counting administrative overhead like indexes etc.).Martin
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file? This is akin to some content management systems where the file info is in a db and the content resides as a file. Don't know what the logistics and performance
issues are though. Clearly SQL mail storage is not for everyone.Stephen
Or the whole message, but attachments could be removed and be stored
as files. Just as clients have the option to store headers, full
messages, and full messages + attachments.
.tim
Stephen Lee wrote:
On Sat, 2007-03-31 at 19:30 +0200, Martin Hudec wrote: <snip>
Well imagine that we have 100 users in company, each of them has 200M mailbox usually filled at least to 150M. That gives us 15G database (I am not counting administrative overhead like indexes etc.).
Lovely. What would such database solution require, for example running on MySQL (usual database for many of the small businesses out there), I cannot truly say now and I am afraid of it. It would be a hard beast. In terms of maintenance and also of resources requirements.
When using mail server with maildir/mailbox we have two components: mail server software alone and disk subsystem. When using database we would get mail server software plus database software and disk subsystem. Thus introducing new component with new unknown behaviors in certain situations.
Martin
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file? This is akin to some content management systems where the file info is in a db and the content resides as a file. Don't know what the logistics and performance issues are though. Clearly SQL mail storage is not for everyone.
Stephen
Of course it's not for everyone. mbox wins for quick, small, and simple. MySQL would be for those who need what SQL offers. There's lots of features in Dovecot that I don't use, but I'm glad are in there.
On 3/31/2007 Stephen Lee wrote:
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file? This is akin to some content management systems where the file info is in a db and the content resides as a file. Don't know what the logistics and performance issues are though. Clearly SQL mail storage is not for everyone.
One of the biggest advantages I can think of is single instance storage.
Break the message into its constituent parts (headers, body, attachments), and now you can forget about the idiots that send a 15MB attachment to everyone in the company - resulting in 523 copies of that attachment in the mail system - with SQL storage (done right) - you only have one copy - 15MB instead of 523 x 15MB.
Timo? Can you elaborate on how thi sis implemented? Is the message broken up into parts? How hard would it be to implement single instance storage if an SQL db is used for the storage?
Thanks for all your efforts!
Charles
Charles Marcus wrote:
On 3/31/2007 Stephen Lee wrote:
WRT size of the db, what about keeping just the message headers and indices in the db and the body as a file? This is akin to some content management systems where the file info is in a db and the content resides as a file. Don't know what the logistics and performance issues are though. Clearly SQL mail storage is not for everyone.
One of the biggest advantages I can think of is single instance storage.
Break the message into its constituent parts (headers, body, attachments), and now you can forget about the idiots that send a 15MB attachment to everyone in the company - resulting in 523 copies of that attachment in the mail system - with SQL storage (done right) - you only have one copy - 15MB instead of 523 x 15MB.
Timo? Can you elaborate on how thi sis implemented? Is the message broken up into parts? How hard would it be to implement single instance storage if an SQL db is used for the storage?
Thanks for all your efforts!
Charles That can be done without SQL as well as long as you're on a *nix system: with hard links. One of the reasons that's not usually done is that most MTAs split up the message in one copy per recipient before handing them to the LDA. At least I'm relatively sure that the splitting happens at MTA level.
--
Jens Knoell Network Administrator *Surefoot, L.C.* 1500 Kearns Blvd. Suite A-100 Park City, UT 84060 Phone: (435) 655 8110 ext. 109 Fax: (435) 649 0663 Web: http://www.surefoot.com
On 1.4.2007, at 2.09, Charles Marcus wrote:
One of the biggest advantages I can think of is single instance
storage.
I was planning on implementing that for dbox also.
Timo? Can you elaborate on how thi sis implemented? Is the message
broken up into parts? How hard would it be to implement single
instance storage if an SQL db is used for the storage?
I guess that could be done also. Currently it stores headers and body
in separate fields but in one row.
If this is implemented I think most of the code is going to be
generic, so if it's implemented to one storage it could somewhat
easily be implemented to all of them. Of course not in any standard
way to mbox/maildir.
hi.
I'm sure this is on hold in your head, but just in case.. Can this be V2.0 minimum, please.
Thanks, Ejay
On Sat, 2007-03-31 at 17:34 +0300, Timo Sirainen wrote:
I wasted some time yesterday and today implementing a SQL storage plugin. It seems to be working, but:
- Saving new messages is done in a regular INSERT statement, which is bad. PostgreSQL has at least this COPY TO command which could be used instead.
- It breaks in stress testing
- It's not that well optimized. Especially it could support caching some commonly requested fields (same as dovecot.index.cache file).
- Currently works only with PostgreSQL.
Nothing is committed to CVS yet. I'm not sure if I should even do that. I'm not sure if those lib-sql API changes were that great. Maybe there's a better way..
If you're interested in trying it, you need the latest CVS HEAD sources and these patches:
http://dovecot.org/patches/sql-storage-changes.diff http://dovecot.org/patches/sql-storage-plugin.diff
The SQL schema is in src/plugins/mail-sql/schema.sql file. It creates a "tss" named user by default, so you might want to modify that. :)
Then set:
mail_location = sql:pgsql:host=localhost dbname=mails user=mailuser password=foo mail_plugins = mail_sql
On Sat, 31 Mar 2007 17:34:23 +0300, Timo Sirainen wrote:
I wasted some time yesterday and today implementing a SQL storage plugin. It seems to be working, but:
- Saving new messages is done in a regular INSERT statement, which is bad. PostgreSQL has at least this COPY TO command which could be used instead.
- It breaks in stress testing
- It's not that well optimized. Especially it could support caching some commonly requested fields (same as dovecot.index.cache file).
- Currently works only with PostgreSQL.
Nothing is committed to CVS yet. I'm not sure if I should even do that. I'm not sure if those lib-sql API changes were that great. Maybe there's a better way..
If you're interested in trying it, you need the latest CVS HEAD sources and these patches:
http://dovecot.org/patches/sql-storage-changes.diff http://dovecot.org/patches/sql-storage-plugin.diff
The SQL schema is in src/plugins/mail-sql/schema.sql file. It creates a "tss" named user by default, so you might want to modify that. :)
Then set:
mail_location =3D sql:pgsql:host=3Dlocalhost dbname=3Dmails user=3Dmailuser password=3Dfoo mail_plugins =3D mail_sql
Hi Timo,
After thoughs about SQL: no time wasted, this is a very interesting project. But instead of storing the mail in the SQL database, which would generate a huuuge database, wouldn't it be more optimised to store only the indexes (and some of the header fields) in SQL and keep the mails on file system? With maybe only the smallest mails in SQL as well... This would sort out a lot of locking problems with the indexes, and give very easy sharing/backup/replication/migration facilities. And it might look a bit like the GMail solution, wich proved to work.
As I am looking for that type of solution, I'll be able to put some time to help on developpement.
Regards,
Thierry
participants (11)
-
Charles Marcus
-
Ejay Hire
-
Jens Knoell
-
Juha Saarinen
-
Marc Perkel
-
Martin Hudec
-
Richard P. Welty
-
Stephen Lee
-
Thierry de Montaudry
-
Timo Sirainen
-
Timothy Martin